Nvidia Releases Massive AI-Ready Open European Language Dataset and Tools

2 months 3 weeks ago
"Only a tiny fraction of the more than 7,000 languages on Earth are supported by artificial intelligence models," reported SiliconANGLE this week. So Nvidia announced "a massive new AI-ready dataset and models to support the development of high-quality AI translation for European languages." The new dataset, named Granary, is a massive open-source corpus of multilingual audio, including more than a million hours of audio, plus 650,000 hours of speech recognition and 350,000 hours of speech translation. Nvidia's speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler to process unlabeled audio and public speech data into information usable for AI training... Granary includes 25 European languages, representing nearly all of the European Union's 24 official languages, plus Russian and Ukrainian. The dataset also contains languages with limited available data, such as Croatian, Estonian and Maltese. This is critically important because providing these underrepresented human-annotated datasets will enable developers to create more inclusive speech technologies for audiences who speak those languages, while using less training data in their AI applications and models... The team demonstrated in their research paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve high accuracy for automatic speech recognition and automatic speech translation. Alongside Granary, Nvidia also released new Canary and Parakeet models to demonstrate what can be created with the dataset... The new Canary is available under a fairly permissive license for commercial and research use, expanding Canary's current languages from four to 25. It offers transcription and translation quality comparable to models three times larger while running inference up to 10 times faster. 
At 1 billion parameters, it can run completely on-device on most next-gen flagship smartphones for speech translation on the fly.

Read more of this story at Slashdot.

EditorDavid

James Cameron Struggles With Real-World Horrors for 'Terminator 7' and New Hiroshima Movie

2 months 3 weeks ago
"James Cameron has a confession: he can't write Terminator 7..." according to the Guardian, "because reality keeps nicking his plotlines." "I'm at a point right now where I have a hard time writing science-fiction," Cameron told CNN this week. "I'm tasked with writing a new Terminator story [but] I don't know what to say that won't be overtaken by real events. We are living in a science-fiction age right now...." What Cameron should be looking for is a complete system reboot to reinvigorate the saga in the way Prey brought fans back to Predator and Alien: Romulus restored interest in slimy Xenomorphs. All evidence suggests that the 70-year-old film-maker is far more interested in the current challenges surrounding AI, superintelligences and humankind's constant efforts to destroy itself, which doesn't exactly lend itself to the sort of back-to-basics, relentless-monsters-hunt-a-few-unlucky-humans-for-two-hours approach that has worked elsewhere. The challenge here seems to be to fuse Terminator's core DNA — unstoppable cyborgs, explosive chase sequences, and Sarah Connor-level defiance — with the occasionally rather more prosaic yet equally scary existential anxieties of 21st-century AI doom-mongering. So we may get Terminator 7: Kill List, in which a single, battered freedom fighter is hunted across a decimated city by a T-800 running a predictive policing algorithm that knows her next move before she does. Or T7: Singularity's Mom, in which a lone Sarah Connor-type must protect a teenage coder whose chatbot will one day evolve into Skynet. Or Terminator 7: Terms and Conditions, in which humanity's downfall comes not from nuclear warfare but from everyone absent-mindedly agreeing to Skynet's new privacy policy, triggering an army of leather-clad enforcers to collect on the fine print. 
Or perhaps the future just looks terrifying enough without Cameron getting involved, which, rather worryingly for the future of the franchise, seems to be the director's essential point. "The only way out is through," Cameron said in the CNN interview, "by using our intelligence, by using our curiosity, by using our command of technology, but also, by really understanding the stark probabilities that we face." In the meantime, Cameron is working on a new film inspired by Ghosts of Hiroshima, a book by Charles Pellegrino, one of the consultants on Titanic. "I know what a meticulous researcher he is," Cameron told CNN in a recent interview. (Transcript here.) CAMERON: He's talked about this book for ages and ages and sent me early versions of it. So, I've read it with interest, great interest a number of times now. What compels me out of all that and what I think the human hook for understanding this tragedy is, is to follow a handful, specifically two will be featured of survivors, that actually survived not only the Hiroshima blast, but then went to Nagasaki and three days later were hit again.... This film scares me. I fear making this film. I fear the images that I'm going to have to create, to be honest and to be truthful. CNN also spoke to former U.S. Energy Secretary Ernest Moniz, who is now CEO of the nonprofit global security organization the Nuclear Threat Initiative: MONIZ: There remains a false narrative that the possession of these nuclear weapons is actually making us safer when they're not. That's the narrative I think, ultimately, we need to change. Harry Truman said, quite correctly, these nuclear weapons, they are not military weapons. Dropped on a city, they indiscriminately kill combatants, non-combatants, women, children, etc. They should not be thought of as military weapons, but as weapons of mass destruction, indiscriminate mass destruction when certainly dropped in an urban center.
Thanks to long-time Slashdot reader schwit1 for sharing the article.

EditorDavid

Threads Has 400 Million Monthly Users. But Who Are They?

2 months 3 weeks ago
Threads now has more than 400 million monthly active users. But who is actually using Threads, Mashable asks, and what is their cultural footprint? Threads is the Big Bang Theory of social media. Bland, boring, largely unoffensive, and somehow, it was the most popular show on television for years... At any given time, "Twitter" and "X" are searched somewhere between 12 and 30 times more than "Threads" on Google, according to the search engine's Trends data. Threads is a popular platform without much of an identity... [Threads] is consistently good at one thing users really want from a social media platform: for their posts to be seen and engaged with. Threads might be boring in comparison to its competitors, but its users say it might be the only place on the internet right now where they don't feel they are screaming into the void.... Much like TikTok, you don't actually have to have thousands of followers to find decent engagement on the app. One user, commenting in a Reddit forum questioning who actually uses the app, said they "find it worthwhile" because "you can just say stuff on there under a tag and people will find it and respond...." According to consumer research company GWI, while users signed up for Threads because of its integration with Instagram, they're staying because Threads users are "community-focused," noting there's a strong overlap between Discord users and Threads users.... It just doesn't have the same flair as X or Twitter, which could be because Adam Mosseri, the head of Instagram, went out of his way to ensure politics was downplayed when Threads first launched. (Meta has since backtracked slightly by phasing "civic content" back into Threads "with a more personalized approach....") Threads is still in its adolescence. It lacks the media ecosystem that made Twitter indispensable for journalists, politicians, and celebrities. But it has something else: sheer scale and Meta's backing.
With Instagram's 2 billion users as a feeder system, Meta can keep funneling people toward Threads whether they like it or not. The article also points out Threads is integrated with the fediverse, supporting ActivityPub's decentralized protocol...

EditorDavid

FSF Announces Photo Contest Honoring 40 Years of Free Software

2 months 3 weeks ago
The Free Software Foundation announced a special photography contest honoring its 40th anniversary: The technology we use every day has changed dramatically since our founding nearly forty years ago, including the way we interact with it... We're incredibly grateful for the countless hours that developers and users have put into the free software programs that exist today. Without all the people who cared enough to make and use software that respects the four freedoms four decades or even a year ago, we wouldn't have much to celebrate. We want to honor the hard work that has gone into free software and its development with the FSF40 Photo Contest. Starting on August 14, 2025, we're inviting free software supporters worldwide to share how they use free software on a daily basis. While we can think of hundreds of ways that free software can be used, there's almost certainly many of you who have thought of much more creative ways to involve libre software every day! Shortly after the photo contest closes on August 31, 2025, we will invite you and other free software supporters to vote for your favorite of the #FSF40Photos... We will be displaying the winning photos at our fortieth [anniversary] celebration in Boston, MA on October 4, 2025 — we hope you get to see them on a big screen with us! Earlier this month the FSF also shared 40 links from around the FSF and GNU sites "that give a sense of what we've been doing all this time as we work for your freedom." (For example, 2007's announcement of the GNU General Public License, version 3.)

EditorDavid