
Meta’s Spirit LM generates more expressive voices that reflect anger, surprise, happiness and other emotions

Meta Platforms Inc.’s Fundamental AI Research team is going head-to-head with OpenAI yet again, unveiling a new open-source multimodal large language model called Spirit LM that can handle both text and speech as inputs and outputs.

These are the same capabilities that distinguish OpenAI’s most powerful LLM, GPT-4o, as well as other multimodal models such as Hume AI Inc.’s EVI 2. Meta’s artificial intelligence research team announced Spirit LM late Friday, saying it’s designed to address some of the challenges around existing AI voice systems, which often sound somewhat robotic and emotionless.

The problem with traditional AI models is that they're unable to replicate the expressive qualities of human voices, such as tone and emotion. That's because they work as a pipeline: an automatic speech recognition system transcribes the spoken input, a language model generates a text response, and a text-to-speech model converts that response back into audio, stripping out the expressive cues of the original voice along the way.
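To make that loss of expressiveness concrete, here is a minimal sketch of such a cascaded pipeline; the asr, llm and tts objects are generic placeholders standing in for real systems, not any particular API:

    # Sketch of a conventional cascaded voice pipeline (placeholder components, not a real API).
    def cascaded_reply(audio_in, asr, llm, tts):
        text_in = asr.transcribe(audio_in)    # speech -> text: tone, pitch and emotion are discarded here
        text_out = llm.generate(text_in)      # text -> text: the model never "hears" the speaker
        return tts.synthesize(text_out)       # text -> speech: output prosody is generic, not matched to the input

Because the language model only ever sees plain text, nothing downstream can recover how the input was actually spoken.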

Meta Spirit LM has an entirely different design, featuring tokens for phonetics, pitch and tone that add those expressive qualities to its speech outputs. At the same time, it's capable of learning new tasks across a range of modalities, including automatic speech recognition, text-to-speech and speech classification.

In practice, that means it can learn and improve the way it converts spoken language into text, generates spoken language from text, and identifies and categorizes speech based on its content or emotional tone.
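One loose way to picture the approach, as an illustration rather than Meta's actual tokenization, is a single sequence in which text and speech units are interleaved, so one decoder-only language model can be trained on both with the usual next-token objective:

    # Illustrative only: text and speech tokens sharing one stream.
    # The names ([TEXT], [SPEECH], Ph*, Pi*) are simplified stand-ins for the phonetic
    # and pitch/tone tokens the researchers describe, not Meta's actual vocabulary.
    interleaved_sequence = [
        "[TEXT]", "the", "cat", "sat",
        "[SPEECH]", "Ph12", "Ph87", "Pi3", "Ph45",   # phonetic units plus a pitch token
        "[TEXT]", "on", "the", "mat",
    ]

Because both modalities share one sequence format, tasks such as speech recognition (speech tokens in, text tokens out), text-to-speech (the reverse) and speech classification can all be learned with the same training setup.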

Two flavors available

Meta said it’s making two versions of Meta Spirit LM available to the research community under its FAIR Noncommercial Research License, which allows anyone to use, reproduce, modify and create derivative works for noncommercial purposes. Any distribution of these models or derivatives must also comply with the noncommercial restriction.

The models include Spirit LM Base, which uses phonetic tokens to process and generate speech, and Spirit LM Expressive, a more advanced version that adds tokens for pitch and tone. These allow the model to understand and reproduce more nuanced emotions in voices, such as excitement and sadness, and reflect them in its own speech.
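As a rough summary of that difference, using the naming from this article rather than any identifiers from Meta's released code:

    # Descriptive comparison of the two variants; labels are illustrative only.
    SPIRIT_LM_VARIANTS = {
        "Spirit LM Base":       {"speech_tokens": ["phonetic"]},                   # enough to process and generate speech
        "Spirit LM Expressive": {"speech_tokens": ["phonetic", "pitch", "tone"]},  # adds cues for emotion and emphasis
    }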

The models were trained on a wide range of information, including both text and speech datasets, allowing them to handle cross-modal tasks such as text-to-speech and speech-to-text with humanlike expressiveness in their outputs, Meta's researchers said.

According to the researchers, the Spirit LM Expressive model can also detect and reproduce emotional states such as anger, surprise and happiness in its speech outputs. They believe this will have huge implications for AI assistants such as customer service bots, where the ability to engage in more nuanced conversations can help to improve customer satisfaction.

Along with the two models, Meta is making all of the model weights, code and supporting documentation available to the research community, encouraging researchers to build on and experiment with them. The hope is that this will inspire others to explore new ways of integrating speech and text in multimodal AI systems.

In addition to Meta Spirit LM, Meta’s research team also announced an update to Segment Anything, its model for image and video segmentation tasks that was revealed last year. It’s designed to power applications such as medical imaging and meteorology.

The company also published its latest research on boosting the efficiency of LLMs, as part of its broader goal to create advanced machine intelligence, or AMI.

Source: siliconangle.com
