pwshub.com

Cohere announces Aya Expanse multilingual AI model family for researchers

Cohere for AI, the nonprofit research lab run by the artificial intelligence startup Cohere Inc., pushed the boundaries of multilingual frontier AI model research today with the release of Aya Expanse, a family of high-performance multilingual large language models that it says outperform other leading open rivals.

The new family includes two models, with 8 billion and 32 billion parameters, released with open weights on the hosting sites Kaggle and Hugging Face. The models cover 23 languages, including English, Arabic, Chinese, Czech, Dutch, French, German, Greek and Hindi.

“Aya Expanse marks an important step to expand high-quality coverage of languages in LLMs,” said the Cohere research team. “Since we first launched the Aya initiative two years ago, we have collaborated with over 3,000 researchers from 119 countries to expand cutting-edge multilingual research.”

The Aya Initiative is an effort by Cohere to advance state-of-the-art multilingual AI, bridge the gap between people across the world using technology and expand the number of languages covered by AI. It includes the Aya collection, the largest multilingual dataset collection to date with 513 million examples, and Aya-101, an AI model covering more than 100 languages.

The team said that several new core research innovations gave Aya Expanse its superior performance. These included the use of synthetic data, human feedback in late-stage training and model merging.

To train Aya Expanse, the company said, the lab turned to synthetic data for languages with limited datasets. Training on data generated by “teacher” models is a common practice in the AI industry.

However, large language models trained on synthetic data can suffer from model collapse or produce “gibberish.” To avoid this, the company used data arbitrage, drawing on teacher models with specialized skills in particular languages.
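The idea of matching each language to a suitable teacher can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the teacher names, language codes and quality scores below are hypothetical, and the article does not describe how Cohere actually selects teachers.

```python
# Hypothetical per-language quality scores for three teacher models.
# All names and numbers are made up for illustration only.
TEACHER_SCORES = {
    "teacher_a": {"ar": 0.62, "hi": 0.48, "el": 0.71},
    "teacher_b": {"ar": 0.55, "hi": 0.66, "el": 0.64},
    "teacher_c": {"ar": 0.70, "hi": 0.59, "el": 0.58},
}


def pick_teacher(language, scores=TEACHER_SCORES):
    """Return the teacher with the highest score for the given language."""
    candidates = {
        name: langs[language]
        for name, langs in scores.items()
        if language in langs
    }
    if not candidates:
        raise KeyError(f"no teacher covers language {language!r}")
    return max(candidates, key=candidates.get)


def route_prompts(prompts):
    """Assign each (language, prompt) pair to the teacher chosen for that language."""
    return [(pick_teacher(lang), lang, text) for lang, text in prompts]
```

In this sketch, each prompt is answered by whichever teacher scores best for its language, rather than by one teacher for every language.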

In the late stages of model training, the company said, it began using human feedback to guide the model toward high-quality outputs. Many multilingual models tend to be biased toward Western cultures and settings, mostly because of the countries of origin of their datasets and the companies that build them.

“Our work is one of the first that extends preference training to a massively multilingual setting, accounting for different cultural and linguistic perspectives,” the company said. “We find this leads to large gains both in general performance and safety.”

Finally, to increase performance, Cohere merges the weights of multiple fine-tuned candidate models at each stage into a single model. According to a study on the subject, merging can sometimes bring improvements of up to 8% and 10% in general performance and safety, respectively.
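As a rough illustration of what merging weights means, the sketch below averages the parameters of several checkpoints. It assumes plain linear averaging and represents each checkpoint as a dictionary of float lists. The article does not say which merging method Cohere used, so this is only one simple possibility, and `merge_weights` is a hypothetical helper.

```python
def merge_weights(checkpoints, coefficients=None):
    """Linearly average checkpoints.

    Each checkpoint is a dict mapping a parameter name to a list of floats.
    Assumes every checkpoint has the same parameter names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if coefficients is None:
        # Default to a uniform average over all candidates.
        coefficients = [1.0] * len(checkpoints)
    if len(coefficients) != len(checkpoints):
        raise ValueError("one coefficient per checkpoint is required")

    total = sum(coefficients)
    merged = {}
    for name in checkpoints[0]:
        # Walk the same position of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [
            sum(c * v for c, v in zip(coefficients, values)) / total
            for values in columns
        ]
    return merged
```

Passing unequal coefficients weights some candidates more heavily than others. Real checkpoints would hold tensors rather than Python lists.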

The company said these innovations helped Aya Expanse 8B achieve a 60.4% simulated win rate in multilingual performance against Google LLC’s Gemma 2 9B on the m-ArenaHard benchmark. The larger model, Aya Expanse 32B, outperformed Gemma 2 27B and Mistral AI’s Mixtral 8x22B with win rates of 51.8% and 76.6%, respectively. It also outperformed Meta Platforms Inc.’s Llama-3.1 70B, a model more than twice its size, with a pairwise win rate of 54%.
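For readers unfamiliar with pairwise win rates, the following sketch shows how such a percentage could be computed from a list of head-to-head judgments. The function name, the label strings and the option to count ties as half a win are assumptions for illustration. The article does not describe how m-ArenaHard scores ties.

```python
def win_rate(judgments, ties_count_half=True):
    """Return model A's win rate in percent.

    judgments is a list of "win", "loss" or "tie" labels,
    each given from model A's point of view against model B.
    """
    wins = judgments.count("win")
    losses = judgments.count("loss")
    ties = judgments.count("tie")
    if wins + losses + ties != len(judgments):
        raise ValueError("judgments may only contain 'win', 'loss' or 'tie'")

    if ties_count_half:
        total = wins + losses + ties
        points_times_two = 2 * wins + ties  # a tie is worth half a win
    else:
        total = wins + losses  # ties are ignored entirely
        points_times_two = 2 * wins
    if total == 0:
        raise ValueError("no judgments to score")
    return 100 * points_times_two / (2 * total)
```

A rate above 50% means model A was preferred more often than model B across the evaluated prompts.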

In addition to releasing the open weights for Aya Expanse 8B and 32B, Cohere said it is continuing to collaborate on wider multilingual AI research to broaden access to linguistic data, software and compute resources.

Source: siliconangle.com
