
Alibaba announces Qwen2-VL AI model with advanced video analysis and reasoning capabilities

Alibaba Cloud, the cloud computing arm of China’s Alibaba Group Ltd., announced Thursday the release of a new artificial intelligence model named Qwen2-VL with advanced vision comprehension and multilingual conversational capabilities.

The company, which spent a year developing the new model on the foundation of its earlier Qwen-VL model, said it can understand high-quality videos more than 20 minutes long.

According to Alibaba, it can summarize video content, answer questions about it and maintain a continuous conversation in real time, which allows it to provide live chat support. As a result, it can act as a personal assistant, using information drawn directly from video content.

In one example, the model was given a video of what appeared to be a short documentary clip about the International Space Station, including a scene of the control center and a shot of two astronauts speaking from within a capsule while floating in space.

It’s not perfect. When asked to summarize the scene, the model responded with a clear output that described the individuals speaking and the control room, and added that “the men appear to be astronauts, and they are wearing space suits.” The astronauts were not wearing space suits; they appeared to be wearing collared shirts and pants.

When asked what color clothing the astronauts were wearing, the model answered correctly: “The two astronauts are wearing blue and black clothes.” One man is indeed wearing a blue shirt and the other is wearing a black shirt.

The model can serve as a foundation for real-time, text-based live chat, in which users talk with the model and it answers questions about a video. It is also capable of function calling and tool use based on vision, enabling it to retrieve and access external data, such as flight statuses, weather forecasts and package tracking. That would make it useful for customer service interactions or for workers in the field, who could show it images of products, bar codes or other information.
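To illustrate what function calling involves on the application side, here is a minimal Python sketch. It assumes the model emits a tool call as a JSON object with "name" and "arguments" fields; that format, the tool names and the stub return values are all hypothetical and are not taken from Alibaba's documentation.

```python
import json

# Hypothetical stub tools; a real deployment would call external services.
def get_weather(city):
    return {"city": city, "forecast": "placeholder"}

def track_package(tracking_id):
    return {"tracking_id": tracking_id, "status": "placeholder"}

# Registry mapping the tool names the model may emit to local handlers.
TOOLS = {"get_weather": get_weather, "track_package": track_package}

def dispatch(tool_call_json):
    """Run the tool named in a JSON tool call and return its result."""
    call = json.loads(tool_call_json)
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        return {"error": "unknown tool"}
    return handler(**call.get("arguments", {}))

# Example: the model has read a tracking code from a photographed label
# (the code "ABC123" is made up for illustration).
result = dispatch('{"name": "track_package", "arguments": {"tracking_id": "ABC123"}}')
print(result)
```

In a setup like this, the application would pass the tool's result back to the model so it can phrase an answer for the user.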

Alibaba said the model retains the architecture of Qwen-VL, pairing a Vision Transformer model, or ViT, with the Qwen2 language model. The company said it used a ViT with about 600 million parameters to handle both image and video inputs.

The model was enhanced with Native Dynamic Resolution support, which allows it to handle images of arbitrary resolution, an upgrade over its predecessor. The addition of a Multimodal Rotary Position Embedding system, or M-ROPE, further enables the model to capture 1D textual, 2D visual and 3D video positional information at the same time.
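One way to picture dynamic resolution is that the number of visual tokens grows with the size of the image instead of being fixed. The Python sketch below estimates that count under two assumptions that do not come from this article: a patch size of 14 pixels, and a 2x2 merge of neighboring patches into one token. The function name and the rounding rule are illustrative only and may differ from what the released model does.

```python
import math

def visual_token_count(width, height, patch=14, merge=2):
    """Estimate visual tokens for an image kept at its native resolution.

    Assumes (not confirmed by the article) 14-pixel patches and a
    2x2 merge of neighboring patches into a single token.
    """
    unit = patch * merge  # pixels covered by one merged token per side
    cols = math.ceil(width / unit)   # round up to whole cells
    rows = math.ceil(height / unit)
    return rows * cols

small = visual_token_count(224, 224)
wide = visual_token_count(448, 224)
print(small, wide)
```

Under these assumptions, doubling the width of an image doubles its token count, whereas a fixed-resolution encoder would resize both images to the same number of tokens.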

Qwen2-VL is available as open source under the highly permissive Apache 2.0 license in two sizes, Qwen2-VL-2B and Qwen2-VL-7B. The company also released a demo running the 7 billion-parameter model on Hugging Face.

The model does have its limitations, the company noted. It is unable to extract audio from video files, given that it’s designed only for visual reasoning. Its training data also extends only to June 2023, and it cannot guarantee complete accuracy for complex instructions or scenarios. However, Alibaba said the model delivered top-tier results across most benchmark metrics, even surpassing closed-source models such as OpenAI’s flagship GPT-4o and Anthropic PBC’s Claude 3.5 Sonnet.

The company said the Qwen2-VL family will be a stepping stone toward stronger vision language models. It plans to integrate more features on the path toward an “omni” model that will be able to reason across both vision and audio.

Source: siliconangle.com
