
Is China pulling ahead in AI video synthesis? We put Minimax to the test


With China's AI video generators pushing memes into weird territory, it was time to test one out.

A still shot from an AI-generated Minimax video-01 video with the prompt: "A highly-intelligent person reading 'Ars Technica' on their computer when the screen explodes" Credit: Minimax

If 2022 was the year AI image generators went mainstream, 2024 has arguably been the year that AI video synthesis models exploded in capability. These models, while not yet perfect, can generate new videos from text descriptions (called prompts), still images, or existing videos. After OpenAI made waves with Sora in February, two major AI models emerged from China: Kuaishou Technology's Kling and Minimax's video-01.

Both Chinese models have already powered numerous viral AI-generated video projects, accelerating meme culture in weird new ways. Recent examples include a shot-for-shot translation of the Princess Mononoke trailer, made with Kling, that inspired death threats, and a series of videos created with Minimax's platform showing a synthesized version of TV chef Gordon Ramsay doing ridiculous things.

After 22 million views and thousands of death threats, I felt like I needed to take this post down for my own mental health.
This trailer was an EXPERIMENT to show my 300 friends on X how far we've coming in 16 months.
I'm putting it back up to keep the conversation going. 🧵 pic.twitter.com/tFpRPm9BMv

— PJ Ace (@PJaccetturo) October 8, 2024

Kling first emerged in June, and it can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that some think surpasses Sora. It's currently only available to people with a Chinese telephone number, and we have not yet used it ourselves.

Around September 1, Minimax debuted the aforementioned video-01 as part of its Hailuo AI platform. That site lets anyone generate videos based on a prompt, and initial results seemed similar to Kling's, so we decided to run some of our Runway Gen-3 prompts through it to see what would happen.

Putting Minimax to the test

We generated each of the 6-second-long 720p videos seen below using Minimax's free Hailuo AI platform. Each video generation took five to 10 minutes to complete, likely due to being in a queue with other free video users. (At one point, the whole thing froze up on us for a few days, so we didn't get a chance to generate a flaming cheeseburger.)

In the spirit of not cherry-picking, everything you see below was the first generation we received for the prompt listed above it.

"A highly intelligent person reading 'Ars Technica' on their computer when the screen explodes"

"A cat in a car drinking a can of beer, beer commercial"

"Will Smith eating spaghetti"

"Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens"

"A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts"

"A herd of one million cats running on a hillside, aerial view"

"Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy"

"A muscular barbarian breaking a CRT television set with a weapon, cinematic, 8K, studio lighting"

Limitations of video synthesis models

Overall, the Minimax video-01 results seen above feel fairly similar to Gen-3's outputs, with some differences: the apparent lack of a celebrity filter on Will Smith (who sadly did not actually eat the spaghetti in our tests), and more realistic cat hands and licking motion. Some results were far worse, like the one million cats and the Ars Technica reader.

As we explained in our hands-on test for Runway's Gen-3 Alpha, text-to-video models typically excel at combining concepts present in their training data (existing video samples used to create the model), allowing for creative mashups of existing themes and styles. However, these AI models often struggle with generalization, meaning they have difficulty applying learned information to entirely novel scenarios not represented in their training data.

This limitation can lead to unexpected or unintended results when users request scenarios that deviate too far from the model's training examples. While we saw a very comical result for the cat drinking beer in the Gen-3 test, Minimax rendered a more realistic-looking result, and that could come down to better parsing of the prompt, different training data, more compute in training the model, or a different model architecture. Ultimately, there's still a lot of trial and error in generating a coherent result.

It's worth noting that while China's models seem to match US video synthesis models from earlier this year, American tech companies aren't standing still. Google showed off Veo in May with some very impressive-looking demos. And last week, we reported on Meta's Movie Gen model, which appears (though we have not used Meta's model ourselves) to potentially be a step ahead of Minimax and Kling. But China's servers are doubtless cranking away at training new AI video models as we speak, so this deepfake arms race probably won't slow down any time soon.

Benj Edwards is Ars Technica's Senior AI Reporter and founder of the site's dedicated AI beat in 2022. He's also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


Source: arstechnica.com
