
Major publishers sue Perplexity AI for scraping content

Major US news publishers Dow Jones & Co and NYP Holdings have sued AI search engine startup Perplexity for scraping their content without paying for it.

The lawsuit, filed on behalf of The Wall Street Journal and its sister tabloid New York Post by their parent company News Corporation, alleges two counts of copyright infringement and one of false designation of origin and dilution of trademarks. The plaintiffs accuse the AI biz of stealing the hard work of journalists to feed the data requirements of its training models. News Corp's CEO Robert Thomson claimed this could be the first of many such lawsuits against AI developers.

"The perplexing Perplexity has willfully copied copious amounts of copyrighted material without compensation, and shamelessly presents repurposed material as a direct substitute for the original source. Perplexity proudly states that users can 'skip the links' – apparently, Perplexity wants to skip the check," he told The Register in a statement.

"We applaud principled companies like OpenAI, which understands that integrity and creativity are essential if we are to realize the potential of Artificial Intelligence. Perplexity is not the only AI company abusing intellectual property and it is not the only AI company that we will pursue with vigor and rigor. We have made clear that we would rather woo than sue – but, for the sake of our journalists, our writers and our company, we must challenge the content kleptocracy."

News Corp isn't against sharing its intellectual property to train AI systems – but it wants the money upfront. In May it inked a deal with the aforementioned OpenAI for just this purpose, with a reported price tag over $250 million. The machine learning juggernaut also has similar deals in place with Reddit and Stack Overflow.

According to court documents filed in the US District Court for the Southern District of New York, News Corp first contacted Perplexity about the matter in July but received no response. It wants $150,000 for every proven infringement, which, if enforced, could severely impact or even bankrupt the startup.

The news giant isn't just peeved at the data scraping itself, but also that Perplexity doesn't cite its sources. It claimed that Perplexity's AI "answer engine" lets users "skip the links" and that this deprives publishers of direct revenue. Even worse, it gets things wrong.

"In addition to using Plaintiffs' copyrighted work to develop a substitute product that reproduces or imitates Plaintiffs' original content, Perplexity also harms Plaintiffs' brands by falsely attributing to Plaintiffs certain content that Plaintiffs never wrote or published," the lawsuit claims.

"Not infrequently, if Perplexity is asked about what Plaintiffs’ publications reported, Perplexity 'answers' with false information. AI developers euphemistically call these factually incorrect outputs 'hallucinations.' Perplexity’s hallucinations can falsely attribute facts and analysis to content producers like Plaintiffs, sometimes citing an incorrect source, and other times simply inventing and attributing to Plaintiffs fabricated news stories."

One case cited is an August 2024 New York Post article about European attempts to "silence great Americans like Elon Musk." The lawsuit claims that Perplexity, when asked for a summary, copied the first 139 words of the piece and then added five more paragraphs of factually incorrect information.

On the data scraping side, there is a mechanism for website operators to opt out of having their content added to the voracious maw of AI training databases: the robots.txt file, implemented by Google, OpenAI, and Cloudflare. While Perplexity CEO Aravind Srinivas has claimed his business does respect the do-not-scrape command, some third parties it uses might not be so ethical.
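
For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The rules, the example.com URL, and the allowed() helper are illustrative assumptions, not anything taken from the lawsuit or from a real publisher's site, and the crawler names are used only as example user-agent strings. Note that robots.txt is advisory: nothing in the protocol forces a crawler to run a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The rules below are
# illustrative only and are not copied from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    page = "https://example.com/news/some-article"
    for agent in ("GPTBot", "PerplexityBot", "SomeBrowser"):
        print(agent, "may fetch" if allowed(agent, page) else "is asked not to fetch")
```

The check happens entirely on the crawler's side, which is why an operator that ignores the file, or a third party scraping on its behalf, faces no technical barrier.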

Perplexity had no comment at the time of going to press. ®

Source: theregister.com
