pwshub.com

Cloudflare debuts tools for website owners to charge AI companies that scrape their content

Earlier this year, Cloudflare Inc. announced a simple tool for website owners to prevent artificial intelligence model developers from scraping their online content. Now, it’s building on that with additional capabilities that can help website owners to control how their content is used by AI models, and even try to make money from it.

The company said its AI Audit product provides a suite of tools to help customers understand how AI models are using their content. Once they know what their content is being used for, they can decide whether to let AI developers access it. They can also set what they consider a “fair price” for AI scrapers to use their content for model training and other purposes.

The practice of scraping websites for content has become extremely common in the AI industry, with the internet providing a treasure trove of ostensibly “free” data that can be used to train AI models. But this mass scraping of websites is controversial too, with many content creators and publishers arguing that it’s unfair, especially since they’re unaware it’s happening.

The biggest AI providers today are all guilty of scraping content from the web, including the likes of OpenAI, Google LLC, Meta Platforms Inc., Stability AI Ltd., IBM Corp. and Microsoft Corp. These companies all openly admit to helping themselves to publishers’ content, arguing that the practice falls under the “fair use” doctrine.

But critics say that scraping has a detrimental impact on publishers, since they lose web traffic when their content is scraped. For example, a website that posts food recipes will lose a ton of traffic, and potential revenue, to AI chatbots that use its content to quickly respond to requests for a recipe. Because the chatbot provides the user with all of the information they’ve asked for, there’s little incentive for anyone to actually visit that website, even if the chatbot cites it as the source of its response.

Some publishers have responded to this by taking steps to block AI developers from accessing their websites. Last month, the Guardian reported that The New York Times, CNN, Reuters and the Chicago Tribune had all blocked OpenAI’s GPTBot web crawler from scanning their websites.
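
Blocks of this kind are commonly expressed in a site’s robots.txt file, though the report does not say which mechanism each publisher used. As a minimal sketch, assuming a hypothetical robots.txt that disallows the `GPTBot` user agent, Python’s standard `urllib.robotparser` module shows how a compliant crawler would interpret such a rule. The file contents, the `can_crawl` helper and the example URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any publisher named above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot matches the first rule group; any other crawler falls through to "*".
gptbot_allowed = can_crawl("GPTBot", "https://example.com/recipes/pasta")
other_allowed = can_crawl("SomeOtherBot", "https://example.com/recipes/pasta")
```

Note that robots.txt is advisory: it only stops crawlers that choose to honor it.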

Meanwhile, others have countered by enabling AI developers to access their content for a price. Reddit Inc., one of the world’s busiest forums, said in April it is launching an application programming interface that will enable AI companies to pay to access its content, ensuring it is fairly compensated.

Giving control back to creators

Cloudflare says its latest update, announced today, helps every website owner do something similar. AI Audit is designed to give control back to content creators, so there can be a more transparent exchange between them and AI developers.

It includes a simple, one-click tool that automatically prevents every kind of AI scraper from accessing a site’s content, plus a suite of analytics tools that help website owners understand what AI bots are doing on their properties. According to Cloudflare, it can help site owners understand why, when and how often AI models are accessing their web pages, and even distinguish between AI bots that credit the source of their data and those that don’t.

In addition, Cloudflare’s AI Audit also provides a tool for website owners to determine a fair price for allowing bots to access their content, based on the standard going rates negotiated by bigger publishers such as Reddit. Cloudflare says this is necessary because many smaller site owners lack the resources and expertise to understand the value of their content and negotiate deals with AI companies. Moreover, the AI companies themselves simply don’t have the bandwidth to cut a deal with every single website they scrape, because there are millions of them.

Cloudflare’s AI Audit tab helps to define the metrics that are commonly used to establish a fair price for scraping, such as the rate of crawling for certain sections of content or for an entire page or website. Based on this data, it will then recommend a price and transaction flow. That enables AI developers to quickly find new sources of content and pay for them, compensating the creators.
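
To make the crawl-rate metric concrete, here is a minimal sketch that counts requests per crawler and per top-level site section from access-log records. The record layout, the `AI_BOT_TOKENS` list, the sample user-agent strings and the `crawl_counts` function are all assumptions for illustration. This is not Cloudflare’s implementation, and it does not attempt the pricing step.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed user-agent substrings for a few publicly documented AI crawlers.
AI_BOT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def crawl_counts(records):
    """Count AI-crawler hits per (bot, top-level section).

    records: iterable of (user_agent, url_path) tuples,
    a hypothetical simplified log format.
    """
    counts = Counter()
    for user_agent, path in records:
        ua = user_agent.lower()
        bot = next((t for t in AI_BOT_TOKENS if t.lower() in ua), None)
        if bot is None:
            continue  # not a recognized AI crawler
        # Drop any query string, then keep only the first path segment.
        segments = [s for s in urlparse(path).path.split("/") if s]
        section = "/" + segments[0] if segments else "/"
        counts[(bot, section)] += 1
    return counts


# Made-up sample records: three crawler hits and one ordinary browser visit.
sample = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "/recipes/pasta"),
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "/recipes/soup?page=2"),
    ("CCBot/2.0", "/news/today"),
    ("Mozilla/5.0 (Windows NT 10.0) Firefox/130.0", "/recipes/pasta"),
]
counts = crawl_counts(sample)
```

Dividing these counts by the length of the logging window would give a crawl rate per section, the kind of input a pricing recommendation could build on.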

Cloudflare co-founder and Chief Executive Matthew Prince said AI will forever transform the way people interact with content online, so it’s necessary for every stakeholder to get together and determine what this future will look like. But he believes it’s important for content creators to be able to own and control their content.

“If content creators don’t have this control, the quality of online information will deteriorate or be locked exclusively behind paywalls,” Prince said. “With Cloudflare’s scale and global infrastructure, we believe we can provide the tools and set the standards to give websites, publishers, and content creators control and fair compensation for their contribution to the Internet, while still enabling AI model providers to innovate.”

Source: siliconangle.com
