Cloudflare tightens screws on site-gobbling AI bots

Cloudflare on Monday expanded its defense against the dark arts of AI web scrapers by providing customers with a bit more visibility into, and control over, unwelcome content raids.

The network biz earlier this year deployed a one-click AI bot defense to improve upon the not-very-effective robots.txt mechanism, which lets websites ask, but not require, bots to behave.
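To make the "ask, but not require" point concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The bot name ExampleAIBot, the example.com URLs, and the helper is_allowed are hypothetical and chosen only for illustration. Nothing here stops a crawler that simply skips the check, which is the weakness described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is a voluntary check performed by the crawler itself;
    the website has no way to enforce the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print("SomeBrowser allowed:", is_allowed(ROBOTS_TXT, "SomeBrowser", page))
```

A polite bot would call something like is_allowed before every request and back off on a False result; an impolite one just sends the request anyway.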

Cloudflare is now upgrading its arsenal with an AI Audit control panel.

The idea is to provide customers with analytics data about crawlers that harvest data for AI training and inference so better decisions can be made about whether to embrace the bots or turn them away.

"Some customers have already made decisions to negotiate deals directly with AI companies," explained Sam Rhea, a member of Cloudflare's emerging technology and incubation team. "Many of those contracts include terms about the frequency of scanning and the type of content that can be accessed. We want those publishers to have the tools to measure the implementation of these deals."

Rhea says the problem is that the emergence of AI bots has made it more complicated to determine whether programmatic access to a website is beneficial or abusive. While they're not conducting a denial of service attack, bots that capture site data to train AI models or serve AI search results can still present a business threat.

"AI Data Scraper bots scan the content on your site to train new LLMs," said Rhea. "Your material is then put into a kind of blender, mixed up with other content, and used to answer questions from users without attribution or the need for users to visit your site."

As software developer Simon Willison has described it, AI training is akin to "money laundering for copyrighted data." Because companies like OpenAI and Anthropic do not disclose the training data used to create their models, the origin of what those models produce cannot be traced, which makes AI essentially content laundering. It's similar to a crypto mixer – a process intended to disguise the provenance of cryptocurrency.

Then, there are AI Search Crawler bots that scan content and cite it back in response to search queries. "The downside is that those users might just stay inside of that interface, rather than visit your site, because an answer is assembled on the page in front of them," said Rhea.

That is to say, AI search may not drive traffic to source websites, and thus doesn't provide ad revenue. The issue came up over the summer when iFixit CEO Kyle Wiens objected to data harvesting by Anthropic's crawlers, a situation the AI firm has since addressed.

Rhea argues that allowing AI bots to run rampant threatens the open internet.

"Without the ability to control scanning and realize value, site owners will be discouraged to launch or maintain Internet properties," he said. "Creators will stash more of their content behind paywalls and the largest publishers will strike direct deals. AI model providers will in turn struggle to find and access the long tail of high-quality content on smaller sites."

Enter Cloudflare's AI Audit control panel. The network biz believes companies can use the provided bot analytics to monitor content access deals with AI firms, which it claims are becoming more common, and enforce policies rather than trusting crawlers to obey robots.txt directives. ®

Source: theregister.com
