The Machines Are Hungry: AI Crawlers Now Account for Nearly Half of All Internet Traffic, and the Web Is Buckling

AI crawlers from OpenAI, Google, Anthropic, and others now generate 50 billion daily requests across Cloudflare's network, with traffic surging 757% in 2024. Small websites bear the brunt as bot swarms consume bandwidth, generate zero revenue, and ignore longstanding crawling conventions.
Written by Lucas Greene

The internet has a pest problem. Not the garden-variety spam bots or credential-stuffing scripts that have plagued websites for decades. This is something different: larger in scale, more relentless in appetite, and far more consequential for the architecture of the open web itself.

AI crawlers, the automated programs dispatched by companies like OpenAI, Anthropic, Google, Apple, Meta, and a growing roster of startups, now represent a staggering share of global web traffic. According to data from Cloudflare, one of the world’s largest web infrastructure companies, AI bot traffic surged 757% over the course of 2024. By early 2025, these crawlers were responsible for roughly 50 billion requests per day hitting sites protected by Cloudflare’s network, as Business Insider reported. That’s not a typo. Fifty billion. Every day.

And it’s accelerating.

The implications ripple outward in ways that most casual internet users will never see but that web operators, hosting companies, and content publishers are feeling acutely. Servers are straining. Bandwidth bills are climbing. Smaller websites are being knocked offline. The fundamental economics of running a website, already precarious for many publishers, are being rewritten by machines that consume content voraciously but generate zero pageviews, zero ad impressions, and zero revenue for the sites they scrape.

GoDaddy, which hosts more than 80 million domain names and serves as the backbone for millions of small businesses, told Business Insider that AI crawlers now account for fully one-third of all traffic to its hosted sites. For some individual sites, the figure is far higher. A small e-commerce shop or a niche blog might find that 80% or 90% of its traffic on a given day comes not from human customers but from bots methodically downloading every page, image, and data point they can find.
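A site operator who wants to check a figure like this against their own logs could start with something like the sketch below. It is a rough illustration, not a method used by GoDaddy or Cloudflare: it counts only requests whose User-Agent header openly declares an AI crawler. GPTBot is named in this article; the other tokens in `AI_CRAWLER_TOKENS` are assumptions that should be checked against each vendor's documentation, and the sample User-Agent strings are invented for the example.

```python
from collections import Counter

# Assumption: substrings that self-identifying AI crawlers include in their
# User-Agent header. Verify this list against current vendor documentation.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider")


def ai_crawler_share(user_agents):
    """Return the fraction of requests whose User-Agent declares an AI crawler."""
    counts = Counter(
        any(token.lower() in ua.lower() for token in AI_CRAWLER_TOKENS)
        for ua in user_agents
    )
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0


# Hypothetical User-Agent values, one per request, as they might appear in an access log.
sample_user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "ExampleAgent/1.0 (compatible; GPTBot/1.0)",
    "ExampleAgent/1.0 (compatible; ClaudeBot/1.0)",
    "ExampleAgent/1.0 (compatible; Bytespider)",
    "ExampleAgent/1.0 (compatible; GPTBot/1.0)",
]

share = ai_crawler_share(sample_user_agents)
print(f"Declared AI crawler share: {share:.0%}")
```

Because it relies on what a crawler says about itself, a count like this is a lower bound: bots that pose as ordinary browsers are tallied as human visitors.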

The sheer volume is punishing. Unlike a human visitor who might browse five or ten pages, an AI crawler will attempt to ingest an entire site in minutes. Multiply that by dozens of different AI companies, each running their own crawler, and the result is a distributed denial-of-service attack in all but name, except it’s perfectly legal, and the companies behind it are among the most valuable on Earth.

Who’s Crawling, and Why the Old Rules Don’t Apply

The biggest offenders, by volume, are also the biggest names. Cloudflare’s data identifies crawlers operated by Google, OpenAI, Anthropic, Meta, Apple, and ByteDance as among the most active. But the problem extends well beyond the major players. Hundreds of smaller AI companies have launched their own crawlers, many of which don’t identify themselves properly or respect the robots.txt conventions that have governed web crawling etiquette since the 1990s.

Robots.txt, a simple text file that website operators use to tell crawlers which parts of their site should and shouldn’t be accessed, was never a legal mechanism. It was a social contract. Google’s original search crawler respected it because Google had an incentive to maintain good relationships with webmasters: those webmasters were, after all, the source of Google’s index, and they benefited from search traffic in return. The arrangement was symbiotic.
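To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler is expected to read such a file, using the `urllib.robotparser` module from Python's standard library. The robots.txt contents, the `example.com` URLs, and the `SomeSearchBot` name are hypothetical; GPTBot is the user-agent token of OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out GPTBot entirely, and keep every other
# crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks this question before every fetch.
gptbot_article = parser.can_fetch("GPTBot", "https://example.com/articles/1")
other_article = parser.can_fetch("SomeSearchBot", "https://example.com/articles/1")
other_private = parser.can_fetch("SomeSearchBot", "https://example.com/private/x")

print(gptbot_article, other_article, other_private)
```

Nothing in this exchange is enforced by the server. The file only states the operator's wishes, and the check runs entirely on the crawler's side, which is why the convention works only for as long as crawlers choose to honor it.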

AI crawlers have broken that symbiosis. When OpenAI’s GPTBot scrapes a publisher’s archive, the publisher gets nothing back. No link. No traffic. No attribution in most cases. The content is ingested into a training dataset or used to generate a response in ChatGPT, and the original source effectively disappears. The same is true for Anthropic’s crawler, Apple’s Applebot when used for AI training purposes, and Meta’s various data collection agents.

Some crawlers don’t even bother with robots.txt. Cloudflare found that a significant number of AI bots either ignore the file entirely or disguise themselves as regular web browsers to evade blocking. This cat-and-mouse dynamic has forced infrastructure providers to develop increasingly sophisticated detection tools. Cloudflare launched its “AI Audit” tool in 2024 specifically to help site owners identify and block unwanted AI crawlers. The company reported that when given the option, more than 80% of site operators chose to block AI bots.
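For crawlers that do announce themselves, blocking can be enforced on the server rather than requested in robots.txt. The sketch below is a minimal WSGI middleware, not a description of how Cloudflare's tooling works. The function names and the one-entry token list are hypothetical, and the User-Agent strings are invented for the example.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: lowercase substrings identifying crawlers this site wants to refuse.
BLOCKED_TOKENS = ("gptbot",)


def block_declared_ai_crawlers(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests with a blocked User-Agent receive a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI crawlers are not permitted on this site.\n"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome.\n"]


protected = block_declared_ai_crawlers(site)


def status_for(user_agent):
    """Send one fake request through the middleware and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = []

    def start_response(status, headers):
        captured.append(status)

    list(protected(environ, start_response))
    return captured[0]


print(status_for("ExampleAgent/1.0 (compatible; GPTBot/1.0)"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"))
```

A filter like this stops only the honest bots. A crawler that sends a browser's User-Agent string passes straight through, which is the gap that commercial detection tools try to close with other signals.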

That statistic alone tells you something about how the web’s content creators feel about this arrangement.

The financial toll is real and measurable. Bandwidth isn’t free. Server capacity isn’t free. Every request from an AI crawler consumes resources that the website operator pays for. For a large media company with robust infrastructure, the marginal cost of AI bot traffic might be manageable, if irritating. For a small business running on a shared hosting plan, it can mean degraded performance for actual customers, unexpected overage charges, or outright downtime.

GoDaddy has been particularly vocal about the impact on its customer base. The company has invested in new bot-management capabilities and has been working to give its customers, many of whom lack technical sophistication, simple tools to control AI crawler access. But the arms race is inherently asymmetric. The AI companies have vast engineering resources and strong financial incentives to keep crawling. The small business owner trying to sell handmade candles online does not have a security team.

The legal picture remains murky. Several major copyright lawsuits are working their way through federal courts, with publishers like The New York Times suing OpenAI over the use of copyrighted content in training data. But even if those cases result in favorable rulings for publishers, enforcement will be extraordinarily difficult. The crawling happens at such scale and speed that by the time a site owner realizes their content has been scraped, it’s already been absorbed into a model containing trillions of tokens of text.

Some companies have tried negotiation rather than litigation. A handful of major publishers (including the Associated Press, Axel Springer, and others) have struck licensing deals with OpenAI, granting access to their archives in exchange for payment. But these deals cover only a tiny fraction of the web. The vast majority of websites have no deal, no leverage, and no practical way to opt out.

Meanwhile, the AI companies show no signs of slowing down. If anything, the race to build larger and more capable models is intensifying demand for training data. OpenAI, Google, and Anthropic are all pursuing models that require ever-larger datasets, and the open web remains the richest and most accessible source of text, images, code, and structured data available. The crawlers will keep coming.

So where does this end? Some observers see a future in which large portions of the web simply go dark, hidden behind paywalls, login walls, or aggressive bot-blocking measures that also degrade the experience for legitimate users. Others predict a licensing regime will eventually emerge, similar to how the music industry developed mechanical royalties and performance rights organizations to compensate creators. But the internet has never had an ASCAP or BMI equivalent, and building one from scratch for a medium this vast and decentralized would be an enormous undertaking.

There’s also a deeper irony at work. The AI models being trained on this data are increasingly being used to generate content that competes directly with the sources they scraped. A travel blog that spent years building a library of destination guides now competes with ChatGPT, which can produce a passable imitation of that content in seconds (using, in part, the blog’s own words as training fuel). The machine eats the web, then replaces it.

For now, the numbers keep climbing. Cloudflare’s 757% growth figure covers just one year. If the trajectory holds (and there’s every reason to believe it will, given the billions of dollars flowing into AI development), the proportion of internet traffic generated by machines rather than humans will continue to grow. The web was built for people. Increasingly, it’s being consumed by something else entirely.

And the people who built it are left with the bill.
