Skip to main content

Are AI Web Crawlers 'Destroying Websites' In Their Hunt for Training Data?

2 months 2 weeks ago
"AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model mills," argues Steven J. Vaughan-Nichols at the Register. And "when AI searchbots, with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way, clobber websites with as much as 30 Terabits in a single surge, they're damaging even the largest companies' site performance..." How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. Leading the way and growing fast? AI bots... Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers. Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes. Moreover, AI crawlers are much more aggressive than standard crawlers. As the InMotionhosting web hosting company notes, they also tend to disregard crawl delays or bandwidth-saving guidelines and extract full page text, and sometimes attempt to follow dynamic links or scripts. The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website... AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world. Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls. As for robots.txt files, the old-school way of blocking crawlers? Many — most? — AI crawlers simply ignore them... There are efforts afoot to supplement robots.txt with llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing. In the meantime, to combat excessive crawling, some infrastructure providers, such as Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data.

Read more of this story at Slashdot.

EditorDavid

What Happened When Unix Co-Creator Brian Kernighan Tried Rust?

2 months 2 weeks ago
"I'm still teaching at Princeton," 83-year-old Brian Kernighan recently told an audience at New Jersey's InfoAge Science and History Museums. And last month the video was uploaded to YouTube, a new article points out, "showing that his talk ended with a unique question-and-answer session that turned almost historic..." "Do you think there's any sort of merit to Rust replacing C?" one audience member asked... "Or is this just a huge hype bubble that's waiting to die down...?" '"I have written only one Rust program, so you should take all of this with a giant grain of salt," he said. "And I found it a — pain... I just couldn't grok the mechanisms that were required to do memory safety, in a program where memory wasn't even an issue!" Speaking of Rust, Kernighan said "The support mechanism that went with it — this notion of crates and barrels and things like that — was just incomprehensibly big and slow. And the compiler was slow, the code that came out was slow..." All in all, Kernighan had had a bad experience. "When I tried to figure out what was going on, the language had changed since the last time somebody had posted a description! And so it took days to write a program which in other languages would take maybe five minutes..." It was his one and only experience with the language, so Kernighan acknowledged that when it comes to Rust "I'm probably unduly cynical. "But I'm — I don't think it's gonna replace C right away, anyway." Kernighan was also asked about NixOS and HolyC — but his formative experiences remain rooted in Bell Labs starts in the 1970s, where he remembers it was "great fun to hang out with these people." And he acknowledged that the descendants of Unix now power nearly every cellphone. "I find it intriguing... And I also find it kind of irritating that underneath there is a system that I could do things with — but I can't get at it!" Kernighan answered questions from Slashdot readers in 2009 and again in 2015...

Read more of this story at Slashdot.

EditorDavid