Cloudflare offers one-click blocking of AI bots:
And this is a very lovely maze generator for AI bots:
https://zadzmo.org/code/nepenthes/
It generates an endless maze of slowly loading pages where the bots can romp around until the end of the world.
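To get a feel for how such a tarpit works, here is a minimal Python sketch of the same idea (my own toy illustration, not Nepenthes' actual code): every path serves a page of freshly generated links deeper into the maze, and the body is drip-fed a few bytes at a time so each request ties the crawler up for a while.

```python
# Toy tarpit in the spirit of Nepenthes (NOT its actual code):
# every URL "exists" and links to ten more random URLs, and the
# response is trickled out slowly to waste the crawler's time.
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

WORDS = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path is "valid" and points to ten more nonexistent pages.
        links = "".join(
            f'<li><a href="/{random.choice(WORDS)}-{random.randrange(10**6)}">more</a></li>'
            for _ in range(10)
        )
        body = f"<html><body><ul>{links}</ul></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # Drip-feed the body: 32 bytes every half second.
        for i in range(0, len(body), 32):
            self.wfile.write(body[i:i + 32])
            self.wfile.flush()
            time.sleep(0.5)

    def log_message(self, *args):
        pass  # keep the demo quiet

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()
```

A crawler that ignores robots.txt and follows every link it finds will wander an unbounded graph of these pages, each one costing it several seconds.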
Another one sponsoring the profits of AI companies with their own blood:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale. (…) If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned, including expensive endpoints like git blame, every page of every git log, and every commit in every repo, and they do so using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses – mostly residential, in unrelated subnets, each one making no more than one HTTP request over any time period we tried to measure – actively and maliciously adapting and blending in with end-user traffic and avoiding attempts to characterize their behavior or block their traffic.
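The "no more than one HTTP request per IP" detail is the crux: it defeats any per-IP rate limit. A quick back-of-the-envelope simulation (my numbers, not SourceHut's) shows why:

```python
# Toy illustration of why per-IP rate limiting sees nothing when the
# load is spread across "tens of thousands of IP addresses".
import random
from collections import Counter

REQUESTS = 100_000    # crawler requests in some time window
BOTNET_IPS = 50_000   # distinct (mostly residential) source IPs
THRESHOLD = 10        # requests/window before an IP gets blocked

hits = Counter(random.randrange(BOTNET_IPS) for _ in range(REQUESTS))
blocked = sum(1 for n in hits.values() if n > THRESHOLD)
print(f"total requests: {REQUESTS}, IPs over threshold: {blocked}")
# At ~2 requests per IP on average, essentially no IP ever crosses
# the threshold, even though the aggregate load is crushing.
```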
I have already linked here to Cloudflare's "firewall" against AI bots; now they have rolled out a labyrinth as well:
https://blog.cloudflare.com/ai-labyrinth/
When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules. AI Labyrinth is available on an opt-in basis to all customers, including the Free plan.
They mention that bots otherwise make over 50 billion requests a day across their network (roughly 580,000 per second).
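The general trick, as I understand it (this is my toy sketch, not Cloudflare's implementation), is to plant a trap link on real pages that humans never see and that robots.txt explicitly forbids; anything that follows it has outed itself as a misbehaving crawler and can be routed into the generated maze:

```python
# Toy sketch of the labyrinth-entry idea (my illustration, not
# Cloudflare's code). The /labyrinth/ path is a hypothetical name.
DECOY = '<a href="/labyrinth/entrance" style="display:none" rel="nofollow">archive</a>'

def inject_decoy(html: str) -> str:
    # Hide the trap link just before </body>: CSS keeps it invisible
    # to humans, and robots.txt (Disallow: /labyrinth/) keeps honest
    # bots out, so only misbehaving crawlers ever request it.
    return html.replace("</body>", DECOY + "</body>", 1)

print(inject_decoy("<html><body><p>real content</p></body></html>"))
```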
A good overview from Ars Technica:
The curl maintainer comments on Mastodon that the curl website is currently burning through 77 TB of his bandwidth a month.
None of the AI companies commented on the piece.
Wikipedia is also taking quite a beating:
Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models. Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Shifting the traffic of Wikipedia (!) by half is a proper massacre.