Cloudflare offers one-click blocking of AI bots:
And this is a very lovely maze generator for AI bots:
https://zadzmo.org/code/nepenthes/
It generates an endless maze of slowly loading pages where the bots can romp around until the end of the world.
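To get a feel for how such a tarpit works, here is a minimal Python sketch of the same idea (my own toy illustration, not Nepenthes' actual code): every path serves a page of freshly generated links deeper into the maze, and the body is drip-fed a few bytes at a time so each request ties the crawler up for a while.

```python
# Toy tarpit in the spirit of Nepenthes (NOT its actual code):
# every URL "exists" and links to ten more random URLs, and the
# response is trickled out slowly to waste the crawler's time.
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

WORDS = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path is "valid" and points to ten more nonexistent pages.
        links = "".join(
            f'<li><a href="/{random.choice(WORDS)}-{random.randrange(10**6)}">more</a></li>'
            for _ in range(10)
        )
        body = f"<html><body><ul>{links}</ul></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # Drip-feed the body: 32 bytes every half second.
        for i in range(0, len(body), 32):
            self.wfile.write(body[i:i + 32])
            self.wfile.flush()
            time.sleep(0.5)

    def log_message(self, *args):
        pass  # keep the demo quiet

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()
```

A crawler that ignores robots.txt and follows every link it finds will wander an unbounded graph of these pages, each one costing it several seconds.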
Another one sponsoring the profits of AI companies with their own blood:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale. (…) If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned, including expensive endpoints like git blame, every page of every git log, and every commit in every repo, and they do so using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses – mostly residential, in unrelated subnets, each one making no more than one HTTP request over any time period we tried to measure – actively and maliciously adapting and blending in with end-user traffic and avoiding attempts to characterize their behavior or block their traffic.
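The "no more than one HTTP request per IP" detail is the crux: it defeats any per-IP rate limit. A quick back-of-the-envelope simulation (my numbers, not SourceHut's) shows why:

```python
# Toy illustration of why per-IP rate limiting sees nothing when the
# load is spread across "tens of thousands of IP addresses".
import random
from collections import Counter

REQUESTS = 100_000    # crawler requests in some time window
BOTNET_IPS = 50_000   # distinct (mostly residential) source IPs
THRESHOLD = 10        # requests/window before an IP gets blocked

hits = Counter(random.randrange(BOTNET_IPS) for _ in range(REQUESTS))
blocked = sum(1 for n in hits.values() if n > THRESHOLD)
print(f"total requests: {REQUESTS}, IPs over threshold: {blocked}")
# At ~2 requests per IP on average, essentially no IP ever crosses
# the threshold, even though the aggregate load is crushing.
```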
I have already linked here to Cloudflare's "firewall" against AI bots; now they have rolled out a labyrinth as well:
https://blog.cloudflare.com/ai-labyrinth/
When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules. AI Labyrinth is available on an opt-in basis to all customers, including the Free plan.
They mention that bots otherwise make over 50 billion requests a day across their network (roughly 580,000 per second).
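The general trick, as I understand it (this is my toy sketch, not Cloudflare's implementation), is to plant a trap link on real pages that humans never see and that robots.txt explicitly forbids; anything that follows it has outed itself as a misbehaving crawler and can be routed into the generated maze:

```python
# Toy sketch of the labyrinth-entry idea (my illustration, not
# Cloudflare's code). The /labyrinth/ path is a hypothetical name.
DECOY = '<a href="/labyrinth/entrance" style="display:none" rel="nofollow">archive</a>'

def inject_decoy(html: str) -> str:
    # Hide the trap link just before </body>: CSS keeps it invisible
    # to humans, and robots.txt (Disallow: /labyrinth/) keeps honest
    # bots out, so only misbehaving crawlers ever request it.
    return html.replace("</body>", DECOY + "</body>", 1)

print(inject_decoy("<html><body><p>real content</p></body></html>"))
```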
A good overview from Ars Technica:
The curl maintainer comments on Mastodon that the curl website is currently burning through 77 TB of his bandwidth a month.
None of the AI companies commented on the piece.
Wikipedia is also taking quite a beating:
Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models. Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Shifting the traffic of Wikipedia (!) by half is a proper massacre.