hylobates@jlai.lu to Selfhosted@lemmy.world · 14 hours ago
Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers
(image post, hosted on jlai.lu)
punrca@piefed.world · 2 hours ago
It’s best to use either Cloudflare (best IMO) or Anubis.
If you don’t want any AI bots, you can set up Anubis (open source; requires the end user to have JavaScript enabled): https://github.com/TecharoHQ/anubis
Cloudflare can automatically set up a robots.txt file that blocks “AI crawlers” (you can still choose to allow “AI search” for better SEO). E.g.: https://blog.cloudflare.com/control-content-use-for-ai-training/#putting-up-a-guardrail-with-cloudflares-managed-robots-txt
Cloudflare also has an “AI Labyrinth” option that serves a maze of fake data to AI bots that don’t respect the robots.txt file.
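For the robots.txt route mentioned above, here is a rough sketch of how you could sanity-check a rule set before deploying it. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, the `SomeSearchBot` agent, and the example.com URL are all made up for illustration; this is not what Cloudflare generates. `GPTBot` is the user agent OpenAI documents for its crawler. Keep in mind this only tells you what a well-behaved crawler would do, since robots.txt is advisory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot, allow everyone else.
# This is an illustrative rule set, not Cloudflare's managed file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeSearchBot" is a placeholder name for any other crawler.
    for agent in ("GPTBot", "SomeSearchBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/post/1")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the rules above, this should report GPTBot as blocked and the other agent as allowed. Crawlers that ignore robots.txt are unaffected, which is where something like Anubis or AI Labyrinth comes in.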