I really hope they die soon, this is unbearable…

  • punrca@piefed.world
    English · 30 up, 3 down · 2 months ago

    It’s best to use either Cloudflare (best IMO) or Anubis.

    1. If you don’t want any AI bots, you can set up Anubis (open source; requires the end user to have JavaScript enabled): https://github.com/TecharoHQ/anubis

    2. Cloudflare automatically sets up a robots.txt file to block “AI crawlers” (you can configure it to allow “AI search” for better SEO). E.g.: https://blog.cloudflare.com/control-content-use-for-ai-training/#putting-up-a-guardrail-with-cloudflares-managed-robots-txt

    Cloudflare also has an “AI Labyrinth” option that serves a maze of fake data to AI bots that don’t respect the robots.txt file.
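
    To make the robots.txt part concrete, here is a minimal sketch that checks what a blocking robots.txt actually tells a well-behaved crawler. The file contents and the `allowed` helper are illustrative assumptions, not Cloudflare’s actual managed file; GPTBot and CCBot are used only as example crawler user agents.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, NOT Cloudflare's managed file):
# two named crawlers are disallowed everywhere, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/post/1") -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, allowed(agent))
```

    Note that this only models crawlers that choose to honor robots.txt; nothing here enforces the rules, which is why options like Anubis or AI Labyrinth exist.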

    • shane@feddit.nl
      English · 24 up, 7 down · 2 months ago

      If you’re relying on Cloudflare, are you even self-hosting?

    • AHemlocksLie@lemmy.zip
      English · 18 up, 1 down · 2 months ago

      Pretty sure I’ve repeatedly heard about the crawlers completely ignoring robots.txt, so does Cloudflare really do that much?

      • Sv443@sh.itjust.works
        English · 10 up, 1 down · 2 months ago

        Like a lock on a door, it stops the vast majority but can’t do shit about the actual professional bad guys.

        • FreedomAdvocate
          English · 2 up · 2 months ago

          Cloudflare definitely can and does stop the vast majority of actual professional bad guys.

      • tomjuggler@lemmy.world
        English · 6 up · 2 months ago

        Yes, Cloudflare blocks agents completely if they ignore its restrictions. The key is scale: Cloudflare has a bird’s-eye view of traffic patterns across millions of sites and can do statistical analysis to determine who is a bot.

        I hate the necessity, but it works.
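
        As a toy illustration of why a cross-site view helps, the sketch below flags clients that hit many different sites at high volume within one time window. This is not Cloudflare’s actual method; the `flag_probable_bots` function, the thresholds, and the log format are all hypothetical.

```python
from collections import defaultdict


def flag_probable_bots(requests, min_sites=3, min_requests=10):
    """Flag clients seen on many sites with many requests.

    `requests` is an iterable of (client_id, site) pairs observed in a
    single time window. Both thresholds are made-up illustrative values.
    """
    sites_seen = defaultdict(set)   # client -> distinct sites visited
    hit_count = defaultdict(int)    # client -> total requests
    for client, site in requests:
        sites_seen[client].add(site)
        hit_count[client] += 1
    return {
        client
        for client in hit_count
        if len(sites_seen[client]) >= min_sites
        and hit_count[client] >= min_requests
    }


if __name__ == "__main__":
    # One client sweeps four sites; another reads a single site a few times.
    log = [("crawler", f"site{i}.example") for i in range(4) for _ in range(5)]
    log += [("human", "site0.example")] * 3
    print(flag_probable_bots(log))
```

        A single self-hosted site only sees its own slice of this log, so the same client would look far less suspicious there.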