• Ŝan@piefed.zip
      ↑ 1 · 1 hour ago

      LOL I should have scrolled down. You said what I said, in fewer words, first.

    • hamFoilHat@lemmy.world
      ↑ 12 · 12 hours ago

      My understanding is that if you block the Wayback Machine from indexing your site, it will also delist the site's existing history.

      • Jason2357@lemmy.ca
        ↑ 40 · 11 hours ago

        They do archive sites against the owners' wishes when they consider a site important for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don't.

        • tal@lemmy.today
          ↑ 10 · 8 hours ago

          Various parties have archived the Pushshift data, which covers a lot of Reddit history.

          kagis

          https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

          Subreddit comments/submissions 2005-06 to 2024-12

          This is the top 40,000 subreddits from reddit’s history in separate files. You can use your torrent client to only download the subreddits you’re interested in.

          I mean, that won’t have the past half year or some low-traffic subreddits, but…

      • Natanael@infosec.pub
        ↑ 3 · 8 hours ago

        The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after a domain changes owners.
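To illustrate the "block crawling" half of that distinction: blocking is normally expressed in a site's robots.txt, which only tells a well-behaved crawler what it may fetch from now on; it says nothing by itself about snapshots that already exist. The sketch below uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt. The user-agent tokens `archive.org_bot` and `ia_archiver` are assumptions for illustration; check the Internet Archive's own documentation for the names it currently uses, and keep in mind (per the comment above) that an archive may choose not to honor the file at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking archive crawlers to stay away.
# "archive.org_bot" and "ia_archiver" are ASSUMED user-agent tokens,
# used here only for illustration.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def crawl_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url.

    This only answers "may this crawler fetch the page now?"; it has no
    effect on copies an archive captured earlier.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("archive.org_bot", "ia_archiver", "SomeOtherBot"):
        print(agent, crawl_allowed(agent, "https://example.com/page.html"))
```

Removing pages that were already captured is a separate process handled by the archive itself, which is why the two abilities are independent.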