• Blackmist@feddit.uk · ↑38 · 3 hours ago

    It’s another move to protect against AI scraping that isn’t paying them for access.

  • conorab@lemmy.conorab.com · ↑127 · 6 hours ago

    As somebody who often ends up using Reddit like Stack Overflow, and who sometimes needs the Internet Archive (IA) to find the original post after it’s been deleted or garbled, I think this is a wake-up call for those who go to Reddit both to get technical help and to post it. More than ever, Reddit is becoming an unreliable place to find answers to old, obscure issues, and if they are going to lock out places like the IA, then I think it’s time people stopped contributing their solutions to Reddit.

    • mojofrododojo@lemmy.world · ↑13 · 2 hours ago

      yup. continuing to feed them traffic after their repeated attacks on the userbase is just sad. stop using them. yeah it sucks the info is gone, but acting like they’ll wake up and change is absurd.

    • cashsky@sh.itjust.works · ↑32 ↓2 · 4 hours ago

      Searching anywhere in general is getting shittier and shittier by the day. Web searches are riddled with hallucinated, AI-generated garbage pages. Finding the right answer to difficult problems is getting worse and worse. We are sliding rapidly into Idiocracy.

    • NauticalNoodle@lemmy.ml · ↑10 · 3 hours ago

      When I joined Lemmy I decided it was unwise to trust anything on Reddit less than a year old. Now it’s anything under two years old.

  • JakenVeina@midwest.social · ↑30 · 6 hours ago

    The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.

    Yeah, wouldn’t want those AI companies to get all that data for free. Gotta make 'em pay for it.

  • DFX4509B@lemmy.org · ↑48 · edited · 8 hours ago

    Just more vindication for my ditching that trash heap of a platform. YT is probably going to be the next platform I ditch as they’re going full Reddit now.

    It’s only a matter of time before third-party YT front-ends start getting throttled or outright blocked, like third-party Reddit front-ends were.

      • DFX4509B@lemmy.org · ↑3 · edited · 2 hours ago

        And Invidious, while staying logged out of YT for as long as that’s still an option, but I already have both PeerTube and Odysee set up.

        I seem to have the best luck with the inv.nadeko.net instance and to a lesser extent the invidious.nerdvpn.de instance, and both instances proxy by default.

    • Someonelol@lemmy.dbzer0.com · ↑7 · 7 hours ago

      YouTube’s already throttling users on its mobile site. It has these massive channel cards in the feed, and the video titles/thumbnails disappear after a few offerings, leaving you to click on videos blindly.

      • DFX4509B@lemmy.org · ↑4 · 5 hours ago

        I’ve declared my YT channel to be dormant starting on the 13th due to this AI age-gating crap.

        • thisbenzingring@lemmy.sdf.org · ↑9 · edited · 6 hours ago

          yes, in a way. this benzene ring

          there was a band called Hum, and one of my favorite songs of theirs, The Scientists, is about a couple who are scientists creating and experimenting with drugs.

          she tells him to keep this benzene ring around your finger, and think of me when everything you ever wanted is about to end

          i fucking love that song but that moment in the song is just peak layers upon layers of music and poetry and love and adventure.

          https://youtu.be/7IPDsUGBv64

          • YiddishMcSquidish@lemmy.today · ↑2 · 51 minutes ago

            Core memory unlocked! I remember catching a couple of mix demo CDs at the ’01 Warped Tour, and Hum’s Stars was on one. I actually got the green album with the zebra on it shortly after.

          • rhythmisaprancer@piefed.social · ↑5 · 5 hours ago

            there was a band called Hum

            Wow, what a memory trip! I listened to that song; I don’t think I’d heard it before, but it is great! I’m pretty sure I heard a different song of theirs at the time, but they probably live in my mind from looking at BMG and CD Warehouse catalogs back then. Other artists have popped up over the years from there.

            I’m glad I asked, and thanks for answering! Somehow that took me back to my Candlebox days.

              • rhythmisaprancer@piefed.social · ↑3 · 5 hours ago

                Ya, that was the one. I listened to it, also. The very beginning sounded very familiar, but I’m not sure about the rest. But maybe it’s been 30 years and, well 🤷‍♂️ I never saw Candlebox live; guess I didn’t miss out. I really liked Alice in Chains but only got to see Cantrell tour while “waiting”.

  • tal@lemmy.today · ↑170 · 11 hours ago

    Given that the Internet Archive is the de facto standard way to cite material as seen on a given date — they’re a trustworthy party that will probably persist for a long time — that’s going to make it harder to cite content on Reddit.
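    For reference, a Wayback Machine citation is just a URL that puts a 14-digit UTC timestamp (YYYYMMDDhhmmss) in front of the original address. A minimal sketch that builds one for a given capture time; `wayback_url` is a hypothetical helper name, and the sketch assumes the `web.archive.org/web/<timestamp>/<url>` link format the Wayback Machine uses:

```python
from datetime import datetime, timezone


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine link for `url` as of `when` (hypothetical helper).

    The path segment is a 14-digit YYYYMMDDhhmmss timestamp in UTC,
    so `when` must be timezone-aware to avoid silently using local time.
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{url}"


if __name__ == "__main__":
    captured = datetime(2025, 8, 15, 12, 0, tzinfo=timezone.utc)
    print(wayback_url("https://www.reddit.com/r/example/", captured))
```

    If there is no capture at exactly that second, the Wayback Machine generally redirects to the nearest one it has, so the date actually cited should be read off the page you land on.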

    • Deceptichum@quokk.au · ↑5 ↓1 · 7 hours ago

      Damn, guess if you want reddit data to train your AI that you’ll need to pay Spez for access.

      • PastafARRian@lemmy.dbzer0.com · ↑1 · 57 minutes ago

        Don’t forget, Reddit is legally allowed to train on your content, but not the other way around. It’s consistent with US law, where corporate tax is half of income tax.

      • tal@lemmy.today · ↑4 · edited · 4 hours ago

        It’s important for people writing papers and such who need to cite material.

        I wonder if there’s some way to use the TLS certificate to get a cryptographically-signed copy of a webpage with a timestamp that someone could later validate as having been downloaded on that date. I don’t know if existing TLS libraries are capable of that. Like, a Web browser menu option “Store cryptographically-signed webpage”. Absent a later certificate compromise, I’d think that that’d at least provide people a way to credibly say “this is really what was on that webpage on August 15th, 2026”. Like, you’d have to save a copy of the TLS session and then have libraries that could read and validate an already-generated session. The timestamp is already embedded in the session.

        Some protocols, like OTR, are specifically designed not to allow that, but AFAIK, TLS could.

        EDIT: Well, technically the timestamp is gonna be during the handshake, not tied to the HTTP request internal to the TLS session. It might be possible to game that by establishing a TLS session, holding it open without activity, and issuing a request much later. I’d think that that’d potentially be disallowed by Web servers one way or another, since otherwise you could probably do a denial-of-service attack by holding open a lot of sessions for a long time.

        EDIT2: Oh, wait, no, shouldn’t be an issue, because the HTTP Date response header is gonna have a timestamp tied to the response.
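        A rough sketch of the “capture record” half of this idea, assuming you already have the response headers and body in hand: take the server’s `Date` header as the timestamp and a SHA-256 digest of the body as a fingerprint. `capture_record` is a hypothetical helper, not part of any browser or TLS library. One caveat worth keeping in mind: this is not a signature. TLS protects application data with symmetric keys that both the client and the server hold, so a saved record by itself doesn’t prove to a third party what the server sent; the digest only lets you show later that your copy hasn’t changed since you recorded it.

```python
import hashlib
from email.utils import parsedate_to_datetime


def capture_record(url: str, headers: dict[str, str], body: bytes) -> dict[str, str]:
    """Summarise one HTTP response as (url, server Date, body digest).

    Hypothetical helper. Not a signature: anyone holding the same bytes
    can produce the same record. It only lets you show later that a
    saved copy is unchanged since the record was written down.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "date" not in lowered:
        raise ValueError("response has no Date header")
    # The Date header is the time the server says it originated the response.
    server_time = parsedate_to_datetime(lowered["date"])
    return {
        "url": url,
        "server_date": server_time.isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),
    }
```

        With `urllib.request`, the inputs could come from a `urlopen` response as `dict(resp.headers.items())` and `resp.read()`.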

  • Em Adespoton@lemmy.ca · ↑16 · 8 hours ago

    OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.

    Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
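    One way to check whether the Wayback Machine already holds a capture of a given page is its public availability endpoint (`https://archive.org/wayback/available?url=...`), which answers with JSON describing the closest snapshot. A minimal sketch that builds the query and reads the answer; the response shape in the docstring is my understanding of that endpoint and should be treated as an assumption, the helper names are hypothetical, and the actual HTTP fetch (e.g. with `urllib.request.urlopen`) is left out:

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; an optional `timestamp` (YYYYMMDD...) asks
    for the capture closest to that date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot's URL, or None if nothing is archived.

    Assumed response shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "...",
                                         "timestamp": "...", "status": "200"}}}
    An empty "archived_snapshots" object is taken to mean no capture.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```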

        • yeehaw@lemmy.ca · ↑3 · 2 hours ago

          And you think reddit actually deletes it? Risk data loss? All that valuable data? No way. They might shadow delete it, but it’s there forever.

    • xthexder@l.sw0.com · ↑4 · 6 hours ago

      I’m assuming IA will continue to host their historical archives of Reddit; they just won’t have any new captures after this. Unless IA has said otherwise, it’d be very strange to wipe their archive of Reddit.

  • HexesofVexes@lemmy.world · ↑34 ↓1 · 9 hours ago

    Oh no, someone might not be paying them for their user generated content (!)

    To be fair, it’s probably best that history forgets this period of the web…

    • hamFoilHat@lemmy.world · ↑11 · 10 hours ago

      It is my understanding that if you block the Wayback Machine from indexing your site, it will also delist the history.

      • Jason2357@lemmy.ca · ↑38 · 9 hours ago

        They do archive sites against the owners’ wishes when they consider them important for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don’t.

        • tal@lemmy.today · ↑9 · edited · 6 hours ago

          Parties have archived the data from Pushshift, which covers a lot of Reddit history.

          kagis

          https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

          Subreddit comments/submissions 2005-06 to 2024-12

          This is the top 40,000 subreddits from reddit’s history in separate files. You can use your torrent client to only download the subreddit’s you’re interested in.

          I mean, that won’t have the past half year or some low-traffic subreddits, but…
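          If you’re after your own comments in those dumps, here is a minimal sketch, assuming a file has already been decompressed (as far as I know they ship zstandard-compressed) into newline-delimited JSON with Pushshift-style fields such as `author`, `subreddit`, `created_utc` and `body`. The field names and the helper `comments_by_author` are assumptions for illustration:

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


def comments_by_author(lines: Iterable[str], author: str) -> Iterator[dict]:
    """Yield a trimmed record for each line written by `author` (hypothetical helper).

    Assumes newline-delimited JSON with Pushshift-style fields; blank,
    malformed, or non-object lines are skipped instead of aborting a
    long scan partway through.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or record.get("author") != author:
            continue
        raw_ts = record.get("created_utc")
        # created_utc is Unix seconds; convert defensively in case it is a string.
        created: Optional[str] = (
            datetime.fromtimestamp(int(float(raw_ts)), tz=timezone.utc).isoformat()
            if raw_ts is not None
            else None
        )
        yield {
            "subreddit": record.get("subreddit"),
            "created": created,
            "body": record.get("body", ""),
        }
```

          Because it only needs an iterable of lines, an open text file works directly (`with open(path, encoding="utf-8") as f:`), and lines are processed one at a time rather than loaded all at once.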

      • Natanael@infosec.pub · ↑3 · 5 hours ago

        The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after domains change owners
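        To illustrate the crawl-blocking half of that distinction: robots.txt rules only tell a crawler whether it may fetch a URL now; they say nothing about captures already made. A minimal sketch with Python’s standard `urllib.robotparser`; the `ia_archiver` user-agent and the rules below are illustrative assumptions, not Reddit’s actual robots.txt, and not necessarily the mechanism Reddit is using:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: keep one crawler out of /r/
# while leaving the rest of the site (and every other crawler) alone.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /r/
Allow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer 'may this agent fetch this URL right now?' for the given rules.

    Python's parser applies the first matching rule within a group, in
    file order. The answer says nothing about copies already archived.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, may_crawl(EXAMPLE_ROBOTS_TXT, agent, "https://www.example.com/r/test/"))
```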