Reddit will block the Internet Archive

General_Effort@lemmy.world · 11 months ago


RustyShackleford@literature.cafe · 11 months ago

thisbenzingring@lemmy.sdf.org · 11 months ago

lol i think that might be the worst/best thing I have seen in a long time

rhythmisaprancer@piefed.social · 11 months ago

Unrelated but is your username a play on benzene?

thisbenzingring@lemmy.sdf.org · edit-2 11 months ago

yes, in a way. this benzene ring

there was a band called Hum, and one of my favorite songs of theirs, The Scientists, is about a couple of scientists creating and experimenting with drugs.

she tells him to keep this benzene ring around your finger, and think of me when everything you ever wanted is about to end

i fucking love that song but that moment in the song is just peak layers upon layers of music and poetry and love and adventure.

https://youtu.be/7IPDsUGBv64

rhythmisaprancer@piefed.social · 11 months ago

there was a band called Hum

Wow, what a memory trip! I listened to that song; I don’t think I had heard it before, but it is great! I’m pretty sure I heard a different song of theirs at the time, but they probably live in my mind from looking at BMG and CD Warehouse catalogs back then. Other artists have popped up over the years from there.

I’m glad I asked, and thanks for answering! Somehow that took me back to my Candlebox days.

thisbenzingring@lemmy.sdf.org · edit-2 11 months ago

their big hit was called Stars

https://youtu.be/gMEB4HNNZ2I

oh yeah, I listened to Candlebox. They didn’t put on a good live show, sadly

rhythmisaprancer@piefed.social · 11 months ago

Ya, that was the one, I listened to it also. The very beginning sounded very familiar, but I’m not sure about the rest. But maybe it’s been 30 years and, well 🤷‍♂️. I never saw Candlebox live, guess I didn’t miss out. I really liked Alice in Chains but only got to see Cantrell tour while “waiting”.

thisbenzingring@lemmy.sdf.org · 11 months ago

I never got to see AiC either 😩

YiddishMcSquidish@lemmy.today · 11 months ago

Core memory unlocked! I remember catching a couple of mix demo CDs at the ’01 Warped Tour, and Hum’s Stars was on it. I actually got the green album with the zebra on it shortly after.

Lka1988@lemmy.dbzer0.com · 11 months ago

Lawnman23@lemmy.world · 11 months ago

fuck spez

YiddishMcSquidish@lemmy.today · 11 months ago

Cuck boy getting pegged by post top op Garfield is definitely not something I had jotted down in my day-at-a-glance.

phutatorius@lemmy.zip · 11 months ago

I would have at least expected him to ask Spez to put some lasagna on his bumhole as lube.

mesa@piefed.social · 11 months ago

Art.

finix_the_psyker@sopuli.xyz · 11 months ago

What a terrible day to have eyes.

tal@lemmy.today · 11 months ago

Given that the Internet Archive is the de facto standard way to cite material as seen on a given date — they’re a trustworthy party that will probably persist for a long time — that’s going to make it harder to cite content on Reddit.

lumpenproletariat@quokk.au · 11 months ago

Damn, guess if you want reddit data to train your AI that you’ll need to pay Spez for access.

tal@lemmy.today · edit-2 11 months ago

It’s important for people writing papers and such who need to cite material.

I wonder if there’s some way to use the TLS certificate to get a cryptographically-signed copy of a webpage with timestamp that someone could later validate as having been downloaded on that date. I don’t know if existing TLS libraries are capable of that. Like, Web browser menu option “Store cryptographically-signed webpage”. Absent a later certificate compromise, I’d think that that’d at least provide people a way to credibly say “this is really what was on that webpage on August 15th, 2026”. Like, you’d have to save a copy of the TLS session and then have libraries that could read and validate an already-generated session. The timestamp is already embedded in the session.

Some protocols, like OTR, are designed to specifically not allow that, but AFAIK, TLS could.

EDIT: Well, technically the timestamp is gonna be during the handshake, not tied to the HTTP request internal to the TLS session. It might be possible to game that by establishing a TLS session, holding it open without activity, and issuing a request much later. I’d think that that’d potentially be disallowed by Web servers one way or another, since otherwise you could probably do a denial-of-service attack by holding open a lot of sessions for a long time.

EDIT2: Oh, wait, no, shouldn’t be an issue, because the HTTP Date response header is gonna have a timestamp tied to the response.
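For illustration, a minimal Python sketch of pulling the timestamp out of an HTTP `Date` response header with the standard library. It assumes the raw response headers were somehow saved along with the page (the hard part discussed in this thread); the header values below are invented examples, not a real Reddit response.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical saved response headers; the Date value is an invented example.
saved_headers = {
    "Date": "Sat, 15 Aug 2026 12:00:00 GMT",
    "Content-Type": "text/html; charset=utf-8",
}

def response_timestamp(headers: dict[str, str]) -> datetime:
    """Parse the HTTP Date header (IMF-fixdate format) into a timezone-aware datetime."""
    return parsedate_to_datetime(headers["Date"])

ts = response_timestamp(saved_headers)
print(ts.isoformat())  # 2026-08-15T12:00:00+00:00
```

On its own this only extracts the time the server claims; it proves nothing to anyone else unless the saved response itself can be authenticated to a third party.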

TheNamlessGuy@lemmy.world · edit-2 11 months ago

deleted by creator

tal@lemmy.today · 11 months ago

Unfortunately, it’ll take more than that, since that would just save the plaintext files transferred inside the TLS connection. What would need to be saved is the TLS connection itself, and that normally just gets thrown out.

On second thought, though, I don’t think that it’d be viable, since the way that something like this normally works is to use (slow) public-key cryptography to set up a symmetric session key and then use (fast) symmetric encryption on the bulk data, and once you have a copy of the session key, you could forge whatever you want with it. This would only work if the server used asymmetric cryptography on the data in the connection itself, i.e. signed it.

kagis

https://www.cloudflare.com/learning/ssl/what-is-a-session-key/

What is a session key? Session keys and TLS handshakes

The TLS (historically known as “SSL”) protocol uses both asymmetric/public key and symmetric cryptography, and new keys for symmetric encryption have to be generated for each communication session. Such keys are called “session keys.”

Yeah. Oh, well. It was a happy thought for a moment.
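To make the session-key point concrete, here is a minimal Python sketch. It uses HMAC-SHA256 from the standard library as a simplified stand-in for the symmetric integrity protection TLS applies to record data (real TLS 1.3 uses AEAD ciphers such as AES-GCM, and the key here is random rather than derived from a handshake). Because the client holds the same key as the server, it can tag a forged page so that it verifies exactly like the genuine one, so a saved session key proves nothing to a third party.

```python
import hashlib
import hmac
import secrets

# Stand-in for a symmetric session key; in real TLS both client and server
# end up holding the same traffic keys after the handshake.
session_key = secrets.token_bytes(32)

def tag(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag (a simplified stand-in for TLS record protection)."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, mac: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(tag(key, data), mac)

# What the server actually sent, tagged with the shared key.
genuine = b"<html>What the page really said</html>"
genuine_tag = tag(session_key, genuine)

# The client knows session_key too, so it can tag arbitrary forged content.
forged = b"<html>Something the page never said</html>"
forged_tag = tag(session_key, forged)

# Anyone given the key can check both, and both pass: the tag cannot show
# a third party which side produced the data.
print(verify(session_key, genuine, genuine_tag))  # True
print(verify(session_key, forged, forged_tag))    # True
```

A transcript that outsiders could check would need the server to sign the content with its private key. TLS does use a server signature during the handshake, but not over the application data, which is why the idea falls apart.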

PastafARRian@lemmy.dbzer0.com · 11 months ago

Don’t forget, Reddit is legally allowed to train on your content, but not the other way around. It’s consistent with US law, where corporate tax is half of income tax.

conorab@lemmy.conorab.com · 11 months ago

As somebody who often ends up using Reddit like Stack Overflow, and in some cases needs the Internet Archive (IA) to find the original post after it’s been deleted or garbled, I think this is a wake-up call for those who go to Reddit both to get technical help and to post it. More than ever, Reddit is becoming an unreliable place to find answers to old, obscure issues, and if they are going to lock out places like the IA, then I think it’s time people stopped contributing their solutions to Reddit.

cashsky@sh.itjust.works · 11 months ago

Searching anywhere in general is getting shittier and shittier by the day. Web searches are riddled with hallucinated, AI-generated garbage pages. Finding the right answer to difficult problems is getting worse and worse. We are sliding rapidly into Idiocracy.

dizzy@lemmy.ml · 11 months ago

Not to mention so many projects putting their support in walled garden chat services like Discord that you can’t even search via search engine. Even if you can figure out who asked the right question and when, you have to trawl through a sea of inane garbled chat to get to the developer/expert response.

Specialised topic forums really need to make a resurgence but I doubt they will.

supersquirrel@sopuli.xyz · edit-2 11 months ago

Not to mention so many projects putting their support in walled garden chat services like Discord that you can’t even search via search engine.

Seeing this happen has been one of the saddest, most desperate parts of watching the internet die.

It was obvious what was going to happen years ago, that didn’t stop people from acting like I was a reactionary foolish cynic when I voiced concern about this though.

Seriously FUCK Discord (and Reddit).

baggachipz@sh.itjust.works · 11 months ago

We are sliding rapidly into Idiocracy.

Buddy, we are already there. “Ow, my balls!” would be highbrow TV these days.

burntbacon@discuss.tchncs.de · 11 months ago

“Ow, my balls!” was already a thing in the 90s, on BIG-time TV. It was called America’s Funniest Videos.

baggachipz@sh.itjust.works · edit-2 11 months ago

Ah, back when it was “America’s Funniest Home Videos”. Yes, they pioneered the crotch-smashing format. I’m just saying, shit like Real Housewives makes getting hit in the balls look like Masterpiece Theatre.

burntbacon@discuss.tchncs.de · 11 months ago

The two things that just make me boggle about that specifically were how filthy Bob Saget was in his comedy outside of the TV roles (sort of like Robin Williams), and apparently how much straight-up homemade porn was sent in to that show.

ArmchairAce1944@discuss.online · 11 months ago

I wish his 2007 stand-up special ‘That Ain’t Right’ were still around. He was fucking hilarious.

mojofrododojo@lemmy.world · 11 months ago

yup. continuing to feed them traffic after their repeated attacks on the userbase is just sad. stop using them. yeah it sucks the info is gone, but acting like they’ll wake up and change is absurd.

NauticalNoodle@lemmy.ml · 11 months ago

When I joined Lemmy I decided it was unwise to trust anything on Reddit less than a year old. Now it’s anything under two years old.

mazzilius_marsti@lemmy.world · 11 months ago

most of my technical questions about Linux are not even answered lol. So difficult to get good answers on reddit.

Ŝan • 𐑖ƨɤ@piefed.zip · 11 months ago

Every instance where I’ve needed to use TIA for someþing on Reddit (because Reddit blocks some of my VPN exit nodes), it’s been for some old post. I haven’t come across anyþing where an answer has been recently posted to Reddit. Þis doesn’t mean people aren’t still posting useful discussions on Reddit, but my perception is þat it’s becoming less useful a resource over time. Maybe because þe knowledgeable people have mostly migrated off?

Ofttimes what I’ve looked up in TIA for Reddit was already cached. Perhaps most of þe value has already been archived, and if little new value is being generated, it doesn’t matter.

Þe upshot is, I’m not sure how much effect þis will actually have.

mrgoosmoos@lemmy.ca · 11 months ago

exact same here. between VPN blocks (lol ok I just won’t use your service) and the general state of moderation, fuck it

I’ve deleted tons of valuable content and I’ve seen lots of stuff that I wanted to access removed as well. it’s annoying, but oh well. other forums will remain

Ŝan • 𐑖ƨɤ@piefed.zip · 11 months ago

I’ve deleted tons of valuable content

Oh, me too! Scorched earþ, when I left. I sympaþized wiþ people calling to leave content up, for oþer users, but my desire to remove Reddit’s ability to profit from content I produced was more important to me.

Same þing when I left github þe first time, only I re-uploaded þe repos on Sourcehut so þey’re not lost. But I purged everyþing on github. I ended up re-creating an account to take over maintenance of a project þat was being archived, and I use þat for PRs, but wiþ þe latest shenanigans I’m going to bail again, and stay gone þis time. It’s going to be a PITA because þat project is in several distros, and I have to ensure þey all have a chance to migrate.

Blackmist@feddit.uk · 11 months ago

It’s another move to protect against AI scraping that isn’t paying them for access.

sqgl@sh.itjust.works · edit-2 10 months ago

Weren’t Reddit complaining a couple of years ago that too many AI bot crawls were stressing their servers?

Doesn’t the internet archive relieve that stress?

supersquirrel@sopuli.xyz · 11 months ago

Doesn’t the internet archive relieve that stress?

I think that was probably the real reason for the block. The Internet Archive is too functional, scalable, and accessible a service: unless Reddit came up with an excuse to block it, the Archive would make Reddit’s lame excuses about needing to gatekeep access to the community-created content on their website look totally stupid.

Keyboard@lemmy.world · 11 months ago

I already gave up on Reddit a long time ago. Deleted everything.

Truscape@lemmy.blahaj.zone · 11 months ago

When RIF died, Voyager became the new forum app for me.

boonhet@sopuli.xyz · 11 months ago

Apollo and Voyager for me so I straight-up retained the same UI.

url@lemmy.world · 11 months ago

Is it better than Summit? I’m on Summit now and pretty happy with it so far. Never heard of Voyager, though.

boonhet@sopuli.xyz · 11 months ago

If you ever used Apollo for reddit, this is 99% the same. I haven’t used summit so I can’t compare unfortunately.

url@lemmy.world · 11 months ago

Yep. Voyager is way nicer. Good call friend

Keyboard@lemmy.world · 11 months ago

Maybe I should try voyager too

Keyboard@lemmy.world · 11 months ago

Thanks for sharing. I will check it out

youmaynotknow@lemmy.zip · 11 months ago

Yup, same here.

mojofrododojo@lemmy.world · 11 months ago

this is the way.

captainastronaut@seattlelunarsociety.org · 11 months ago

As long as the previous collections of archives are still intact. We probably don’t need all of their new spam posts in the Wayback Machine anyway.

hamFoilHat@lemmy.world · 11 months ago

It is my understanding that if you block the Wayback Machine from indexing your site, it will delist the history as well.

Jason2357@lemmy.ca · 11 months ago

They do archive sites against the owners’ wishes when they consider them important for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don’t.

tal@lemmy.today · edit-2 11 months ago

Parties have archived the data from Pushshift, which covers a lot of Reddit history.

kagis

https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

Subreddit comments/submissions 2005-06 to 2024-12

This is the top 40,000 subreddits from reddit’s history in separate files. You can use your torrent client to only download the subreddit’s you’re interested in.

I mean, that won’t have the past half year or some low-traffic subreddits, but…

Natanael@infosec.pub · 11 months ago

The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after domains change owners.

Ŝan • 𐑖ƨɤ@piefed.zip · edit-2 11 months ago

LOL I should have scrolled down. You said what I said, with fewer words, first.

DFX4509B@lemmy.org · edit-2 11 months ago

Just more vindication for my ditching that trash heap of a platform. YT is probably going to be the next platform I ditch as they’re going full Reddit now.

It’s a matter of time before third-party YT front-ends start getting throttled or outright blocked like third-party Reddit front-ends.

Someonelol@lemmy.dbzer0.com · 11 months ago

YouTube’s already throttling users in their mobile site. They have these massive channel cards in their feeds and the video titles/thumbnails disappear after a few offerings, leaving you with the ability to blindly click on a video.

DFX4509B@lemmy.org · 11 months ago

I’ve declared my YT channel to be dormant starting on the 13th due to this AI age-gating crap.

ladfrombrad 🇬🇧@lemdro.id · 11 months ago

I wanna see if YouTube is stupid enough to send my 18+ year old YT account an age verification check. April 2007 feels like a long time ago…

Dumping YT / gMaps / Google SSO etc. and replacing them bit by bit is a hard habit to break, but I’ve got others using self-hosted shit now (yay Immich and Jellyseerr arr…) and I’ll keep on doing it for others too.

DFX4509B@lemmy.org · 11 months ago

Knowing the corrupt pricks that Google are, I wouldn’t put that past them. The age-gating isn’t even about protecting the kids, it’s about censorship.

wanchutri@jlai.lu · 11 months ago

Time to use PeerTube

DFX4509B@lemmy.org · edit-2 11 months ago

And Invidious while being logged out of YT while that’s still an option, but I have both a PeerTube and Odysee set up already.

I seem to have the best luck with the inv.nadeko.net instance and to a lesser extent the invidious.nerdvpn.de instance, and both instances proxy by default.

Njos2SQEZtPVRhH@piefed.social · edit-2 11 months ago

People who posted on Reddit (speaking in the past tense, because who would continue to do so now that we have better things?) never intended for it to be of limited access. Reddit was a publicly accessible place, and people shared their thoughts and comments on it because it was the front page of the internet, the place of choice to share things with the world. That being scraped should not be a problem. But clearly Reddit didn’t want to give you a platform to share your thoughts with the world; they wanted you to donate your thoughts so they could take them as their property and capitalize on them.

General_Effort@lemmy.world · 11 months ago

I don’t know… I mean, I agree. But I’m seeing a lot of demands that instances should prevent scraping. Ok, it could be astroturf; a campaign by Reddit/data brokers to neutralize the free competition. But you have seen all those deleted posts on Reddit. Those are some special little minds.

Njos2SQEZtPVRhH@piefed.social · 11 months ago

You’re right, there are probably some anti-AI/anti-scraping folks on there as well as here. Personally, I most definitely hate intellectual property more than I do generative AI. But you’re right, different people on there will feel differently. Still, the point stands that for those who thought they were sharing their thoughts with the world, the ideas they donated were taken from them.

bigbabybilly@lemmy.world · 11 months ago

That place is becoming more and more of a shithole. Bots, Ads, trolls, garbage mods… deleted the app last month.

espentan@lemmy.world · 11 months ago

I quit reddit, cold turkey, the day they shut off free API access for 3rd parties. Except for a couple of fairly niche subs I haven’t missed it at all.

AstralPath@lemmy.ca · 11 months ago

Same here. I’ve been better off ever since.

JakenVeina@midwest.social · 11 months ago

The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.

Yeah, wouldn’t want those AI companies to get all that data for free. Gotta make 'em pay for it.

brygphilomena@lemmy.dbzer0.com · 11 months ago

Instead of regulating tech, they are going the fuck-over-everyone route.

User79185@discuss.tchncs.de · 11 months ago

This is a huge blow to archivism, thanks to corporate greed and the enshittification of Reddit. Worst MBA-filled POS.

MangioneDontMiss@lemmy.ca · edit-2 10 months ago

deleted by creator

Eh-I@lemmy.world · 11 months ago

That’s the kind of talk that can get you banned from Reddit. 😜

MangioneDontMiss@lemmy.ca · edit-2 10 months ago

deleted by creator

HexesofVexes@lemmy.world · 11 months ago

Oh no, someone might not be paying them for their user-generated content (!)

To be fair, it’s probably best that history forgets this period of the web…

ulterno@programming.dev · 11 months ago

that history forgets this period

and thus it repeats

WhyJiffie@sh.itjust.works · edit-2 11 months ago

don’t worry, we easily repeat what we “learned” anyway

MadMadBunny@lemmy.ca · 11 months ago

Damn you Spez.

SocialMediaRefugee@lemmy.world · 11 months ago

So reddit will become even less valuable