There are absolutely vote manipulation campaigns still happening.
See this from 12 days ago: https://old.reddit.com/r/privacy/comments/1nldj4m/why_are_we_all_just_accepting_metas_new_spy/?sort=old
12 days ago almost every top-level comment was below -50 and responses were untouched.
Of course there are. That doesn’t mean the majority of the site is compromised.
As a user of the website for many years it’s entirely obvious that a majority of the website is gamed (besides its niche corners). Reddit’s bot prevention mechanisms just fuck VPN users - they are wholly inadequate.
That thread is evidence of an organized bot campaign that they had 12 days to clean up (and didn’t). It’s naive to believe that the rest of the website isn’t similarly (and less obviously) affected by bots - with vote manipulation still standing.
They could sell the cleaned votes to AI companies and keep the dirty data public for the scrapers.
Meta/OpenAI openly pirating everything they can to train their LLMs is a good example of how data hungry these AI/etc. companies are.
Is it plausible that companies ask Reddit to narrow down data, e.g. by demographic, geographic location, or likelihood of being a real person, and then buy that subset? Sure, but the LLMs seemingly require all the data that these companies can get their hands on. Given the scale of data being consumed (and data theft being committed), I highly doubt that the big players care too much about Reddit data being tainted. If anything, it might even be desirable to them.
Okay, but it is those niche subs that are the most valuable.
Are you somebody invested financially in Reddit? Genuine question.
Those niche subreddits can also have their moments, too. Maybe it’s not bots, but there are plenty of shills that have been caught in various niche subreddits I’ve frequented over the years (thanks to unpaid moderators).
No, I’m not. I don’t care at all if they’re successful or go under.
Sure, but again it’s not likely to be most. You don’t seem to realize how hard it is to get data that is already classified. That stuff is gold to people developing AI. Most of the work in data science is cleaning data and getting it into a usable form.
It’s noise, a very large part of it. Reddit is financially motivated to make the data appear as if it is signal. It isn’t - they have taken extremely minimal steps to ensure actual human participation.
This doesn’t matter to AI companies, but it only warps that technology more and more. AI is a sinking ship with current methodologies. Reddit will die when the AI bubble bursts, and those involved with Reddit have already cashed out enough to be filthy rich.
At this point we’re just speculating. We don’t have evidence either way of whether it’s mostly good or mostly bad data.
If you can land me a gig engaging with back end data from Reddit in a neutral capacity, it’d likely be pretty easy for a layman like me to confirm that it’s largely noise. The AI companies buying data are getting scammed and you are free to remain neutral or plainly disagree with my assessment in the absence of concrete data that is publicly obtainable.
No company is immune to bots and inorganic engagement, least of all Reddit with the strategies employed.