LLMs can unmask pseudonymous users at scale with surprising accuracy

return2ozma@lemmy.world · 2 months ago

FauxPseudo@lemmy.world · 2 months ago

In theory, using the information in the released files and the information in public sources, it should be possible to figure out who those redacted names are based on writing style and other factors. We should be able to deanonymize them.

General_Effort@lemmy.world · 2 months ago

Hmm. Maybe, but it is not the same problem as the ones discussed in the OP. I also have some doubts about the paper, but that’s another story. You could try it out?

FauxPseudo@lemmy.world · 2 months ago

I’m not qualified to design the prompts and home users can’t really pile in 3 million+ documents.

General_Effort@lemmy.world · 2 months ago

Prompts are in the appendix: https://arxiv.org/abs/2602.16800

I don’t know how far you get on the free tier, but it should be at least enough for a proof of principle, which could get other people to chip in. You didn’t have qualms about demanding that other people do this for free.

Mind that this is a serious GDPR violation in Europe. So there will be serious pressure on AI companies to prevent this kind of use.

FauxPseudo@lemmy.world · 2 months ago

Seriously, I’m not qualified. No amount of appendix prompts and Dunning-Kruger is going to change that.

I’m not demanding anything. I’m suggesting that AI can’t do what is claimed or that people with something to prove are not interested in proving something.

General_Effort@lemmy.world · 2 months ago

You think the paper is fraud?

FauxPseudo@lemmy.world · 2 months ago

My statement that I’m quoting predates this paper. My statement exists completely independent of this paper ever being produced. My statement is not about this paper. My statement is about the state of AI and the industry. This paper reinforces my statement.

General_Effort@lemmy.world · 2 months ago

How so?

FauxPseudo@lemmy.world · 2 months ago

My statement was that AI can be used to unmask the individuals who have been redacted, i.e., anonymized. This paper is all about de-anonymizing.

I’m unclear on whether we’re having a good-faith conversation, because I thought that would have been very clear from the beginning.