I’m trying to look at this from a neutral point of view, which is why I believe that requiring disclosure when (AI) models are used would benefit the community.

I believe models can harm privacy when they aren’t used correctly, because they’re prone to outputting misleading or outright incorrect information due to “hallucinations”. In my experience, this is the case more often than not with the projects I see.

I’m curious what others think about this. If you disagree, please let me know why.

  • FineCoatMummy@sh.itjust.works
    10 days ago

    One way that LLMs harm privacy is through training on, well, everything the tech company can get its hands on. That can include your posts, and anything you disclosed IN those posts. Not to mention anything you typed into most of the big LLMs on the web.

    Once that info is trained into the model, you can’t just go delete it! If it was a file on a disk, in theory you can remove that. OK, sometimes that’s hard in practice, but in theory you can. When it’s baked into model weights, that’s different. You can’t un-bake it from the model!

    People have found that commercial LLMs will give back personal info about them. Their phone numbers. Where they work. Sometimes even health info, if somehow the model got trained on that! The model does not recall everything it was trained on 1-for-1. But that info does get represented in the model, and it can sometimes turn up later, accurate or not. LLMs are also good at analyzing unstructured data. So even if you never gave your name, if there are enough tidbits to collect, they can de-anonymize you. I read something about that. I will try to find the link and post it if I can.
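
    To make the "enough tidbits" point concrete, here is a toy Python sketch. Everything in it is made up for illustration: the five records, the field names, and the `candidates` helper are hypothetical, and no LLM is involved. It only shows how each extra detail shrinks the set of people who could match.

```python
# Toy illustration only: invented records, no real people, no LLM involved.
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    city: str
    job: str
    hobby: str


# Hypothetical "population" of pseudonymous users.
POPULATION = [
    Person("A", "Leeds", "nurse", "climbing"),
    Person("B", "Leeds", "nurse", "chess"),
    Person("C", "Leeds", "teacher", "climbing"),
    Person("D", "York", "nurse", "climbing"),
    Person("E", "York", "teacher", "chess"),
]


def candidates(population, **tidbits):
    """Return everyone consistent with all the tidbits known so far."""
    return [
        p for p in population
        if all(getattr(p, field) == value for field, value in tidbits.items())
    ]


# Each additional detail narrows the set of possible matches.
one = candidates(POPULATION, city="Leeds")
two = candidates(POPULATION, city="Leeds", job="nurse")
three = candidates(POPULATION, city="Leeds", job="nurse", hobby="climbing")

print(len(one), len(two), len(three))
```

    No single detail identifies anyone here, but the three together leave exactly one candidate. Real cases are messier, since the details are buried in free text, which is the part LLMs are reportedly good at pulling out.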

    I do not think LLMs are 100% bad. They have good uses, valid uses. But an ass ton of risks and drawbacks too! I’m not sure society is ready for it. Or ready for more and more social media being bot posts. And those bots becoming harder and harder to detect.

    It’s possible to run some LLMs locally if you have a good GPU. That helps with SOME, not all, just some of the privacy issues. Doesn’t help with many of the other risks tho.

    • FineCoatMummy@sh.itjust.works
      10 days ago

      I read something about that. I will try to find the link and post

      Ha! Found it!

      Large-scale online deanonymization with LLMs

      We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms.