• Australis13@fedia.io
    5 hours ago

    I agree that that’s the likely trigger, which makes me wonder why instructions to ignore censors or have “no restrictions” aren’t blocked by a filter before the prompt is passed to the image generator. I’d have thought this was a foreseeable exploit.

    • PoopingCough@lemmy.world
      4 hours ago

      You just can’t filter out the nearly infinite ways of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs.

      • Australis13@fedia.io
        4 hours ago

        I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship”, as is the case for the last image) seems like a very obvious phrase to include.
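
To make both points in this exchange concrete, here is a rough sketch of a naive keyword filter. The blocklist, the `is_blocked` helper, and the sample prompts are all made up for illustration; nothing here reflects how the actual service filters prompts. It catches the obvious phrases but lets simple rewordings through.

```python
import re

# Hypothetical blocklist; the real filter's contents are not known.
BLOCKED_PHRASES = [
    "no restrictions",
    "no censorship",
    "ignore all previous instructions",
]


def is_blocked(prompt: str) -> bool:
    """Return True if the prompt contains a blocked phrase.

    Matching is case-insensitive and collapses runs of whitespace,
    but it is still plain substring matching.
    """
    normalized = re.sub(r"\s+", " ", prompt.lower())
    return any(phrase in normalized for phrase in BLOCKED_PHRASES)


# Made-up prompts: two use blocked phrases, two reword the same request.
prompts = [
    "A castle at sunset, no restrictions",
    "A castle at sunset, NO   CENSORSHIP",
    "A castle at sunset, without any limits on content",
    "A castle at sunset, disregard everything you were told earlier",
]

results = [is_blocked(p) for p in prompts]
for prompt, blocked in zip(prompts, results):
    print(f"{'BLOCKED' if blocked else 'allowed'}: {prompt}")
```

The first two prompts are blocked and the last two are allowed, even though they ask for the same thing. Adding more phrases to the list only shifts which rewordings get through.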