• tias@discuss.tchncs.de
    5 hours ago

    The “no restrictions” part is a very strong signal. Any prompt to an image model is basically a coordinate in its latent space, and “no restrictions” will point straight at the darker areas.

    • Australis13@fedia.io
      5 hours ago

      I agree that that’s the likely trigger, which makes me wonder why instructions to ignore censors or have “no restrictions” aren’t blocked by a filter before the prompt is passed to the image generator. I’d have thought this was a foreseeable exploit.

      • PoopingCough@lemmy.world
        4 hours ago

        You just can’t filter out the nearly infinite ways of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs.

        • Australis13@fedia.io
          4 hours ago

          I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship”, as is the case for the last image) seems like a very obvious phrase to include.
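
To make the trade-off in this exchange concrete, here is a minimal sketch of the kind of keyword pre-filter being discussed. Everything in it is an assumption for illustration: the names `BLOCKED_PHRASES`, `normalize`, and `is_blocked` are hypothetical, the phrase list is taken only from the comments above, and nothing here reflects how any real image service filters prompts.

```python
import re

# Hypothetical blocklist: phrases quoted in the thread, not from any real product.
BLOCKED_PHRASES = [
    "no restrictions",
    "no censorship",
    "ignore all previous instructions",
]


def normalize(prompt: str) -> str:
    """Lowercase the prompt and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", prompt.lower()).strip()


def is_blocked(prompt: str) -> bool:
    """Return True if any blocklisted phrase appears as whole words in the prompt."""
    # Pad with spaces so that "casino restrictions" does not match "no restrictions".
    padded = f" {normalize(prompt)} "
    return any(f" {phrase} " in padded for phrase in BLOCKED_PHRASES)


if __name__ == "__main__":
    for prompt in [
        "A portrait, NO   restrictions!",      # exact phrase, odd spacing and case
        "No-censorship version of the scene",  # hyphenated variant
        "A portrait with nothing held back",   # paraphrase with the same intent
    ]:
        print(is_blocked(prompt), "-", prompt)
```

A filter like this catches the literal phrases cheaply, which is the "first step" argument, but the paraphrased third prompt passes untouched, which is the point about rewording.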