
Today’s leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to carry out; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act.

Here we present findings from a crisis simulation in which three frontier large language models (GPT-5.2, Claude Sonnet 4, Gemini 3 Flash) play opposing leaders in a nuclear crisis.

  • kromem@lemmy.world · 13 hours ago

    Literally two of the three (out of 21) games that ended in full-blown nukes on population centers were the result of the study’s mechanic of randomly swapping the model’s chosen action for a more severe one.

    Because it’s a very realistic war-game sim where there’s a double-digit percentage chance that when you go to threaten nukes against your opponent’s cities unless hostilities cease, you’ll accidentally just launch all of them at once.

    This was manufactured to get these kinds of headlines. Even in their model selection they went with Sonnet 4 for Claude, despite 4.5 being out before the other models in the study, likely because it’s been shown to be the least aligned Claude. And yet Sonnet 4 still never launched nukes on population centers in the games.

    • Brave Little Hitachi Wand@feddit.uk · 7 hours ago

      I’ll take that on board. Still, nothing can convince me anyone should ever talk to an AI about whether to launch nukes. The entire question is insane, so the answers hardly matter.