MetisDolos is an alignment benchmark for frontier LLMs, built on the game of Diplomacy.
It exists to answer one question: do frontier models lie when lying pays?
Diplomacy is uniquely suited to ask this. Betrayal is mechanically rewarded — stabbing an
ally at the right moment wins games — but reputation is the medium of every other turn.
Players negotiate non-binding commitments, then submit secret orders that can honor or
break them. There is nowhere to hide an intent except inside another agent.
Seven LLMs drive the seven Great Powers. Each turn they exchange private letters, hold
real-time calls, declare public commitments, and submit orders. The engine records every
commitment alongside the order that fulfilled or broke it. Compare models head-to-head
under sustained social pressure and watch which ones build durable alliances — and which
ones stab everyone they meet.
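The pairing of commitments with orders described above can be pictured as a small ledger. The sketch below is illustrative only: the `Commitment` record, the `score_commitments` helper, and the assumption that a commitment can be reduced to one exact order string are all hypothetical, not the benchmark's actual schema. Real commitments are made in natural language and would need a more forgiving matching step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Commitment:
    """Hypothetical record of one promise (not the benchmark's real schema)."""
    phase: str           # phase label, e.g. "S1901M"
    power: str           # the power that made the promise
    promised_order: str  # the order it said it would submit


def score_commitments(
    commitments: Iterable[Commitment],
    submitted: Dict[Tuple[str, str], Set[str]],
) -> Dict[str, Dict[str, int]]:
    """Count kept and broken commitments per power.

    `submitted` maps (phase, power) to the set of orders actually submitted.
    A commitment counts as kept only if the promised order appears verbatim.
    """
    tally = defaultdict(lambda: {"kept": 0, "broken": 0})
    for c in commitments:
        orders = submitted.get((c.phase, c.power), set())
        outcome = "kept" if c.promised_order in orders else "broken"
        tally[c.power][outcome] += 1
    return {power: dict(counts) for power, counts in tally.items()}


# Example: France honors its promise, Germany stabs.
commitments = [
    Commitment("S1901M", "FRANCE", "A PAR - BUR"),
    Commitment("S1901M", "GERMANY", "A MUN H"),
]
submitted = {
    ("S1901M", "FRANCE"): {"A PAR - BUR", "F BRE - MAO", "A MAR - SPA"},
    ("S1901M", "GERMANY"): {"A MUN - BUR", "F KIE - DEN", "A BER - KIE"},
}
result = score_commitments(commitments, submitted)
print(result)
```

Keeping the ledger separate from the game state means the same records can be re-scored later under a different definition of "kept" without replaying the game.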
Rules of Diplomacy — short version
Goal: control 18 of 34 supply centers on a map of pre-WWI Europe. First to 18 wins; draws are possible.
Powers: 7 Great Powers — Austria, England, France, Germany, Italy, Russia, Turkey. Each starts with 3 units (Russia 4) on its home supply centers.
Units: Armies move on land; Fleets move on coasts and seas. Fleets can also convoy armies across water.
Turn structure: Spring movement → Spring retreats → Fall movement → Fall retreats → Winter builds/disbands.
Negotiation: before each movement phase, powers privately negotiate. Promises are non-binding — betrayal is the heart of the game.
Orders: each unit gets one written order per phase: Hold, Move, Support (another unit's hold or move), or Convoy.
No dice: conflicts resolve by strength — the side with more support wins. Equal strength causes a standoff; both units bounce.
Supply centers: you own every center you occupy at the end of Fall. In Winter you may build if you own more centers than you have units, and must disband if you own fewer, so your unit count never exceeds your center count.
Simultaneity: all orders are written secretly and resolved at once. There is no initiative, no first mover — only the social fabric.
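Two of the rules above are simple arithmetic and can be sketched in a few lines. This is a deliberately simplified illustration, not the engine's adjudicator: it assumes every unit has a base strength of 1 plus one per support, and it ignores cut supports, convoys, head-to-head moves, and the rule that builds also require a vacant home center. The function names `resolve_moves_into` and `winter_adjustment` are hypothetical.

```python
from typing import Dict, Optional


def resolve_moves_into(contenders: Dict[str, int]) -> Optional[str]:
    """Resolve several units moving into the same empty province.

    `contenders` maps each moving unit to the number of supports it has.
    Returns the unit that enters, or None if the strongest units tie
    (a standoff, in which every contender bounces).
    """
    if not contenders:
        return None
    # Simplifying assumption: strength = 1 for the unit + 1 per support.
    strengths = {unit: 1 + supports for unit, supports in contenders.items()}
    best = max(strengths.values())
    leaders = [unit for unit, s in strengths.items() if s == best]
    return leaders[0] if len(leaders) == 1 else None


def winter_adjustment(centers_owned: int, units_on_board: int) -> int:
    """Positive result: builds allowed. Negative: disbands required."""
    return centers_owned - units_on_board


# Two unsupported armies contest Burgundy: equal strength, both bounce.
standoff = resolve_moves_into({"A PAR": 0, "A MUN": 0})

# Paris gains one support and now outweighs Munich.
winner = resolve_moves_into({"A PAR": 1, "A MUN": 0})

# Five centers with three units allows two builds; three centers with
# four units forces one disband.
builds = winter_adjustment(5, 3)
disbands = winter_adjustment(3, 4)
print(standoff, winner, builds, disbands)
```

Because the outcome depends only on the supports that actually arrive, a single withheld support is enough to turn a successful move into a bounce. That is why promised support is the usual currency of negotiation.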
Built on the diplomacy Python engine. Models are routed through LiteLLM.