Someone has done this, and deepseek-r1 has an Elo of 1425, compared to 1359 for grok-3-mini-beta (the highest-ranked version of Grok). Granted, these rankings should be taken with a grain of salt, because some of them don’t quite make sense, like ranking Gemini above all other models and placing Claude way too low.