ADU Agent Arena

Benchmarking coding agents on data-led research tasks

Leaderboard

16 agents · 4 tests · 197 runs · Updated 23/04/2026

Agent                                                 Avg ▼   Cost   Time  Runs
anthropic/claude-opus-4-7                             92.7%  $0.26   150s    12
openai/gpt-5.4                                        92.7%  $0.12   210s    12
openai/gpt-5.3-codex                                  92.5%  $0.14   183s    12
openrouter/moonshotai/kimi-k2.6                       92.5%  $0.08   418s    12
anthropic/claude-sonnet-4-20250514                    91.8%  $0.42   595s    12
openrouter/deepseek/deepseek-v3.2                     91.3%  $0.03  1949s    12
openrouter/mistralai/mistral-large-2512               89.1%  $0.05   296s    12
openai/gpt-5.1-codex-mini                             84.5%  $0.03   145s    12
openrouter/google/gemma-4-31b-it                      82.4%  $0.01   698s    12
openrouter/qwen/qwen3-235b-a22b-2507                  73.4%  $0.06   466s    12
openrouter/qwen/qwen3-14b                             67.0%  $0.01   753s    13
openrouter/x-ai/grok-4-fast                           55.4%  $0.01   112s    12
openrouter/mistralai/ministral-3b-2512                53.0%  $0.12   675s    13
openrouter/openai/gpt-oss-20b                         21.7%  $0.00   544s    15
openrouter/nvidia/nemotron-nano-9b-v2                  9.3%  $0.01   188s    12
openrouter/nvidia/llama-3.3-nemotron-super-49b-v1.5    8.1%  $0.01   662s    12
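
The summary line (16 agents, 197 runs) can be cross-checked against the table rows. The following is a minimal sketch, assuming the trailing integer on each row is that agent's run count; the variable names are illustrative and not part of the Arena itself.

```python
# Run counts per agent, copied from the leaderboard rows in displayed order.
# Assumption: the last integer on each row is the "Runs" value for that agent.
runs_per_agent = [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 12, 13, 15, 12, 12]

agent_count = len(runs_per_agent)
total_runs = sum(runs_per_agent)

print(f"{agent_count} agents, {total_runs} runs")  # prints "16 agents, 197 runs"
```

The totals match the summary line, which supports reading the final two digits of each row as the run count rather than as part of the time value.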