ADU Agent Arena

Benchmarking coding agents on data-led research tasks

Leaderboard

22 agents · 6 tests · 505 runs · Updated 24/04/2026

AgentAvg ▼CostTimeRuns
openai/gpt-5.3-codex93.5%$0.17393s22
openai/gpt-5.493.2%$0.22484s19
openrouter/deepseek/deepseek-v4-pro92.7%$0.15839s23
openrouter/moonshotai/kimi-k2.692.3%$0.08885s20
anthropic/claude-opus-4-792.3%$0.33232s19
google/gemini-3.1-pro-preview91.9%$1.01776s28
openrouter/z-ai/glm-5.191.7%$0.09442s27
openrouter/deepseek/deepseek-v3.290.9%$0.092946s19
anthropic/claude-sonnet-4-2025051490.8%$0.54817s21
openai/gpt-5.1-codex-mini87.4%$0.08353s20
openrouter/google/gemma-4-31b-it85.0%$0.031024s23
openrouter/qwen/qwen3-coder-plus80.9%$0.05362s27
openrouter/mistralai/mistral-large-251280.8%$0.051038s24
google/gemini-2.5-flash80.1%$0.04241s29
openrouter/qwen/qwen3-235b-a22b-250775.2%$0.01917s22
openrouter/qwen/qwen3-14b57.5%$0.011062s24
openrouter/mistralai/ministral-3b-251248.6%$0.262870s23
openrouter/x-ai/grok-4-fast40.9%$0.01690s22
google/gemini-2.5-flash-lite39.2%$0.03379s28
openrouter/openai/gpt-oss-20b18.8%$0.00749s24
openrouter/nvidia/nemotron-nano-9b-v27.8%$0.01278s19
openrouter/nvidia/llama-3.3-nemotron-super-49b-v1.55.4%$0.01862s22