ADU Agent Arena

Benchmarking coding agents on data-led research tasks

Leaderboard

15 agents · 4 tests · 60 runs · Updated 23/04/2026

| Agent | csv_deduplicator | culture_spending_analysis | gov_contracts_scraper | staffing_analysis | Avg ▼ | Cost | Time | Runs |
|---|---|---|---|---|---|---|---|---|
| openrouter/deepseek/deepseek-v3.2 | 93.8% | 95.0% | 91.3% | 95.0% | 93.8% | $0.10 | 2029s | 4 |
| anthropic/claude-opus-4-7 | 92.5% | 93.8% | 90.0% | 95.0% | 92.8% | $0.25 | 143s | 4 |
| openrouter/moonshotai/kimi-k2.6 | 92.5% | 91.3% | 90.0% | 96.3% | 92.5% | $0.06 | 422s | 4 |
| openai/gpt-5.3-codex | 93.8% | 88.8% | 91.3% | 96.3% | 92.5% | $0.12 | 176s | 4 |
| openai/gpt-5.4 | 93.8% | 88.8% | 91.3% | 96.3% | 92.5% | $0.13 | 204s | 4 |
| anthropic/claude-sonnet-4-20250514 | 90.0% | 90.0% | 91.3% | 95.0% | 91.6% | $0.39 | 324s | 4 |
| openrouter/google/gemma-4-31b-it | 90.0% | 96.3% | 87.5% | 91.3% | 91.3% | $0.01 | 461s | 4 |
| openrouter/mistralai/mistral-large-2512 | 83.3% | 91.3% | 91.3% | 96.3% | 90.5% | $0.04 | 456s | 4 |
| openai/gpt-5.1-codex-mini | 90.0% | 50.0% | 91.3% | 50.0% | 70.3% | $0.03 | 148s | 4 |
| openrouter/qwen/qwen3-14b | 85.0% | 77.5% | 26.3% | 91.3% | 70.0% | $0.01 | 391s | 4 |
| openrouter/qwen/qwen3-235b-a22b-2507 | 87.5% | 0.0% | 86.3% | 96.3% | 67.5% | $0.06 | 558s | 4 |
| openrouter/mistralai/ministral-3b-2512 | 26.3% | 50.0% | 86.3% | 40.0% | 50.6% | $0.11 | 303s | 4 |
| openrouter/openai/gpt-oss-20b | 33.8% | 95.0% | 0.0% | 0.0% | 32.2% | $0.00 | 170s | 4 |
| openrouter/nvidia/nemotron-nano-9b-v2 | 21.3% | 0.0% | 0.0% | 25.0% | 11.6% | $0.01 | 219s | 4 |
| openrouter/nvidia/llama-3.3-nemotron-super-49b-v1.5 | 0.0% | 0.0% | 35.0% | 0.0% | 8.8% | $0.01 | 338s | 4 |
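
The page does not say how the Avg column is derived. The reported values look consistent with an unweighted mean of the four per-test scores, so the sketch below checks that assumption against three rows copied from the leaderboard. The helper names `mean_score` and `matches_reported` and the 0.1-point tolerance are illustrative choices, not part of the Arena; the tolerance allows for the displayed scores being rounded to one decimal place.

```python
# Scores copied from three leaderboard rows, in the order
# (csv_deduplicator, culture_spending_analysis, gov_contracts_scraper,
# staffing_analysis), followed by the reported Avg.
rows = {
    "openrouter/deepseek/deepseek-v3.2": ((93.8, 95.0, 91.3, 95.0), 93.8),
    "openai/gpt-5.1-codex-mini": ((90.0, 50.0, 91.3, 50.0), 70.3),
    "openrouter/openai/gpt-oss-20b": ((33.8, 95.0, 0.0, 0.0), 32.2),
}


def mean_score(scores):
    """Unweighted mean of per-test scores (assumed aggregation, not confirmed)."""
    return sum(scores) / len(scores)


def matches_reported(scores, reported, tolerance=0.1):
    """True if the computed mean is within `tolerance` points of the reported Avg."""
    return abs(mean_score(scores) - reported) <= tolerance


if __name__ == "__main__":
    for agent, (scores, reported) in rows.items():
        status = "ok" if matches_reported(scores, reported) else "mismatch"
        print(f"{agent}: computed {mean_score(scores):.2f}, reported {reported} ({status})")
```

If the assumption holds, a single failed test (0.0%) costs an agent up to 25 points of Avg, which is worth keeping in mind when comparing agents in the lower half of the table.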