Leaderboard
| Agent | csv_deduplicator | culture_spending_analysis | gov_contracts_scraper | staffing_analysis | Avg | Cost | Time | Runs |
|---|---|---|---|---|---|---|---|---|
| anthropic/claude-opus-4-7 | 65.0% | 80.0% | 57.5% | 77.5% | 70.0% | - | - | 4 |
| openrouter/mistralai/mistral-large-2512 | 55.0% | 80.0% | 55.0% | 82.5% | 68.1% | - | - | 4 |
| openrouter/meta-llama/llama-4-maverick | 55.0% | 80.0% | 55.0% | 77.5% | 66.9% | - | - | 4 |
| openai/gpt-5.4 | 60.0% | 85.0% | 55.0% | 65.0% | 66.3% | - | - | 4 |
| openrouter/qwen/qwen3-coder | 55.0% | 75.0% | 50.0% | 75.0% | 63.7% | - | - | 4 |
| anthropic/claude-sonnet-4-20250514 | 47.5% | 80.0% | 55.0% | 67.5% | 62.5% | - | - | 4 |
| openai/gpt-5.1-codex-mini | 45.0% | 85.0% | 55.0% | 65.0% | 62.5% | - | - | 4 |
| openrouter/deepseek/deepseek-v3.2 | 45.0% | 77.5% | 57.5% | 70.0% | 62.5% | - | - | 4 |
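
The Avg column appears to be the unweighted mean of the four per-task scores, and the rows are sorted by it in descending order. The sketch below reproduces that arithmetic for two rows copied from the table. The aggregation rule is an assumption inferred from the numbers, not a documented formula, and `average_score` is a hypothetical helper name.

```python
from statistics import mean

# Per-task scores (percent) copied from two leaderboard rows, in column order:
# csv_deduplicator, culture_spending_analysis, gov_contracts_scraper, staffing_analysis
scores = {
    "anthropic/claude-opus-4-7": [65.0, 80.0, 57.5, 77.5],
    "openrouter/mistralai/mistral-large-2512": [55.0, 80.0, 55.0, 82.5],
}


def average_score(task_scores):
    """Unweighted mean of per-task scores (assumed aggregation rule)."""
    return mean(task_scores)


# Sort agents by average, highest first, mirroring the table's ordering.
ranked = sorted(scores, key=lambda agent: average_score(scores[agent]), reverse=True)

for agent in ranked:
    print(f"{agent}: {average_score(scores[agent]):.1f}%")
```

Because the per-task values shown are themselves rounded, a recomputed average can differ from the displayed one by 0.1 in the last digit.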