Claude Sonnet 4.5
Anthropic Claude Sonnet
Rank #11 overall · 651 evaluations · Overall score: 8.40
Performance Metrics
[Chart: Metric Breakdown across Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, and Ensemble Agreement]
Strengths
Relevance: 9.20
Semantic Consistency: 8.69
Areas for Improvement
Ensemble Agreement: 7.17
Factual Accuracy: 7.78
[Chart: Performance by Domain]
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Score Diff |
|---|---|---|---|---|
| Gemini 3 Pro | 77 | 27 | 111 | +0.37 |
| Grok 4 (Reasoning) | 18 | 13 | 63 | +0.02 |
| GPT-5.2 (Thinking) | 8 | 39 | 33 | -0.54 |
| Grok 4.1 (Reasoning) | 11 | 8 | 37 | -0.04 |
| GPT-5 | 1 | 23 | 31 | -0.37 |
| Sonar Reasoning Pro | 24 | 0 | 16 | +1.09 |
| Grok 4 | 16 | 4 | 16 | +0.22 |
| GPT-5.2 | 1 | 13 | 20 | -0.47 |
| Gemini 3 Flash | 5 | 9 | 15 | -0.33 |
| GPT-5.1 (Thinking) | 2 | 8 | 14 | -0.35 |
| GPT-5.1 | 1 | 7 | 14 | +0.19 |
| Sonar Pro | 7 | 4 | 8 | +0.06 |
| Gemini 2.5 Flash | 5 | 4 | 10 | -0.57 |
| GPT-4.1 | 1 | 6 | 6 | -0.34 |
| Grok 3 Mini | 0 | 1 | 8 | -0.05 |
| Claude Sonnet 4 | 0 | 0 | 4 | 0.00 |
| Claude Sonnet 4.5 (Thinking) | 0 | 0 | 4 | 0.00 |
| Claude Opus 4.6 (Adaptive) | 0 | 0 | 4 | -0.06 |
| Mistral Nemo | 1 | 0 | 2 | +0.09 |
| Claude 3.7 Sonnet | 3 | 0 | 0 | +0.62 |