Claude Sonnet 4.5 (Thinking)
Anthropic Claude Sonnet
Rank #24 overall · 35 evaluations
Overall score: 7.60
Performance Metrics
Metric Breakdown
Metrics: Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 8.33
Semantic Consistency: 7.90
Areas for Improvement
Ensemble Agreement: 6.33
Factual Accuracy: 6.85
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Grok 4.1 (Reasoning) | 3 | 3 | 6 | -0.06 |
| GPT-5.2 (Thinking) | 0 | 4 | 5 | -0.46 |
| Claude Opus 4.6 (Adaptive) | 0 | 0 | 7 | -0.05 |
| Gemini 3 Pro | 1 | 2 | 2 | +0.09 |
| Claude Sonnet 4 | 0 | 0 | 4 | 0.00 |
| Claude Sonnet 4.5 | 0 | 0 | 4 | 0.00 |
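
The head-to-head rows above can be tallied into a single record. The following is a minimal sketch, not the leaderboard's own method: the names `RECORDS` and `summarize` are hypothetical, and the matchup-weighted mean assumes that each row's Avg Diff is an average over all of that row's wins, losses, and ties, which the page does not state.

```python
# Rows copied from the head-to-head table:
# (opponent, wins, losses, ties, avg_diff)
RECORDS = [
    ("Grok 4.1 (Reasoning)", 3, 3, 6, -0.06),
    ("GPT-5.2 (Thinking)", 0, 4, 5, -0.46),
    ("Claude Opus 4.6 (Adaptive)", 0, 0, 7, -0.05),
    ("Gemini 3 Pro", 1, 2, 2, 0.09),
    ("Claude Sonnet 4", 0, 0, 4, 0.00),
    ("Claude Sonnet 4.5", 0, 0, 4, 0.00),
]


def summarize(records):
    """Aggregate wins, losses, and ties across all opponents."""
    wins = sum(r[1] for r in records)
    losses = sum(r[2] for r in records)
    ties = sum(r[3] for r in records)
    matchups = wins + losses + ties

    # Assumption: each Avg Diff is a per-matchup mean for its row, so the
    # overall mean weights each row by its number of matchups.
    weighted_diff = sum(r[4] * (r[1] + r[2] + r[3]) for r in records) / matchups

    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        "matchups": matchups,
        "tie_rate": ties / matchups,
        "weighted_avg_diff": weighted_diff,
    }


if __name__ == "__main__":
    for key, value in summarize(RECORDS).items():
        print(f"{key}: {value}")
```

On the table as given, this totals 4 wins, 9 losses, and 28 ties across 41 matchups, so most comparisons ended in a tie.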