Claude Sonnet 4
Anthropic Claude Sonnet
Rank #28 overall · 10 evaluations
7.05
Performance Metrics
Metric Breakdown: Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Ensemble Agreement, Factual Accuracy
Strengths
Relevance: 7.40
Semantic Consistency: 7.25
Areas for Improvement
Factual Accuracy: 6.61
Ensemble Agreement: 6.77
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Gemini 3 Pro | 2 | 0 | 3 | +0.37 |
| GPT-5.2 (Thinking) | 1 | 1 | 3 | -0.01 |
| Claude Sonnet 4.5 | 0 | 0 | 4 | 0.00 |
| Claude Sonnet 4.5 (Thinking) | 0 | 0 | 4 | 0.00 |
| Claude Opus 4.6 (Adaptive) | 0 | 0 | 4 | -0.06 |