Claude 3.7 Sonnet
Anthropic Claude 3
Rank #24 overall · 38 evaluations
Overall score: 7.60
Performance Metrics
Metric Breakdown (chart; axes: Relevance, Semantic Consistency, Human Likeness, Style/Tone, Readability, Factual Accuracy, Ensemble Agreement)
Strengths
Relevance: 8.19
Semantic Consistency: 8.15
Areas for Improvement
Ensemble Agreement: 6.62
Factual Accuracy: 7.02
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Sonar Pro | 10 | 9 | 13 | +0.32 |
| GPT-5 (Generic) | 5 | 21 | 5 | +0.52 |
| Grok 4 (Reasoning) | 8 | 3 | 20 | +0.12 |
| Claude Sonnet 4.5 | 0 | 3 | 0 | -0.62 |
| Claude Opus 4 | 0 | 1 | 2 | -0.14 |