Claude 3.7 Sonnet

Anthropic Claude 3

Rank #24 overall · 38 evaluations

Overall score: 7.60
Performance Metrics
Metric Breakdown
Relevance: 8.19
Semantic Consistency: 8.15
Human Likeness: 7.73
Style/Tone: 7.72
Readability: 7.70
Factual Accuracy: 7.02
Ensemble Agreement: 6.62
Strengths
Relevance: 8.19
Semantic Consistency: 8.15
Areas for Improvement
Ensemble Agreement: 6.62
Factual Accuracy: 7.02
Head-to-Head Record
Opponent            Wins  Losses  Ties  Avg Diff
Sonar Pro             10       9    13     +0.32
GPT-5 (Generic)        5      21     5     +0.52
Grok 4 (Reasoning)     8       3    20     +0.12
Claude Sonnet 4.5      0       3     0     -0.62
Claude Opus 4          0       1     2     -0.14