Claude Sonnet 4

Anthropic Claude Sonnet

Rank #28 overall · 10 evaluations

Overall score: 7.05
Performance Metrics
Metric Breakdown
Relevance: 7.40
Semantic Consistency: 7.25
Style/Tone: 7.10
Human Likeness: 7.08
Readability: 7.03
Ensemble Agreement: 6.77
Factual Accuracy: 6.61
Strengths
Relevance: 7.40
Semantic Consistency: 7.25
Areas for Improvement
Factual Accuracy: 6.61
Ensemble Agreement: 6.77
Performance by Domain
Head-to-Head Record
Opponent                      Wins  Losses  Ties  Avg Diff
Gemini 3 Pro                  2     0       3     +0.37
GPT-5.2 (Thinking)            1     1       3     -0.01
Claude Sonnet 4.5             0     0       4     0.00
Claude Sonnet 4.5 (Thinking)  0     0       4     0.00
Claude Opus 4.6 (Adaptive)    0     0       4     -0.06