Claude Sonnet 4.5 (Thinking)

Anthropic Claude Sonnet

Rank #24 overall · 35 evaluations

Overall score: 7.60
Performance Metrics
Metric Breakdown
Relevance: 8.33
Semantic Consistency: 7.90
Style/Tone: 7.80
Human Likeness: 7.58
Readability: 7.40
Factual Accuracy: 6.85
Ensemble Agreement: 6.33
Strengths
Relevance: 8.33
Semantic Consistency: 7.90
Areas for Improvement
Ensemble Agreement: 6.33
Factual Accuracy: 6.85
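The two lists above are consistent with taking the two highest and the two lowest metric scores. The following is a minimal sketch of that selection, assuming (this is not stated on the page) that the cutoff is simply the top two and bottom two metrics, with the weakest listed first. The function name `strengths_and_weaknesses` and the parameter `k` are hypothetical.

```python
# Metric scores copied from the Metric Breakdown.
metrics = {
    "Relevance": 8.33,
    "Semantic Consistency": 7.90,
    "Style/Tone": 7.80,
    "Human Likeness": 7.58,
    "Readability": 7.40,
    "Factual Accuracy": 6.85,
    "Ensemble Agreement": 6.33,
}


def strengths_and_weaknesses(scores, k=2):
    """Return the k highest and k lowest metrics.

    Assumption: the page picks a fixed number of metrics from each end
    of the ranking; the real selection rule is not documented here.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    strengths = ranked[:k]           # highest score first
    weaknesses = ranked[-k:][::-1]   # lowest score first
    return strengths, weaknesses


strengths, weaknesses = strengths_and_weaknesses(metrics)

print("Strengths")
for name, score in strengths:
    print(f"{name}: {score:.2f}")

print("Areas for Improvement")
for name, score in weaknesses:
    print(f"{name}: {score:.2f}")
```

With `k=2` this reproduces the same four entries and the same ordering shown in the two lists.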
Performance by Domain (chart not reproduced)
Head-to-Head Record
Opponent                    Wins  Losses  Ties  Avg Diff
Grok 4.1 (Reasoning)           3       3     6     -0.06
GPT-5.2 (Thinking)             0       4     5     -0.46
Claude Opus 4.6 (Adaptive)     0       0     7     -0.05
Gemini 3 Pro                   1       2     2     +0.09
Claude Sonnet 4                0       0     4      0.00
Claude Sonnet 4.5              0       0     4      0.00
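
The head-to-head rows can be tallied into an overall record. This is a small sketch that only sums the win, loss, and tie columns as listed. The `Record` type and the `totals` helper are hypothetical names introduced for illustration, and no assumption is made about how the page computes Avg Diff.

```python
from typing import NamedTuple


class Record(NamedTuple):
    opponent: str
    wins: int
    losses: int
    ties: int
    avg_diff: float


# Rows copied from the Head-to-Head Record.
records = [
    Record("Grok 4.1 (Reasoning)", 3, 3, 6, -0.06),
    Record("GPT-5.2 (Thinking)", 0, 4, 5, -0.46),
    Record("Claude Opus 4.6 (Adaptive)", 0, 0, 7, -0.05),
    Record("Gemini 3 Pro", 1, 2, 2, 0.09),
    Record("Claude Sonnet 4", 0, 0, 4, 0.00),
    Record("Claude Sonnet 4.5", 0, 0, 4, 0.00),
]


def totals(rows):
    """Sum wins, losses, and ties across all opponents."""
    wins = sum(r.wins for r in rows)
    losses = sum(r.losses for r in rows)
    ties = sum(r.ties for r in rows)
    return wins, losses, ties


wins, losses, ties = totals(records)
print(f"Wins: {wins}  Losses: {losses}  Ties: {ties}")
# prints "Wins: 4  Losses: 9  Ties: 28"
```

Most of the listed matchups are ties, so the record is decided by a small number of non-tied comparisons.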