Claude Sonnet 4.5

Anthropic · Claude Sonnet

Rank #11 overall · 651 evaluations

Overall score: 8.40
Performance Metrics
Metric Breakdown
Relevance: 9.20
Semantic Consistency: 8.69
Style/Tone: 8.60
Human Likeness: 8.55
Readability: 8.16
Factual Accuracy: 7.78
Ensemble Agreement: 7.17
Strengths
Relevance: 9.20
Semantic Consistency: 8.69
Areas for Improvement
Ensemble Agreement: 7.17
Factual Accuracy: 7.78
Head-to-Head Record
Opponent                      Wins  Losses  Ties  Avg Diff
Gemini 3 Pro                    77      27   111     +0.37
Grok 4 (Reasoning)              18      13    63     +0.02
GPT-5.2 (Thinking)               8      39    33     -0.54
Grok 4.1 (Reasoning)            11       8    37     -0.04
GPT-5                            1      23    31     -0.37
Sonar Reasoning Pro             24       0    16     +1.09
Grok 4                          16       4    16     +0.22
GPT-5.2                          1      13    20     -0.47
Gemini 3 Flash                   5       9    15     -0.33
GPT-5.1 (Thinking)               2       8    14     -0.35
GPT-5.1                          1       7    14     +0.19
Sonar Pro                        7       4     8     +0.06
Gemini 2.5 Flash                 5       4    10     -0.57
GPT-4.1                          1       6     6     -0.34
Grok 3 Mini                      0       1     8     -0.05
Claude Sonnet 4                  0       0     4      0.00
Claude Sonnet 4.5 (Thinking)     0       0     4      0.00
Claude Opus 4.6 (Adaptive)       0       0     4     -0.06
Mistral Nemo                     1       0     2     +0.09
Claude 3.7 Sonnet                3       0     0     +0.62