GPT-5.2 (Thinking)

OpenAI GPT-5

Rank #5 overall · 185 evaluations

8.71
Performance Metrics
Metric Breakdown
Relevance: 9.50
Semantic Consistency: 8.99
Style/Tone: 8.90
Human Likeness: 8.80
Readability: 8.55
Factual Accuracy: 8.39
Ensemble Agreement: 7.35
Strengths
Relevance: 9.50
Semantic Consistency: 8.99
Areas for Improvement
Ensemble Agreement: 7.35
Factual Accuracy: 8.39
Performance by Domain
Head-to-Head Record
Opponent                      Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5               39       8    33     +0.54
Gemini 3 Pro                    32       1    37     +0.37
Grok 4.1 (Reasoning)            21       0    10     +0.61
Gemini 2.5 Flash                 7       1     8     +0.03
Claude Sonnet 4.5 (Thinking)     4       0     5     +0.46
Claude Sonnet 4                  1       1     3     +0.01
Sonar Reasoning Pro              4       0     0     +4.66
GPT-5.2                          1       1     1     +2.88
Gemini 3 Flash                   1       1     1     -2.26
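
The head-to-head record can be summarized with a little arithmetic. The sketch below copies the wins, losses, and ties from the table and derives a match count and two win rates per opponent. The names `RECORDS`, `summarize`, and `totals` are hypothetical, and both win-rate definitions (wins over all matches, and wins over decided matches) are assumptions for illustration, not metrics the leaderboard itself reports. Avg Diff is not recomputed because the page does not say how it is defined.

```python
# Head-to-head rows copied from the table: (opponent, wins, losses, ties).
RECORDS = [
    ("Claude Sonnet 4.5", 39, 8, 33),
    ("Gemini 3 Pro", 32, 1, 37),
    ("Grok 4.1 (Reasoning)", 21, 0, 10),
    ("Gemini 2.5 Flash", 7, 1, 8),
    ("Claude Sonnet 4.5 (Thinking)", 4, 0, 5),
    ("Claude Sonnet 4", 1, 1, 3),
    ("Sonar Reasoning Pro", 4, 0, 0),
    ("GPT-5.2", 1, 1, 1),
    ("Gemini 3 Flash", 1, 1, 1),
]


def summarize(records):
    """Return per-opponent match counts and win rates (assumed definitions)."""
    rows = []
    for opponent, wins, losses, ties in records:
        matches = wins + losses + ties
        decided = wins + losses  # matches that did not end in a tie
        rows.append({
            "opponent": opponent,
            "matches": matches,
            "win_rate": wins / matches,
            # Undefined when every match against this opponent was a tie.
            "decided_win_rate": wins / decided if decided else None,
        })
    return rows


def totals(records):
    """Sum wins, losses, and ties across all opponents."""
    wins = sum(r[1] for r in records)
    losses = sum(r[2] for r in records)
    ties = sum(r[3] for r in records)
    return wins, losses, ties


if __name__ == "__main__":
    for row in summarize(RECORDS):
        decided = row["decided_win_rate"]
        decided_text = f"{decided:.0%}" if decided is not None else "n/a"
        print(f'{row["opponent"]:<30}{row["matches"]:>4}'
              f'{row["win_rate"]:>8.0%}{decided_text:>8}')
    print("wins, losses, ties:", totals(RECORDS))
```

Reporting both rates is a deliberate choice here: ties make up a large share of several matchups (for example, 37 of 70 against Gemini 3 Pro), so the rate over all matches and the rate over decided matches can tell quite different stories.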