GPT-5.2 (Thinking)
OpenAI GPT-5
Rank #5 overall · 185 evaluations
Overall score: 8.71
Performance Metrics
Metric Breakdown (chart axes): Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 9.50
Semantic Consistency: 8.99
Areas for Improvement
Ensemble Agreement: 7.35
Factual Accuracy: 8.39
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 39 | 8 | 33 | +0.54 |
| Gemini 3 Pro | 32 | 1 | 37 | +0.37 |
| Grok 4.1 (Reasoning) | 21 | 0 | 10 | +0.61 |
| Gemini 2.5 Flash | 7 | 1 | 8 | +0.03 |
| Claude Sonnet 4.5 (Thinking) | 4 | 0 | 5 | +0.46 |
| Claude Sonnet 4 | 1 | 1 | 3 | +0.01 |
| Sonar Reasoning Pro | 4 | 0 | 0 | +4.66 |
| GPT-5.2 | 1 | 1 | 1 | +2.88 |
| Gemini 3 Flash | 1 | 1 | 1 | -2.26 |
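
The raw win/loss/tie counts above are easier to compare across opponents once they are normalized by the number of comparisons. The following is a minimal sketch of that arithmetic. The "win rate" and "non-loss rate" here are illustrative derived figures, not metrics the leaderboard itself reports, and the names `RECORDS` and `summarize` are hypothetical.

```python
# Head-to-head records copied from the table above: (opponent, wins, losses, ties).
RECORDS = [
    ("Claude Sonnet 4.5", 39, 8, 33),
    ("Gemini 3 Pro", 32, 1, 37),
    ("Grok 4.1 (Reasoning)", 21, 0, 10),
    ("Gemini 2.5 Flash", 7, 1, 8),
    ("Claude Sonnet 4.5 (Thinking)", 4, 0, 5),
    ("Claude Sonnet 4", 1, 1, 3),
    ("Sonar Reasoning Pro", 4, 0, 0),
    ("GPT-5.2", 1, 1, 1),
    ("Gemini 3 Flash", 1, 1, 1),
]


def summarize(wins, losses, ties):
    """Return (total comparisons, win rate, non-loss rate) for one record.

    Assumption: every comparison is exactly one of win, loss, or tie,
    so the three counts sum to the number of comparisons.
    """
    total = wins + losses + ties
    return total, wins / total, (wins + ties) / total


if __name__ == "__main__":
    for name, wins, losses, ties in RECORDS:
        total, win_rate, non_loss_rate = summarize(wins, losses, ties)
        print(f"{name:30s} n={total:3d}  win={win_rate:6.1%}  non-loss={non_loss_rate:6.1%}")
```

Rows with very few comparisons (for example the three 1-1-1 or 4-0-0 records) produce rates that are not meaningful on their own, which is worth keeping in mind when reading the large Avg Diff values next to them.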