GPT-5.2
OpenAI GPT-5
Rank #5 overall · 62 evaluations
Overall score: 8.71
Performance Metrics
Metric Breakdown
Metrics scored: Relevance, Style/Tone, Semantic Consistency, Human Likeness, Factual Accuracy, Readability, Ensemble Agreement
Strengths
Relevance: 9.48
Style/Tone: 9.02
Areas for Improvement
Ensemble Agreement: 7.45
Readability: 8.50
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 13 | 1 | 20 | +0.47 |
| Gemini 3 Flash | 4 | 2 | 13 | -0.42 |
| Grok 4.1 (Reasoning) | 9 | 0 | 7 | +0.60 |
| Gemini 3 Pro | 7 | 0 | 5 | +1.14 |
| Grok 3 Mini | 1 | 2 | 6 | +0.65 |
| GPT-5.2 (Thinking) | 1 | 1 | 1 | -2.88 |
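
The table reports raw win, loss, and tie counts, so the rows are hard to compare directly when the number of comparisons varies from 3 to 34. The sketch below shows one way to derive a win rate over decisive (non-tied) comparisons. The `Record` class, the `decisive_win_rate` property, and the choice to exclude ties are illustrative assumptions, not something the leaderboard defines. The counts are copied from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """One head-to-head row. Hypothetical helper, not part of the leaderboard."""
    opponent: str
    wins: int
    losses: int
    ties: int

    @property
    def comparisons(self) -> int:
        # Total number of pairwise comparisons in this row.
        return self.wins + self.losses + self.ties

    @property
    def decisive_win_rate(self) -> Optional[float]:
        # Assumption: ties are excluded; returns None if every comparison tied.
        decisive = self.wins + self.losses
        return self.wins / decisive if decisive else None


# Counts copied verbatim from the Head-to-Head Record table.
RECORDS: List[Record] = [
    Record("Claude Sonnet 4.5", 13, 1, 20),
    Record("Gemini 3 Flash", 4, 2, 13),
    Record("Grok 4.1 (Reasoning)", 9, 0, 7),
    Record("Gemini 3 Pro", 7, 0, 5),
    Record("Grok 3 Mini", 1, 2, 6),
    Record("GPT-5.2 (Thinking)", 1, 1, 1),
]


def totals(records: List[Record]) -> Tuple[int, int, int]:
    """Sum wins, losses, and ties across all opponents."""
    return (
        sum(r.wins for r in records),
        sum(r.losses for r in records),
        sum(r.ties for r in records),
    )


if __name__ == "__main__":
    for r in RECORDS:
        rate = r.decisive_win_rate
        shown = "n/a" if rate is None else f"{rate:.2f}"
        print(f"{r.opponent}: {r.comparisons} comparisons, decisive win rate {shown}")
    print("Totals (wins, losses, ties):", totals(RECORDS))
```

Ties make up more than half of the recorded comparisons, so a rate computed this way rests on small samples (for example, only three decisive results against Grok 3 Mini) and is best read alongside the raw counts.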