GPT-5
OpenAI GPT-5
Rank #2 overall · 60 evaluations
Overall score: 8.83
Performance Metrics
Metric Breakdown (chart): Relevance, Semantic Consistency, Style/Tone, Human Likeness, Factual Accuracy, Readability, Ensemble Agreement
Strengths
Relevance: 9.44
Semantic Consistency: 9.09
Areas for Improvement
Ensemble Agreement: 7.89
Readability: 8.76
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 23 | 1 | 31 | +0.37 |
| Grok 4 (Reasoning) | 32 | 1 | 21 | +0.34 |
| Sonar Reasoning Pro | 26 | 0 | 4 | +1.13 |
| Gemini 3 Pro | 13 | 1 | 5 | +0.59 |
| GPT-5.1 | 0 | 0 | 4 | -0.07 |
| GPT-5 (Generic) | 0 | 0 | 4 | -0.04 |
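
Because most rows are dominated by ties, the raw win counts can be easier to read as rates. The sketch below is one possible way to summarize the table, not the leaderboard's own method: the win, loss, and tie counts are copied from the rows above, while the function names `decisive_win_rate` and `tie_share` and the choice to exclude ties from the win rate are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (wins, losses, ties) copied from the Head-to-Head Record table.
RECORDS: Dict[str, Tuple[int, int, int]] = {
    "Claude Sonnet 4.5": (23, 1, 31),
    "Grok 4 (Reasoning)": (32, 1, 21),
    "Sonar Reasoning Pro": (26, 0, 4),
    "Gemini 3 Pro": (13, 1, 5),
    "GPT-5.1": (0, 0, 4),
    "GPT-5 (Generic)": (0, 0, 4),
}


def decisive_win_rate(wins: int, losses: int, ties: int) -> Optional[float]:
    """Share of non-tied matchups that were wins (hypothetical helper).

    Returns None when every matchup was a tie, since the rate is undefined.
    """
    decisive = wins + losses
    if decisive == 0:
        return None
    return wins / decisive


def tie_share(wins: int, losses: int, ties: int) -> float:
    """Share of all matchups that ended in a tie (hypothetical helper)."""
    total = wins + losses + ties
    return ties / total if total else 0.0


if __name__ == "__main__":
    for opponent, record in RECORDS.items():
        rate = decisive_win_rate(*record)
        rate_text = "n/a" if rate is None else f"{rate:.1%}"
        print(f"{opponent}: win rate {rate_text}, ties {tie_share(*record):.1%}")
```

Excluding ties keeps the all-tie rows (GPT-5.1 and GPT-5 (Generic)) from reading as a 0% win rate; they are reported as undefined instead.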