GPT-5 (Generic)
OpenAI GPT-5
Rank #20 overall · 41 evaluations
Overall score: 7.72
Performance Metrics
Metric Breakdown (chart axes): Relevance, Semantic Consistency, Human Likeness, Readability, Style/Tone, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 8.37
Semantic Consistency: 8.17
Areas for Improvement
Ensemble Agreement: 6.37
Factual Accuracy: 7.21
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Grok 4 (Reasoning) | 27 | 5 | 4 | -0.26 |
| Claude 3.7 Sonnet | 21 | 5 | 5 | -0.52 |
| Sonar Pro | 19 | 7 | 5 | -0.19 |
| GPT-5 | 0 | 0 | 4 | +0.04 |
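
The head-to-head rows can be reduced to a per-opponent win rate for easier comparison. The following is a minimal sketch, not the leaderboard's own method: it assumes the Wins column counts wins for the profiled model and that ties count as non-wins, neither of which the page states. The names `RECORDS`, `win_rate`, and `summarize` are hypothetical.

```python
# Head-to-head rows copied from the table: (opponent, wins, losses, ties).
RECORDS = [
    ("Grok 4 (Reasoning)", 27, 5, 4),
    ("Claude 3.7 Sonnet", 21, 5, 5),
    ("Sonar Pro", 19, 7, 5),
    ("GPT-5", 0, 0, 4),
]


def win_rate(wins, losses, ties):
    """Share of comparisons won. Ties count as non-wins (an assumption)."""
    total = wins + losses + ties
    return wins / total if total else 0.0


def summarize(records):
    """Map each opponent to its win rate, rounded to three decimals."""
    return {name: round(win_rate(w, l, t), 3) for name, w, l, t in records}


if __name__ == "__main__":
    for opponent, rate in summarize(RECORDS).items():
        print(f"{opponent}: {rate:.3f}")
```

Under these assumptions the Grok 4 (Reasoning) row gives 27 / 36 = 0.75, and the GPT-5 row gives 0.0 because all four comparisons were ties.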