GPT-5.1
OpenAI GPT-5
Rank #10 overall · 23 evaluations
Overall score: 8.52
Performance Metrics
Metric Breakdown
[Chart: scores by metric. Axes: Relevance, Semantic Consistency, Human Likeness, Style/Tone, Factual Accuracy, Readability, Ensemble Agreement]
Strengths
Relevance: 9.13
Semantic Consistency: 8.65
Areas for Improvement
Ensemble Agreement: 8.04
Readability: 8.39
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 7 | 1 | 14 | -0.19 |
| Gemini 3 Pro | 8 | 1 | 10 | +0.80 |
| Grok 4 (Reasoning) | 5 | 1 | 9 | -0.21 |
| Grok 4.1 (Reasoning) | 1 | 0 | 5 | +0.16 |
| GPT-5 | 0 | 0 | 4 | +0.07 |
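The win/loss/tie counts above can be folded into a single per-opponent rate. The sketch below is one way to do that, not the leaderboard's own method: the `Record` class, the `win_rate` helper, and the choice to count a tie as half a win are all assumptions made for illustration. The counts are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of the head-to-head table (hypothetical container)."""
    opponent: str
    wins: int
    losses: int
    ties: int

    @property
    def games(self) -> int:
        # Total comparisons against this opponent.
        return self.wins + self.losses + self.ties

    def win_rate(self, tie_weight: float = 0.5) -> float:
        # Assumption: a tie counts as tie_weight of a win (0.5 by default).
        return (self.wins + tie_weight * self.ties) / self.games


# Counts copied from the Head-to-Head Record table.
records = [
    Record("Claude Sonnet 4.5", 7, 1, 14),
    Record("Gemini 3 Pro", 8, 1, 10),
    Record("Grok 4 (Reasoning)", 5, 1, 9),
    Record("Grok 4.1 (Reasoning)", 1, 0, 5),
    Record("GPT-5", 0, 0, 4),
]

for r in records:
    print(f"{r.opponent}: {r.games} games, tie-adjusted win rate {r.win_rate():.3f}")
```

Setting `tie_weight=0.0` gives the plain share of outright wins instead. Because most rows here are dominated by ties, the two conventions produce noticeably different numbers, so any summary figure should state which one it uses.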