GPT-5.1 (Thinking)
OpenAI GPT-5
Rank #12 overall · 79 evaluations
Overall score: 8.37
Performance Metrics
Metric Breakdown (chart axes): Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 9.10
Semantic Consistency: 8.74
Areas for Improvement
Ensemble Agreement: 7.01
Factual Accuracy: 8.05
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Gemini 3 Pro | 27 | 10 | 22 | -0.11 |
| Claude Sonnet 4.5 | 8 | 2 | 14 | +0.35 |
| Sonar Pro | 16 | 2 | 3 | +0.22 |
| Grok 4.1 (Reasoning) | 7 | 2 | 4 | +0.19 |
| Grok 4 (Reasoning) | 5 | 0 | 7 | +0.33 |
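
The head-to-head rows can be turned into per-opponent win rates with a little arithmetic. The sketch below is illustrative only: the names `RECORDS`, `summarize`, and `build_summary` are hypothetical, and both win-rate definitions (wins over all matchups, and wins over decided matchups with ties excluded) are assumptions rather than the leaderboard's own methodology. The Avg Diff column is left out because the page does not say how it is computed.

```python
# Head-to-head rows copied from the table: (opponent, wins, losses, ties).
RECORDS = [
    ("Gemini 3 Pro", 27, 10, 22),
    ("Claude Sonnet 4.5", 8, 2, 14),
    ("Sonar Pro", 16, 2, 3),
    ("Grok 4.1 (Reasoning)", 7, 2, 4),
    ("Grok 4 (Reasoning)", 5, 0, 7),
]


def summarize(wins, losses, ties):
    """Return matchup count and two win rates for one opponent.

    Both definitions are assumptions made for illustration:
    - win_rate counts ties in the denominator
    - decisive_win_rate ignores ties entirely
    """
    total = wins + losses + ties
    decided = wins + losses
    return {
        "matchups": total,
        "win_rate": wins / total if total else 0.0,
        "decisive_win_rate": wins / decided if decided else 0.0,
    }


def build_summary(records=RECORDS):
    """Map each opponent name to its summary dictionary."""
    return {name: summarize(w, l, t) for name, w, l, t in records}


if __name__ == "__main__":
    for name, stats in build_summary().items():
        print(
            f"{name}: {stats['matchups']} matchups, "
            f"win rate {stats['win_rate']:.2f}, "
            f"decisive win rate {stats['decisive_win_rate']:.2f}"
        )
```

Reporting both rates is a deliberate choice here: several rows are dominated by ties, so a rate that excludes ties can look much stronger than one that includes them.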