GPT-5.1 (Thinking)

OpenAI GPT-5

Rank #12 overall · 79 evaluations

Overall score: 8.37
Performance Metrics
Metric Breakdown
Relevance: 9.10
Semantic Consistency: 8.74
Style/Tone: 8.67
Human Likeness: 8.51
Readability: 8.40
Factual Accuracy: 8.05
Ensemble Agreement: 7.01
Strengths
Relevance: 9.10
Semantic Consistency: 8.74
Areas for Improvement
Ensemble Agreement: 7.01
Factual Accuracy: 8.05
Performance by Domain
Head-to-Head Record
Opponent | Wins | Losses | Ties | Avg Diff
Gemini 3 Pro | 27 | 10 | 22 | -0.11
Claude Sonnet 4.5 | 8 | 2 | 14 | +0.35
Sonar Pro | 16 | 2 | 3 | +0.22
Grok 4.1 (Reasoning) | 7 | 2 | 4 | +0.19
Grok 4 (Reasoning) | 5 | 0 | 7 | +0.33