GPT-5.1

OpenAI GPT-5

Rank #10 overall · 23 evaluations

Overall score: 8.52
Performance Metrics
Metric Breakdown
Relevance: 9.13
Semantic Consistency: 8.65
Human Likeness: 8.53
Style/Tone: 8.52
Factual Accuracy: 8.43
Readability: 8.39
Ensemble Agreement: 8.04
Strengths
Relevance: 9.13
Semantic Consistency: 8.65
Areas for Improvement
Ensemble Agreement: 8.04
Readability: 8.39
Performance by Domain
Head-to-Head Record
Opponent | Wins | Losses | Ties | Avg Diff
Claude Sonnet 4.5 | 7 | 1 | 14 | -0.19
Gemini 3 Pro | 8 | 1 | 10 | +0.80
Grok 4 (Reasoning) | 5 | 1 | 9 | -0.21
Grok 4.1 (Reasoning) | 1 | 0 | 5 | +0.16
GPT-5 | 0 | 0 | 4 | +0.07