GPT-5

OpenAI GPT-5

Rank #2 overall · 60 evaluations

Overall score: 8.83
Performance Metrics
Metric Breakdown
Relevance: 9.44
Semantic Consistency: 9.09
Style/Tone: 8.93
Human Likeness: 8.86
Factual Accuracy: 8.82
Readability: 8.76
Ensemble Agreement: 7.89
Strengths
Relevance: 9.44
Semantic Consistency: 9.09
Areas for Improvement
Ensemble Agreement: 7.89
Readability: 8.76
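The headline figure of 8.83 matches the unweighted mean of the seven metric scores (61.79 / 7 ≈ 8.827). The Strengths and Areas for Improvement lists also match the two highest- and two lowest-scoring metrics. The Python sketch below reproduces both results. It assumes these are the page's aggregation and selection rules, which the page itself does not state.

```python
# Per-metric scores copied from the Metric Breakdown above.
metrics = {
    "Relevance": 9.44,
    "Semantic Consistency": 9.09,
    "Style/Tone": 8.93,
    "Human Likeness": 8.86,
    "Factual Accuracy": 8.82,
    "Readability": 8.76,
    "Ensemble Agreement": 7.89,
}


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the metric scores, rounded to two decimals.

    ASSUMPTION: the leaderboard uses a plain average; the page does not
    document its aggregation rule.
    """
    return round(sum(scores.values()) / len(scores), 2)


# Rank metrics from highest to lowest score.
ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)

# ASSUMPTION: "Strengths" = top two metrics,
# "Areas for Improvement" = bottom two, listed lowest first.
strengths = ranked[:2]
improvements = ranked[-1:-3:-1]

print(overall_score(metrics))
print([name for name, _ in strengths])
print([name for name, _ in improvements])
```

Under these assumptions the script prints 8.83, then the two strength metrics, then the two improvement metrics, in the same order the page lists them.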
Head-to-Head Record
Opponent              Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5       23       1    31     +0.37
Grok 4 (Reasoning)      32       1    21     +0.34
Sonar Reasoning Pro     26       0     4     +1.13
Gemini 3 Pro            13       1     5     +0.59
GPT-5.1                  0       0     4     -0.07
GPT-5 (Generic)          0       0     4     -0.04
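Summing the Head-to-Head Record gives GPT-5's aggregate record against all listed opponents. The Python sketch below computes it. It also computes a win rate over decided (non-tied) comparisons only. That win rate is an illustrative choice and not a figure the page reports.

```python
# (opponent, wins, losses, ties) rows copied from the Head-to-Head Record.
records = [
    ("Claude Sonnet 4.5", 23, 1, 31),
    ("Grok 4 (Reasoning)", 32, 1, 21),
    ("Sonar Reasoning Pro", 26, 0, 4),
    ("Gemini 3 Pro", 13, 1, 5),
    ("GPT-5.1", 0, 0, 4),
    ("GPT-5 (Generic)", 0, 0, 4),
]

# Aggregate totals across all opponents.
wins = sum(w for _, w, _, _ in records)
losses = sum(l for _, _, l, _ in records)
ties = sum(t for _, _, _, t in records)

# Illustrative metric (not from the source): win rate ignoring ties.
decided = wins + losses
win_rate = wins / decided

print(f"Record: {wins}-{losses}-{ties}")
print(f"Win rate over decided comparisons: {win_rate:.3f}")
```

This yields a 94-3-69 record, or about 0.969 of decided comparisons won. Most of the ties come from the Claude Sonnet 4.5 and Grok 4 (Reasoning) matchups.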