GPT-4o
OpenAI GPT-4
Rank #13 overall · 27 evaluations
Score: 8.29
Performance Metrics
Metric Breakdown
[Chart: per-metric scores for Semantic Consistency, Style/Tone, Human Likeness, Relevance, Readability, Factual Accuracy, Ensemble Agreement]
Strengths
Semantic Consistency: 8.89
Style/Tone: 8.69
Areas for Improvement
Ensemble Agreement: 7.05
Factual Accuracy: 7.26
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| GPT-5 Mini | 1 | 5 | 5 | -0.15 |
| GPT-4.1 | 1 | 4 | 4 | -0.44 |
| GPT-4.1 Nano | 1 | 5 | 3 | -0.28 |