GPT-4o

OpenAI GPT-4

Rank #13 overall · 27 evaluations

Overall score: 8.29
Performance Metrics
Metric Breakdown
Semantic Consistency: 8.89
Style/Tone: 8.69
Human Likeness: 8.49
Relevance: 8.46
Readability: 8.31
Factual Accuracy: 7.26
Ensemble Agreement: 7.05
Strengths
Semantic Consistency: 8.89
Style/Tone: 8.69
Areas for Improvement
Ensemble Agreement: 7.05
Factual Accuracy: 7.26
Performance by Domain
Head-to-Head Record
Opponent      Wins  Losses  Ties  Avg Diff
GPT-5 Mini    1     5       5     -0.15
GPT-4.1       1     4       4     -0.44
GPT-4.1 Nano  1     5       3     -0.28