GPT-4.1

OpenAI GPT-4

Rank #8 overall · 50 evaluations

8.64
Performance Metrics
Metric Breakdown
Style/Tone: 8.96
Relevance: 8.90
Semantic Consistency: 8.85
Human Likeness: 8.84
Readability: 8.56
Factual Accuracy: 8.34
Ensemble Agreement: 7.61
Strengths
Style/Tone: 8.96
Relevance: 8.90
Areas for Improvement
Ensemble Agreement: 7.61
Factual Accuracy: 8.34
Performance by Domain
Head-to-Head Record
Opponent            Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5      6       1     6     +0.34
Grok 4 (Reasoning)    10       0     3     +0.61
Gemini 3 Flash         5       0     8     +0.30
GPT-4.1 Nano           1       1     8     +0.15
GPT-4o                 4       1     4     +0.44
GPT-5 Mini             2       3     4     +0.26
Gemini 3 Pro           3       0     1     +5.98
Gemini 2.5 Pro         0       0     3     +0.07
Grok 3 Mini            2       0     1     +0.25
Claude 3.5 Haiku       2       0     1     +0.52