Grok 4
xAI (Grok)
Rank #18 overall · 39 evaluations
Overall score: 8.02
Performance Metrics
Metric Breakdown
Metrics shown: Relevance, Style/Tone, Semantic Consistency, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 8.82
Style/Tone: 8.49
Areas for Improvement
Ensemble Agreement: 7.49
Factual Accuracy: 7.72
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 4 | 16 | 16 | -0.22 |
| Gemini 3 Pro | 12 | 8 | 16 | +0.07 |
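
The win, loss, and tie counts above can be turned into rates for easier comparison. The following is a minimal sketch, not the leaderboard's own method: the names `RECORDS`, `summarize`, and `report` are hypothetical, and both rate definitions (wins over all comparisons, and wins over decided comparisons with ties excluded) are assumptions chosen for illustration.

```python
# Head-to-head rows copied from the table above: (opponent, wins, losses, ties).
RECORDS = [
    ("Claude Sonnet 4.5", 4, 16, 16),
    ("Gemini 3 Pro", 12, 8, 16),
]


def summarize(wins, losses, ties):
    """Return the comparison count and two illustrative win-rate figures."""
    total = wins + losses + ties
    decided = wins + losses  # comparisons that did not end in a tie
    return {
        "total": total,
        # Assumption: ties count as non-wins in the overall rate.
        "win_rate_all": wins / total if total else 0.0,
        # Assumption: ties are dropped entirely in the decided rate.
        "win_rate_decided": wins / decided if decided else 0.0,
    }


def report(records=RECORDS):
    """Map each opponent name to its summary dictionary."""
    return {name: summarize(w, l, t) for name, w, l, t in records}


if __name__ == "__main__":
    for name, s in report().items():
        print(
            f"{name}: {s['total']} comparisons, "
            f"{s['win_rate_all']:.1%} wins overall, "
            f"{s['win_rate_decided']:.1%} of decided comparisons"
        )
```

Both rows sum to 36 comparisons, and ties make up 16 of them, so the two rate definitions give noticeably different pictures of the same record.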