Grok 4 (Reasoning)
Provider: xAI (Grok) · Model: Grok 4
Rank #17 overall · 142 evaluations · Overall score: 8.08
Performance Metrics
Metric Breakdown (chart not reproduced): Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 8.83
Semantic Consistency: 8.47
Areas for Improvement
Ensemble Agreement: 7.34
Factual Accuracy: 7.51
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg. Score Diff. |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 13 | 18 | 63 | -0.02 |
| GPT-5 | 1 | 32 | 21 | -0.34 |
| Sonar Pro | 13 | 14 | 15 | +0.22 |
| Gemini 3 Pro | 14 | 3 | 20 | +1.24 |
| GPT-5 (Generic) | 5 | 27 | 4 | +0.26 |
| Sonar Reasoning Pro | 21 | 0 | 10 | +0.72 |
| Claude 3.7 Sonnet | 3 | 8 | 20 | -0.12 |
| GPT-5.1 | 1 | 5 | 9 | +0.21 |
| GPT-4.1 | 0 | 10 | 3 | -0.61 |
| GPT-5.1 (Thinking) | 0 | 5 | 7 | -0.33 |
| Gemini 3 Flash | 0 | 4 | 6 | -0.35 |
| Grok 4.1 (Non-Reasoning) | 0 | 1 | 2 | -0.11 |
| Grok 4.1 (Reasoning) | 0 | 1 | 2 | -0.09 |