Grok 4 (Reasoning)

xAI (Grok) Grok 4

Rank #17 overall · 142 evaluations

Overall score: 8.08
Performance Metrics
Metric Breakdown
Relevance: 8.83
Semantic Consistency: 8.47
Style/Tone: 8.42
Human Likeness: 8.23
Readability: 8.10
Factual Accuracy: 7.51
Ensemble Agreement: 7.34
Strengths
Relevance: 8.83
Semantic Consistency: 8.47
Areas for Improvement
Ensemble Agreement: 7.34
Factual Accuracy: 7.51
Performance by Domain
Head-to-Head Record
Opponent                  Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5           13      18    63     -0.02
GPT-5                        1      32    21     -0.34
Sonar Pro                   13      14    15     +0.22
Gemini 3 Pro                14       3    20     +1.24
GPT-5 (Generic)              5      27     4     +0.26
Sonar Reasoning Pro         21       0    10     +0.72
Claude 3.7 Sonnet            3       8    20     -0.12
GPT-5.1                      1       5     9     +0.21
GPT-4.1                      0      10     3     -0.61
GPT-5.1 (Thinking)           0       5     7     -0.33
Gemini 3 Flash               0       4     6     -0.35
Grok 4.1 (Non-Reasoning)     0       1     2     -0.11
Grok 4.1 (Reasoning)         0       1     2     -0.09
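
The record above can be summarized per opponent. What follows is a minimal sketch, assuming each row ends with four numeric fields (wins, losses, ties, average difference) and that everything before them is the opponent name. The `Record` type and the `parse_row` and `win_rate` helpers are hypothetical names introduced for illustration; they are not part of the leaderboard, and the page does not say how it computes its own figures.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    opponent: str
    wins: int
    losses: int
    ties: int
    avg_diff: float


def parse_row(row: str) -> Record:
    """Split a row such as 'Gemini 3 Pro 14 3 20 +1.24' from the right.

    Assumption: the last four whitespace-separated fields are numeric,
    so the opponent name may itself contain spaces and digits.
    """
    name, wins, losses, ties, diff = row.rsplit(maxsplit=4)
    return Record(name, int(wins), int(losses), int(ties), float(diff))


def win_rate(rec: Record) -> Optional[float]:
    """Share of decided (non-tied) comparisons won; None if none were decided."""
    decided = rec.wins + rec.losses
    return rec.wins / decided if decided else None


# A few rows copied from the head-to-head record.
rows = [
    "Claude Sonnet 4.5 13 18 63 -0.02",
    "Gemini 3 Pro 14 3 20 +1.24",
    "Sonar Reasoning Pro 21 0 10 +0.72",
]

for rec in map(parse_row, rows):
    total = rec.wins + rec.losses + rec.ties
    print(rec.opponent, total, win_rate(rec))
```

Excluding ties from the denominator is a design choice: several opponents have far more ties than decided results, so a win rate over all comparisons would mostly reflect how often the two models were judged equal.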