Grok 4.1 (Reasoning)

xAI (Grok) · Grok 4

Rank #22 overall · 99 evaluations

Overall score: 7.68
Performance Metrics
Metric Breakdown
Relevance: 8.38
Style/Tone: 8.05
Semantic Consistency: 8.04
Human Likeness: 7.88
Readability: 7.68
Factual Accuracy: 6.96
Ensemble Agreement: 6.61
Strengths
Relevance: 8.38
Style/Tone: 8.05
Areas for Improvement
Ensemble Agreement: 6.61
Factual Accuracy: 6.96
Head-to-Head Record
Opponent | Wins | Losses | Ties | Avg Diff
Claude Sonnet 4.5 | 8 | 11 | 37 | +0.04
Gemini 3 Pro | 9 | 7 | 22 | +0.29
GPT-5.2 (Thinking) | 0 | 21 | 10 | -0.61
GPT-5.2 | 0 | 9 | 7 | -0.60
GPT-5.1 (Thinking) | 2 | 7 | 4 | -0.19
Claude Sonnet 4.5 (Thinking) | 3 | 3 | 6 | +0.06
Gemini 3 Flash | 0 | 4 | 5 | -0.46
Grok 4.1 (Non-Reasoning) | 1 | 2 | 5 | -0.26
Sonar Reasoning Pro | 4 | 0 | 2 | +4.06
GPT-5.1 | 0 | 1 | 5 | -0.16
Grok Code | 1 | 0 | 4 | +0.12
Grok 4 (Non-Reasoning) | 0 | 1 | 4 | -0.34
Grok 4 (Reasoning) | 1 | 0 | 2 | +0.09