Grok 4 (Reasoning)

Name: Grok 4 (Reasoning)
Rating: 8.08 (142 reviews)

xAI (Grok) Grok 4

Rank #17 overall · 142 evaluations

Performance Metrics

Metric Breakdown

Metric	Score
Relevance	8.83
Semantic Consistency	8.47
Style/Tone	8.42
Human Likeness	8.23
Readability	8.10
Factual Accuracy	7.51
Ensemble Agreement	7.34

Strengths

Relevance: 8.83

Semantic Consistency: 8.47

Areas for Improvement

Ensemble Agreement: 7.34

Factual Accuracy: 7.51

Performance by Domain

Head-to-Head Record

Opponent	Wins	Losses	Ties	Avg Diff
Claude Sonnet 4.5	13	18	63	-0.02
GPT-5	1	32	21	-0.34
Sonar Pro	13	14	15	+0.22
Gemini 3 Pro	14	3	20	+1.24
GPT-5 (Generic)	5	27	4	+0.26
Sonar Reasoning Pro	21	0	10	+0.72
Claude 3.7 Sonnet	3	8	20	-0.12
GPT-5.1	1	5	9	+0.21
GPT-4.1	0	10	3	-0.61
GPT-5.1 (Thinking)	0	5	7	-0.33
Gemini 3 Flash	0	4	6	-0.35
Grok 4.1 (Non-Reasoning)	0	1	2	-0.11
Grok 4.1 (Reasoning)	0	1	2	-0.09
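
The head-to-head rows can be summarized with a little arithmetic. The sketch below parses a few of the tab-separated rows above and derives the total number of matchups and a win rate over decisive (non-tied) matchups. The page does not define a win rate, so `decisive_win_rate`, `Record`, and `parse_row` are hypothetical names and the formula is an assumption made for illustration, not the leaderboard's own method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One row of the head-to-head table (hypothetical helper type)."""
    opponent: str
    wins: int
    losses: int
    ties: int
    avg_diff: float

    @property
    def total(self) -> int:
        # All matchups against this opponent, ties included.
        return self.wins + self.losses + self.ties

    @property
    def decisive_win_rate(self) -> Optional[float]:
        # Assumption: ties are excluded from the denominator.
        decisive = self.wins + self.losses
        return self.wins / decisive if decisive else None


def parse_row(line: str) -> Record:
    # Columns follow the table header: Opponent, Wins, Losses, Ties, Avg Diff.
    opponent, wins, losses, ties, avg_diff = line.split("\t")
    return Record(opponent, int(wins), int(losses), int(ties), float(avg_diff))


# Three rows copied verbatim from the table above.
rows = [
    "Claude Sonnet 4.5\t13\t18\t63\t-0.02",
    "Gemini 3 Pro\t14\t3\t20\t+1.24",
    "Sonar Reasoning Pro\t21\t0\t10\t+0.72",
]
records = [parse_row(r) for r in rows]

for rec in records:
    rate = rec.decisive_win_rate
    rate_text = "n/a" if rate is None else f"{rate:.2f}"
    print(f"{rec.opponent}: {rec.total} matchups, decisive win rate {rate_text}")
```

Excluding ties is only one possible choice. Counting each tie as half a win would give a different picture for opponents such as Claude Sonnet 4.5, where most matchups ended in a tie.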