Grok 4

Provider: xAI (Grok)

Rank #18 overall · 39 evaluations

Overall score: 8.02
Performance Metrics

Metric Breakdown

Metric                  Score
Relevance                8.82
Style/Tone               8.49
Semantic Consistency     8.36
Human Likeness           8.09
Readability              7.86
Factual Accuracy         7.72
Ensemble Agreement       7.49

Strengths
Relevance: 8.82
Style/Tone: 8.49
Areas for Improvement
Ensemble Agreement: 7.49
Factual Accuracy: 7.72
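The Strengths and Areas for Improvement lists match the two highest and two lowest scores in the Metric Breakdown. The sketch below reproduces that selection. Note that the selection rule is an assumption, because the page does not state how these lists are chosen. The sketch also computes the unweighted mean of the seven metrics, which is about 8.12 rather than the reported overall score of 8.02. This suggests that the overall score uses a different aggregation, such as weighting or averaging per evaluation, which the page does not document.

```python
# Per-metric scores for Grok 4, copied from the Metric Breakdown table.
metrics = {
    "Relevance": 8.82,
    "Style/Tone": 8.49,
    "Semantic Consistency": 8.36,
    "Human Likeness": 8.09,
    "Readability": 7.86,
    "Factual Accuracy": 7.72,
    "Ensemble Agreement": 7.49,
}

# Rank metric names from highest to lowest score.
ranked = sorted(metrics, key=metrics.get, reverse=True)

# Assumed rule: strengths = top two, areas for improvement = bottom two (lowest first).
strengths = ranked[:2]
improvements = ranked[-2:][::-1]

# Unweighted mean of the seven metrics, for comparison with the 8.02 overall score.
unweighted_mean = sum(metrics.values()) / len(metrics)

print("Strengths:", strengths)
print("Areas for improvement:", improvements)
print("Unweighted mean:", round(unweighted_mean, 2))
```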
Head-to-Head Record

Opponent             Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5       4      16    16     -0.22
Gemini 3 Pro           12       8    16     +0.07
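The sketch below turns the head-to-head counts into rates that can be compared directly. Each opponent accounts for 36 matchups. The "ties count as half a win" score is a common convention that is assumed here and is not stated by the page. The helper name `summarize` is hypothetical. Avg Diff is copied through unchanged. The page does not define its sign, but the numbers are consistent with Grok 4's score minus the opponent's score.

```python
# Head-to-head record for Grok 4, copied from the table:
# opponent -> (wins, losses, ties, avg_diff)
record = {
    "Claude Sonnet 4.5": (4, 16, 16, -0.22),
    "Gemini 3 Pro": (12, 8, 16, 0.07),
}


def summarize(wins: int, losses: int, ties: int) -> dict:
    """Hypothetical helper: match count, win share, and a ties-as-half score (assumed convention)."""
    games = wins + losses + ties
    return {
        "games": games,
        "win_share": wins / games,
        "score_with_half_ties": (wins + 0.5 * ties) / games,
    }


for opponent, (w, l, t, diff) in record.items():
    s = summarize(w, l, t)
    print(
        f"{opponent}: {s['games']} games, win share {s['win_share']:.1%}, "
        f"ties-as-half {s['score_with_half_ties']:.1%}, avg diff {diff:+.2f}"
    )
```

Under this convention, Grok 4 scores about 33.3% against Claude Sonnet 4.5 and about 55.6% against Gemini 3 Pro. The large number of ties (16 of 36 in both pairings) is why the win share alone understates both results.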