Grok 3 Mini

xAI (Grok) · Grok 3

Rank #19 overall · 14 evaluations

Overall score: 7.76
Performance Metrics
Metric Breakdown
Style/Tone: 8.92
Relevance: 8.29
Semantic Consistency: 7.95
Readability: 7.73
Human Likeness: 7.61
Factual Accuracy: 7.50
Ensemble Agreement: 7.15
Strengths
Style/Tone: 8.92
Relevance: 8.29
Areas for Improvement
Ensemble Agreement: 7.15
Factual Accuracy: 7.50
Performance by Domain
Head-to-Head Record
Opponent            Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5   1     0       8     +0.05
Gemini 3 Flash      2     1       6     -0.69
GPT-5.2             2     1       6     -0.65
Gemini 2.5 Pro      0     1       2     -0.18
GPT-4.1             0     2       1     -0.25
Claude 3.5 Haiku    2     0       1     +0.28