GPT-5.1 vs Grok 4 (Reasoning)

15 head-to-head matchups on identical queries

GPT-5.1: 8.52 (5 wins, 9 ties, 1 loss)
Grok 4 (Reasoning): 8.08 (1 win, 9 ties, 5 losses)
Metric-by-Metric Comparison