GPT-5.1 (Thinking) vs Grok 4 (Reasoning)

12 head-to-head matchups on identical queries

GPT-5.1 (Thinking): 8.37 (5 wins, 7 ties, 0 losses)
Grok 4 (Reasoning): 8.08 (0 wins, 7 ties, 5 losses)

Metric-by-Metric Comparison