GPT-5.1 (Thinking) vs Grok 4.1 (Reasoning)

13 head-to-head matchups on identical queries

GPT-5.1 (Thinking): 8.37 (7 Wins, 4 Ties, 2 Losses)
VS
Grok 4.1 (Reasoning): 7.68 (2 Wins, 4 Ties, 7 Losses)
Metric-by-Metric Comparison