GPT-5.2 (Thinking) vs Grok 4.1 (Reasoning)

31 head-to-head matchups on identical queries

GPT-5.2 (Thinking): 8.71 (21 wins, 10 ties, 0 losses)
Grok 4.1 (Reasoning): 7.68 (0 wins, 10 ties, 21 losses)
Metric-by-Metric Comparison