GPT-4.1 vs Grok 4 (Reasoning)

13 head-to-head matchups on identical queries

GPT-4.1: 8.64 (10 Wins, 3 Ties, 0 Losses)
Grok 4 (Reasoning): 8.08 (0 Wins, 3 Ties, 10 Losses)
Metric-by-Metric Comparison