Gemini 3 Pro

Google Gemini Gemini 3

Rank #14 overall · 479 evaluations

Overall score: 8.26
Performance Metrics
Metric Breakdown
Relevance: 9.03
Semantic Consistency: 8.57
Style/Tone: 8.53
Human Likeness: 8.40
Readability: 8.13
Factual Accuracy: 7.64
Ensemble Agreement: 6.93
Strengths
Relevance: 9.03
Semantic Consistency: 8.57
Areas for Improvement
Ensemble Agreement: 6.93
Factual Accuracy: 7.64
Head-to-Head Record
Opponent                      Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5               27      77   111     -0.37
GPT-5.2 (Thinking)               1      32    37     -0.37
GPT-5.1 (Thinking)              10      27    22     +0.11
Grok 4.1 (Reasoning)             7       9    22     -0.29
Grok 4 (Reasoning)               3      14    20     -1.24
Grok 4                           8      12    16     -0.07
GPT-5.1                          1       8    10     -0.80
GPT-5                            1      13     5     -0.59
Sonar Pro                        4       7     6     -0.29
GPT-5.2                          0       7     5     -1.14
Claude Opus 4.6 (Adaptive)       0       5     4     -2.67
Gemini 2.5 Pro                   1       2     5     -1.15
Sonar Reasoning Pro              6       1     0     +4.59
Claude Sonnet 4                  0       2     3     -0.37
Gemini 3 Flash                   1       1     3     -1.56
Claude Sonnet 4.5 (Thinking)     2       1     2     -0.09
GPT-4.1                          0       3     1     -5.98
Gemini 2.0 Flash                 0       1     3     -0.31
Gemini 2.5 Flash                 1       1     2     -0.20
Gemini 2.5 Flash Lite            2       1     1     -1.69