Gemini 3 Pro
Google · Gemini · Gemini 3
Rank #14 overall · 479 evaluations
Overall score: 8.26
Performance Metrics
Metric Breakdown (chart axes): Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 9.03
Semantic Consistency: 8.57
Areas for Improvement
Ensemble Agreement: 6.93
Factual Accuracy: 7.64
Performance by Domain
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 27 | 77 | 111 | -0.37 |
| GPT-5.2 (Thinking) | 1 | 32 | 37 | -0.37 |
| GPT-5.1 (Thinking) | 10 | 27 | 22 | +0.11 |
| Grok 4.1 (Reasoning) | 7 | 9 | 22 | -0.29 |
| Grok 4 (Reasoning) | 3 | 14 | 20 | -1.24 |
| Grok 4 | 8 | 12 | 16 | -0.07 |
| GPT-5.1 | 1 | 8 | 10 | -0.80 |
| GPT-5 | 1 | 13 | 5 | -0.59 |
| Sonar Pro | 4 | 7 | 6 | -0.29 |
| GPT-5.2 | 0 | 7 | 5 | -1.14 |
| Claude Opus 4.6 (Adaptive) | 0 | 5 | 4 | -2.67 |
| Gemini 2.5 Pro | 1 | 2 | 5 | -1.15 |
| Sonar Reasoning Pro | 6 | 1 | 0 | +4.59 |
| Claude Sonnet 4 | 0 | 2 | 3 | -0.37 |
| Gemini 3 Flash | 1 | 1 | 3 | -1.56 |
| Claude Sonnet 4.5 (Thinking) | 2 | 1 | 2 | -0.09 |
| GPT-4.1 | 0 | 3 | 1 | -5.98 |
| Gemini 2.0 Flash | 0 | 1 | 3 | -0.31 |
| Gemini 2.5 Flash | 1 | 1 | 2 | -0.20 |
| Gemini 2.5 Flash Lite | 2 | 1 | 1 | -1.69 |
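
The table reports raw win, loss, and tie counts but no rates, so the rows are hard to compare across opponents with very different sample sizes. The following is a minimal sketch of one way to derive rates from those counts. The `Record` class and both rate definitions are illustrative assumptions, not the leaderboard's own method, and only three rows from the table are copied in as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One head-to-head row (hypothetical helper, not part of the leaderboard)."""
    opponent: str
    wins: int
    losses: int
    ties: int

    @property
    def games(self) -> int:
        # Total comparisons against this opponent, ties included.
        return self.wins + self.losses + self.ties

    @property
    def win_rate(self) -> float:
        # Share of all games won; ties count as non-wins.
        return self.wins / self.games if self.games else 0.0

    @property
    def decisive_win_rate(self) -> float:
        # Share of non-tied games won; ties are ignored entirely.
        decisive = self.wins + self.losses
        return self.wins / decisive if decisive else 0.0


# Three rows copied from the table above as sample input.
RECORDS = [
    Record("Claude Sonnet 4.5", 27, 77, 111),
    Record("GPT-5.2 (Thinking)", 1, 32, 37),
    Record("Sonar Reasoning Pro", 6, 1, 0),
]

if __name__ == "__main__":
    for r in RECORDS:
        print(
            f"{r.opponent}: {r.games} games, "
            f"win rate {r.win_rate:.1%}, "
            f"decisive win rate {r.decisive_win_rate:.1%}"
        )
```

Rows with few games (for example the 7 comparisons against Sonar Reasoning Pro) give rates that swing widely with a single result, so the game count is worth reading alongside any rate.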