Gemini 3 Flash
Google Gemini Gemini 3
Rank #8 overall · 51 evaluations
Overall score: 8.64
Performance Metrics
Metric Breakdown
[Chart: per-metric scores for Relevance, Semantic Consistency, Style/Tone, Human Likeness, Readability, Factual Accuracy, and Ensemble Agreement]
Strengths
Relevance: 9.21
Semantic Consistency: 9.06
Areas for Improvement
Ensemble Agreement: 7.44
Factual Accuracy: 8.16
Performance by Domain
[Chart: performance by domain]
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 9 | 5 | 15 | +0.33 |
| GPT-5.2 | 2 | 4 | 13 | +0.42 |
| Sonar | 5 | 4 | 4 | +0.72 |
| GPT-4.1 | 0 | 5 | 8 | -0.30 |
| GPT-4.1 Nano | 0 | 4 | 8 | -0.50 |
| Grok 4 (Reasoning) | 4 | 0 | 6 | +0.35 |
| Grok 4.1 (Reasoning) | 4 | 0 | 5 | +0.46 |
| Grok 3 Mini | 1 | 2 | 6 | +0.69 |
| Jamba Large | 0 | 4 | 1 | -1.10 |
| Gemini 3 Pro | 1 | 1 | 3 | +1.56 |
| Gemini 2.5 Flash Lite | 2 | 0 | 2 | +0.26 |
| Gemini 2.5 Pro | 0 | 1 | 3 | -0.14 |
| GPT-5.2 (Thinking) | 1 | 1 | 1 | +2.26 |
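
The win, loss, and tie counts in the table can be condensed into a single rate per opponent. The sketch below is one way to do that, assuming a tie counts as half a win; that convention, along with the names `RECORDS`, `win_rate`, and `totals`, is illustrative and not something the leaderboard itself defines. The Avg Diff column is left out because the page does not say how it is computed.

```python
# Head-to-head rows copied from the table: (opponent, wins, losses, ties).
RECORDS = [
    ("Claude Sonnet 4.5", 9, 5, 15),
    ("GPT-5.2", 2, 4, 13),
    ("Sonar", 5, 4, 4),
    ("GPT-4.1", 0, 5, 8),
    ("GPT-4.1 Nano", 0, 4, 8),
    ("Grok 4 (Reasoning)", 4, 0, 6),
    ("Grok 4.1 (Reasoning)", 4, 0, 5),
    ("Grok 3 Mini", 1, 2, 6),
    ("Jamba Large", 0, 4, 1),
    ("Gemini 3 Pro", 1, 1, 3),
    ("Gemini 2.5 Flash Lite", 2, 0, 2),
    ("Gemini 2.5 Pro", 0, 1, 3),
    ("GPT-5.2 (Thinking)", 1, 1, 1),
]


def win_rate(wins: int, losses: int, ties: int) -> float:
    """Fraction of matchups won, counting each tie as half a win (assumed convention)."""
    games = wins + losses + ties
    if games == 0:
        return 0.0
    return (wins + 0.5 * ties) / games


def totals(records) -> tuple[int, int, int]:
    """Sum wins, losses, and ties across all opponents."""
    wins = sum(r[1] for r in records)
    losses = sum(r[2] for r in records)
    ties = sum(r[3] for r in records)
    return wins, losses, ties


if __name__ == "__main__":
    for opponent, w, l, t in RECORDS:
        print(f"{opponent:<24} {win_rate(w, l, t):.3f}")
    w, l, t = totals(RECORDS)
    print(f"{'All opponents':<24} {win_rate(w, l, t):.3f}")
```

Under this convention a rate above 0.5 means Gemini 3 Flash came out ahead of that opponent on balance. Counting only decided matchups (dropping ties) is an equally reasonable alternative and would shift the numbers noticeably, since ties make up most of the record.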