Gemini 2.5 Flash
Google · Gemini · Gemini 2.5
Rank #7 overall · 41 evaluations
Overall score: 8.67
Performance Metrics
Metric Breakdown (chart axes): Relevance · Style/Tone · Semantic Consistency · Human Likeness · Readability · Factual Accuracy · Ensemble Agreement
Strengths
Relevance: 9.45
Style/Tone: 9.28
Areas for Improvement
Ensemble Agreement: 7.16
Factual Accuracy: 7.78
Performance by Domain (chart not captured)
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 4 | 5 | 10 | +0.57 |
| GPT-5.2 (Thinking) | 1 | 7 | 8 | -0.03 |
| GPT-5 Mini | 1 | 1 | 8 | -0.04 |
| Mistral Small 3.2 | 0 | 0 | 9 | +0.17 |
| Jamba Mini | 4 | 0 | 5 | +3.05 |
| Gemini 2.5 Pro | 0 | 0 | 8 | 0.00 |
| Gemini 2.0 Flash | 1 | 1 | 6 | +0.07 |
| Gemini 3 Pro | 1 | 1 | 2 | +0.20 |
| Magistral Small | 3 | 0 | 0 | +9.40 |
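The per-opponent rows can be summed into a single win/loss/tie record. The sketch below is only an illustration of that arithmetic: the rows are copied from the table above, while the names `RECORD` and `tally` and the idea of a "decisive win rate" (wins divided by wins plus losses, ignoring ties) are assumptions introduced here, not metrics reported by the leaderboard.

```python
# Head-to-head rows copied from the table: (opponent, wins, losses, ties).
# RECORD and tally are hypothetical names used only for this sketch.
RECORD = [
    ("Claude Sonnet 4.5", 4, 5, 10),
    ("GPT-5.2 (Thinking)", 1, 7, 8),
    ("GPT-5 Mini", 1, 1, 8),
    ("Mistral Small 3.2", 0, 0, 9),
    ("Jamba Mini", 4, 0, 5),
    ("Gemini 2.5 Pro", 0, 0, 8),
    ("Gemini 2.0 Flash", 1, 1, 6),
    ("Gemini 3 Pro", 1, 1, 2),
    ("Magistral Small", 3, 0, 0),
]


def tally(record):
    """Return total (wins, losses, ties) across all opponents."""
    wins = sum(row[1] for row in record)
    losses = sum(row[2] for row in record)
    ties = sum(row[3] for row in record)
    return wins, losses, ties


wins, losses, ties = tally(RECORD)
total = wins + losses + ties

# Share of matchups that ended in a tie.
tie_share = ties / total

# Assumed definition: win rate over decisive (non-tied) matchups only.
decisive = wins + losses
decisive_win_rate = wins / decisive if decisive else None

print(f"wins={wins} losses={losses} ties={ties} total={total}")
print(f"tie share={tie_share:.2f} decisive win rate={decisive_win_rate}")
```

Most matchups in the table are ties, so any ratio over decisive matchups rests on a small sample and should be read with caution.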