Gemini 2.5 Pro
Google · Gemini · Gemini 2.5
Rank #1 overall · 16 evaluations
Overall score: 8.96
Performance Metrics
Metric Breakdown (chart): Relevance, Style/Tone, Semantic Consistency, Human Likeness, Readability, Factual Accuracy, Ensemble Agreement
Strengths
Relevance: 9.47
Style/Tone: 9.41
Areas for Improvement
Ensemble Agreement: 7.94
Factual Accuracy: 8.78
Performance by Domain (chart)
Head-to-Head Record
| Opponent | Wins | Losses | Ties | Avg Diff |
|---|---|---|---|---|
| Gemini 3 Pro | 2 | 1 | 5 | +1.15 |
| Gemini 2.0 Flash | 1 | 1 | 6 | +0.07 |
| Gemini 2.5 Flash | 0 | 0 | 8 | 0.00 |
| Gemini 2.5 Flash Lite | 2 | 0 | 3 | +0.32 |
| Gemini 3 Flash | 1 | 0 | 3 | +0.14 |
| GPT-4.1 | 0 | 0 | 3 | -0.07 |
| Grok 3 Mini | 1 | 0 | 2 | +0.18 |
| Claude 3.5 Haiku | 2 | 0 | 1 | +0.45 |
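
The per-opponent rows can be rolled up into a single aggregate record. The sketch below is a minimal illustration, not part of the leaderboard itself: `RECORDS` and `tally` are hypothetical names, and it assumes each row's wins, losses, and ties together count the pairwise comparisons against that opponent.

```python
# Rows copied from the Head-to-Head Record table:
# (opponent, wins, losses, ties, avg_diff)
RECORDS = [
    ("Gemini 3 Pro", 2, 1, 5, 1.15),
    ("Gemini 2.0 Flash", 1, 1, 6, 0.07),
    ("Gemini 2.5 Flash", 0, 0, 8, 0.00),
    ("Gemini 2.5 Flash Lite", 2, 0, 3, 0.32),
    ("Gemini 3 Flash", 1, 0, 3, 0.14),
    ("GPT-4.1", 0, 0, 3, -0.07),
    ("Grok 3 Mini", 1, 0, 2, 0.18),
    ("Claude 3.5 Haiku", 2, 0, 1, 0.45),
]


def tally(records):
    """Sum wins, losses, and ties across all opponents (hypothetical helper)."""
    wins = sum(row[1] for row in records)
    losses = sum(row[2] for row in records)
    ties = sum(row[3] for row in records)
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        # Assumption: every comparison ends in exactly one of the three outcomes.
        "matchups": wins + losses + ties,
    }


if __name__ == "__main__":
    print(tally(RECORDS))
```

With the rows above, this gives 9 wins, 2 losses, and 31 ties across 42 comparisons, so ties account for most of the record.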