Gemini 2.5 Flash

Google · Gemini · Gemini 2.5

Rank #7 overall · 41 evaluations

Overall score: 8.67
Performance Metrics
Metric Breakdown
Relevance: 9.45
Style/Tone: 9.28
Semantic Consistency: 9.06
Human Likeness: 8.86
Readability: 8.68
Factual Accuracy: 7.78
Ensemble Agreement: 7.16
Strengths
Relevance: 9.45
Style/Tone: 9.28
Areas for Improvement
Ensemble Agreement: 7.16
Factual Accuracy: 7.78
Head-to-Head Record
Opponent            Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5      4       5    10     +0.57
GPT-5.2 (Thinking)     1       7     8     -0.03
GPT-5 Mini             1       1     8     -0.04
Mistral Small 3.2      0       0     9     +0.17
Jamba Mini             4       0     5     +3.05
Gemini 2.5 Pro         0       0     8      0.00
Gemini 2.0 Flash       1       1     6     +0.07
Gemini 3 Pro           1       1     2     +0.20
Magistral Small        3       0     0     +9.40