Gemini 3 Flash

Google · Gemini · Gemini 3

Rank #8 overall · 51 evaluations

Overall score: 8.64
Performance Metrics
Metric Breakdown
Relevance: 9.21
Semantic Consistency: 9.06
Style/Tone: 9.03
Human Likeness: 8.87
Readability: 8.49
Factual Accuracy: 8.16
Ensemble Agreement: 7.44
Strengths
Relevance: 9.21
Semantic Consistency: 9.06
Areas for Improvement
Ensemble Agreement: 7.44
Factual Accuracy: 8.16
Performance by Domain
Head-to-Head Record
Opponent               Wins  Losses  Ties  Avg Diff
Claude Sonnet 4.5         9       5    15     +0.33
GPT-5.2                   2       4    13     +0.42
Sonar                     5       4     4     +0.72
GPT-4.1                   0       5     8     -0.30
GPT-4.1 Nano              0       4     8     -0.50
Grok 4 (Reasoning)        4       0     6     +0.35
Grok 4.1 (Reasoning)      4       0     5     +0.46
Grok 3 Mini               1       2     6     +0.69
Jamba Large               0       4     1     -1.10
Gemini 3 Pro              1       1     3     +1.56
Gemini 2.5 Flash Lite     2       0     2     +0.26
Gemini 2.5 Pro            0       1     3     -0.14
GPT-5.2 (Thinking)        1       1     1     +2.26
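
The head-to-head record above can be totalled with a few lines of Python. This is only a sketch for reading the table: the names `RECORD`, `tally`, and `decisive_win_rate` are made up for illustration, and the "decisive win rate" (wins divided by wins plus losses, ignoring ties) is an assumed summary, not a figure the leaderboard itself reports.

```python
# Rows copied from the head-to-head table: (opponent, wins, losses, ties).
RECORD = [
    ("Claude Sonnet 4.5", 9, 5, 15),
    ("GPT-5.2", 2, 4, 13),
    ("Sonar", 5, 4, 4),
    ("GPT-4.1", 0, 5, 8),
    ("GPT-4.1 Nano", 0, 4, 8),
    ("Grok 4 (Reasoning)", 4, 0, 6),
    ("Grok 4.1 (Reasoning)", 4, 0, 5),
    ("Grok 3 Mini", 1, 2, 6),
    ("Jamba Large", 0, 4, 1),
    ("Gemini 3 Pro", 1, 1, 3),
    ("Gemini 2.5 Flash Lite", 2, 0, 2),
    ("Gemini 2.5 Pro", 0, 1, 3),
    ("GPT-5.2 (Thinking)", 1, 1, 1),
]


def tally(record):
    """Sum wins, losses, and ties across all opponents."""
    wins = sum(row[1] for row in record)
    losses = sum(row[2] for row in record)
    ties = sum(row[3] for row in record)
    return wins, losses, ties


def decisive_win_rate(wins, losses):
    """Share of non-tied matchups won (assumed metric); None if all tied."""
    decided = wins + losses
    return wins / decided if decided else None


wins, losses, ties = tally(RECORD)
print(wins, losses, ties)  # 29 31 75
print(round(decisive_win_rate(wins, losses), 3))  # 0.483
```

Across the 13 listed opponents, ties (75) outnumber wins (29) and losses (31) combined.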