Best AI Models for General Queries — 2026 Rankings

43 models evaluated across 1,116 general queries. Ranked by composite Trust Score.
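The exact formula behind the composite Trust Score is not published on this page. As a minimal sketch, assuming the composite is a weighted mean of the seven sub-metrics (RC, FA, SC, RF, ST, ED, HL) — the function name and the equal weighting below are illustrative assumptions, not the site's actual method:

```python
# Hedged sketch: the real Trust Score formula is not given here.
# Assumption: composite = weighted mean of the seven sub-metric scores.

def composite_trust_score(subscores: dict[str, float],
                          weights: dict[str, float]) -> float:
    """Weighted mean of sub-metric scores, rounded to two decimals."""
    total_weight = sum(weights.values())
    weighted_sum = sum(subscores[k] * weights[k] for k in weights)
    return round(weighted_sum / total_weight, 2)

# Sub-scores for the top-ranked model (Gemini 2.0 Flash, from the table).
subscores = {"RC": 9.06, "FA": 8.63, "SC": 9.50, "RF": 9.39,
             "ST": 9.67, "ED": 8.00, "HL": 8.86}
equal_weights = {k: 1.0 for k in subscores}
print(composite_trust_score(subscores, equal_weights))  # → 9.02
```

Note that an unweighted mean of the sub-scores yields 9.02, not the published 9.07, so the actual composite must weight the sub-metrics differently or include factors not shown in the table.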

Top model for general queries: Gemini 2.0 Flash (Google Gemini), Trust Score 9.07 across 9 evaluations.
| Rank | Model | Provider | Trust Score | RC | FA | SC | RF | ST | ED | HL | Evals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash | Google Gemini | 9.07 | 9.06 | 8.63 | 9.50 | 9.39 | 9.67 | 8.00 | 8.86 | 9 * |
| 2 | GPT-5.1 | OpenAI | 9.04 | 8.80 | 9.05 | 9.20 | 9.65 | 9.00 | 8.60 | 9.08 | 10 * |
| 3 | Gemini 2.5 Pro | Google Gemini | 8.94 | 8.81 | 8.70 | 9.33 | 9.43 | 9.37 | 7.93 | 8.92 | 15 * |
| 4 | GPT-5 Mini | OpenAI | 8.77 | 8.51 | 8.91 | 9.06 | 9.23 | 9.23 | 7.53 | 8.69 | 22 * |
| 5 | GPT-5 | OpenAI | 8.72 | 8.66 | 8.72 | 9.06 | 9.39 | 8.87 | 7.59 | 8.76 | 27 * |
| 6 | Gemini 2.5 Flash | Google Gemini | 8.70 | 8.75 | 7.78 | 9.12 | 9.48 | 9.36 | 7.03 | 8.89 | 33 |
| 6 | GPT-4.1 Nano | OpenAI | 8.70 | 8.72 | 7.44 | 9.06 | 9.39 | 9.03 | 7.58 | 8.79 | 18 * |
| 8 | Gemini 3 Flash | Google Gemini | 8.68 | 8.54 | 8.08 | 9.12 | 9.15 | 9.05 | 7.51 | 8.92 | 40 |
| 9 | GPT-5.2 (Thinking) | OpenAI | 8.63 | 8.53 | 8.28 | 8.94 | 9.36 | 8.91 | 7.11 | 8.70 | 49 |
| 10 | Gemini 2.5 Flash Lite | Google Gemini | 8.59 | 8.30 | 8.13 | 8.80 | 8.80 | 8.80 | 8.50 | 8.50 | 5 * |
| 11 | GPT-5.2 | OpenAI | 8.50 | 8.46 | 8.29 | 8.84 | 9.20 | 8.88 | 7.15 | 8.51 | 28 * |
| 12 | GPT-4.1 | OpenAI | 8.48 | 8.57 | 8.22 | 8.65 | 8.52 | 8.72 | 7.74 | 8.77 | 31 |
| 13 | Claude Sonnet 4.5 | Anthropic | 8.28 | 8.11 | 7.70 | 8.63 | 8.94 | 8.52 | 7.09 | 8.44 | 186 |
| 13 | GPT-4o | OpenAI | 8.28 | 8.37 | 7.39 | 8.89 | 8.29 | 8.63 | 7.50 | 8.43 | 19 * |
| 15 | Jamba Large | AI21 | 8.23 | 7.95 | 6.63 | 8.50 | 8.80 | 8.60 | 7.55 | 8.10 | 10 * |
| 16 | Gemini 3 Pro | Google Gemini | 8.11 | 8.01 | 7.42 | 8.41 | 8.87 | 8.46 | 6.70 | 8.30 | 136 |
| 17 | Grok 4 | xAI (Grok) | 7.81 | 7.67 | 7.96 | 8.17 | 8.53 | 8.68 | 7.43 | 7.87 | 15 * |
| 18 | Grok 4 (Reasoning) | xAI (Grok) | 7.80 | 7.86 | 7.17 | 8.25 | 8.56 | 8.31 | 7.03 | 8.07 | 71 |
| 19 | Sonar | Perplexity | 7.69 | 7.62 | 4.92 | 8.12 | 8.06 | 8.18 | 6.53 | 7.76 | 17 * |
| 20 | Grok 3 Mini | xAI (Grok) | 7.56 | 7.56 | 6.63 | 7.82 | 8.08 | 8.73 | 7.09 | 7.38 | 12 * |
| 21 | GPT-5.1 (Thinking) | OpenAI | 7.55 | 7.72 | 7.00 | 8.04 | 8.22 | 7.98 | 5.81 | 7.69 | 25 * |
| 22 | Sonar Pro | Perplexity | 7.51 | 7.59 | 6.80 | 7.96 | 8.19 | 7.90 | 6.31 | 7.68 | 36 |
| 22 | Sonar Reasoning Pro | Perplexity | 7.51 | 7.18 | 7.08 | 7.77 | 8.27 | 7.61 | 7.12 | 7.20 | 22 * |
| 22 | Jamba Mini | AI21 | 7.51 | 7.33 | 5.83 | 7.83 | 7.96 | 7.88 | 6.75 | 7.33 | 12 * |
| 25 | Mistral Small 3.2 | Mistral | 7.47 | 7.21 | 6.20 | 7.72 | 8.03 | 7.88 | 6.71 | 7.34 | 17 * |
| 26 | Claude 3.7 Sonnet | Anthropic | 7.32 | 7.46 | 6.85 | 7.93 | 7.83 | 7.39 | 6.17 | 7.45 | 23 * |
| 26 | Claude Opus 4.6 (Adaptive) | Anthropic | 7.32 | 6.95 | 6.63 | 7.83 | 7.86 | 7.88 | 5.43 | 7.29 | 21 * |
| 28 | Grok 4.1 (Reasoning) | xAI (Grok) | 7.25 | 7.25 | 6.52 | 7.58 | 7.83 | 7.64 | 6.12 | 7.48 | 41 |
| 29 | GPT-5 (Generic) | OpenAI | 7.09 | 7.32 | 6.40 | 7.60 | 7.72 | 7.24 | 5.56 | 7.44 | 25 * |
| 30 | Command R+ | Cohere | 6.78 | 6.78 | 5.60 | 6.96 | 6.88 | 6.84 | 6.55 | 6.75 | 8 * |
| 31 | Claude Sonnet 4 | Anthropic | 6.67 | 6.66 | 6.44 | 6.88 | 6.94 | 6.75 | 6.40 | 6.72 | 8 * |
| 32 | Command A | Cohere | 6.55 | 6.40 | 5.50 | 6.64 | 6.79 | 6.64 | 6.26 | 6.53 | 7 * |
| 33 | Command R | Cohere | 6.34 | 6.36 | 5.40 | 6.43 | 6.50 | 6.50 | 6.07 | 6.36 | 7 * |
| 34 | Mistral Nemo | Mistral | 6.21 | 6.05 | 5.72 | 6.50 | 6.75 | 6.50 | 5.35 | 6.30 | 10 * |
| 35 | Mistral Medium | Mistral | 6.20 | 6.00 | 6.25 | 6.42 | 6.58 | 6.25 | 5.75 | 6.17 | 6 * |
| 36 | Command R 7B | Cohere | 5.72 | 5.66 | 5.60 | 5.72 | 5.96 | 5.72 | 5.50 | 5.71 | 8 * |
| 37 | Mistral Large | Mistral | 5.26 | 5.04 | 5.29 | 5.43 | 5.57 | 5.29 | 5.00 | 5.21 | 7 * |
| 38 | Claude Sonnet 4.5 (Thinking) | Anthropic | 5.23 | 5.15 | 4.00 | 5.40 | 5.65 | 5.35 | 4.85 | 5.17 | 10 * |
| 39 | Magistral Medium | Mistral | 4.69 | 4.86 | 4.50 | 5.36 | 5.50 | 5.14 | 1.86 | 5.14 | 7 * |
| 40 | Grok 4.1 (Non-Reasoning) | xAI (Grok) | 3.90 | 3.83 | 3.67 | 4.06 | 4.22 | 4.17 | 3.44 | 3.89 | 9 * |
| 41 | Grok 4 (Non-Reasoning) | xAI (Grok) | 2.92 | 2.83 | 2.83 | 3.08 | 3.17 | 3.33 | 2.25 | 2.92 | 6 * |
| 42 | Grok Code | xAI (Grok) | 1.38 | 1.45 | 1.36 | 1.59 | 1.09 | 1.82 | 1.36 | 1.50 | 11 * |
| 43 | Magistral Small | Mistral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10 * |

* Low sample size (<30 evaluations) — ranking may shift with more data
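The low-sample-size caveat above can be made concrete with a standard statistical sketch: the margin of error on a mean shrinks proportionally to 1/sqrt(n), so a score averaged over 9 evaluations is far less stable than one averaged over 186. The spread value sigma below is an illustrative assumption, not a figure from this page:

```python
# Hedged sketch: why models with few evaluations are flagged.
# Assumption: per-evaluation scores have spread sigma ~ 1.0 (illustrative);
# the standard error of a mean shrinks as 1 / sqrt(n).
import math

def score_margin(sigma: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error on a mean of n evaluations."""
    return z * sigma / math.sqrt(n)

print(round(score_margin(1.0, 9), 2))    # 9 evals (top model)  → 0.65
print(round(score_margin(1.0, 186), 2))  # 186 evals            → 0.14
```

Under this assumption, a model with 9 evaluations carries roughly a ±0.65 uncertainty on its score — wider than the gap separating many adjacent ranks in the table — which is why sub-30-evaluation rankings may shift as data accumulates.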

Test AI Models on Your General Questions

See which model performs best on your specific general queries with real-time Trust Score evaluation.

Try Search Umbrella →