Best AI Models for General Queries: 2026 Rankings
43 models evaluated across 1,116 general queries, ranked by composite Trust Score. The RC through HL columns are component sub-scores; Evals is the number of evaluations behind each model's ranking.
| Rank | Model | Provider | Trust Score | RC | FA | SC | RF | ST | ED | HL | Evals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash | Google Gemini | 9.07 | 9.06 | 8.63 | 9.50 | 9.39 | 9.67 | 8.00 | 8.86 | 9 * |
| 2 | GPT-5.1 | OpenAI | 9.04 | 8.80 | 9.05 | 9.20 | 9.65 | 9.00 | 8.60 | 9.08 | 10 * |
| 3 | Gemini 2.5 Pro | Google Gemini | 8.94 | 8.81 | 8.70 | 9.33 | 9.43 | 9.37 | 7.93 | 8.92 | 15 * |
| 4 | GPT-5 Mini | OpenAI | 8.77 | 8.51 | 8.91 | 9.06 | 9.23 | 9.23 | 7.53 | 8.69 | 22 * |
| 5 | GPT-5 | OpenAI | 8.72 | 8.66 | 8.72 | 9.06 | 9.39 | 8.87 | 7.59 | 8.76 | 27 * |
| 6 | Gemini 2.5 Flash | Google Gemini | 8.70 | 8.75 | 7.78 | 9.12 | 9.48 | 9.36 | 7.03 | 8.89 | 33 |
| 6 | GPT-4.1 Nano | OpenAI | 8.70 | 8.72 | 7.44 | 9.06 | 9.39 | 9.03 | 7.58 | 8.79 | 18 * |
| 8 | Gemini 3 Flash | Google Gemini | 8.68 | 8.54 | 8.08 | 9.12 | 9.15 | 9.05 | 7.51 | 8.92 | 40 |
| 9 | GPT-5.2 (Thinking) | OpenAI | 8.63 | 8.53 | 8.28 | 8.94 | 9.36 | 8.91 | 7.11 | 8.70 | 49 |
| 10 | Gemini 2.5 Flash Lite | Google Gemini | 8.59 | 8.30 | 8.13 | 8.80 | 8.80 | 8.80 | 8.50 | 8.50 | 5 * |
| 11 | GPT-5.2 | OpenAI | 8.50 | 8.46 | 8.29 | 8.84 | 9.20 | 8.88 | 7.15 | 8.51 | 28 * |
| 12 | GPT-4.1 | OpenAI | 8.48 | 8.57 | 8.22 | 8.65 | 8.52 | 8.72 | 7.74 | 8.77 | 31 |
| 13 | Claude Sonnet 4.5 | Anthropic | 8.28 | 8.11 | 7.70 | 8.63 | 8.94 | 8.52 | 7.09 | 8.44 | 186 |
| 13 | GPT-4o | OpenAI | 8.28 | 8.37 | 7.39 | 8.89 | 8.29 | 8.63 | 7.50 | 8.43 | 19 * |
| 15 | Jamba Large | AI21 | 8.23 | 7.95 | 6.63 | 8.50 | 8.80 | 8.60 | 7.55 | 8.10 | 10 * |
| 16 | Gemini 3 Pro | Google Gemini | 8.11 | 8.01 | 7.42 | 8.41 | 8.87 | 8.46 | 6.70 | 8.30 | 136 |
| 17 | Grok 4 | xAI (Grok) | 7.81 | 7.67 | 7.96 | 8.17 | 8.53 | 8.68 | 7.43 | 7.87 | 15 * |
| 18 | Grok 4 (Reasoning) | xAI (Grok) | 7.80 | 7.86 | 7.17 | 8.25 | 8.56 | 8.31 | 7.03 | 8.07 | 71 |
| 19 | Sonar | Perplexity | 7.69 | 7.62 | 4.92 | 8.12 | 8.06 | 8.18 | 6.53 | 7.76 | 17 * |
| 20 | Grok 3 Mini | xAI (Grok) | 7.56 | 7.56 | 6.63 | 7.82 | 8.08 | 8.73 | 7.09 | 7.38 | 12 * |
| 21 | GPT-5.1 (Thinking) | OpenAI | 7.55 | 7.72 | 7.00 | 8.04 | 8.22 | 7.98 | 5.81 | 7.69 | 25 * |
| 22 | Sonar Pro | Perplexity | 7.51 | 7.59 | 6.80 | 7.96 | 8.19 | 7.90 | 6.31 | 7.68 | 36 |
| 22 | Sonar Reasoning Pro | Perplexity | 7.51 | 7.18 | 7.08 | 7.77 | 8.27 | 7.61 | 7.12 | 7.20 | 22 * |
| 22 | Jamba Mini | AI21 | 7.51 | 7.33 | 5.83 | 7.83 | 7.96 | 7.88 | 6.75 | 7.33 | 12 * |
| 25 | Mistral Small 3.2 | Mistral | 7.47 | 7.21 | 6.20 | 7.72 | 8.03 | 7.88 | 6.71 | 7.34 | 17 * |
| 26 | Claude 3.7 Sonnet | Anthropic | 7.32 | 7.46 | 6.85 | 7.93 | 7.83 | 7.39 | 6.17 | 7.45 | 23 * |
| 26 | Claude Opus 4.6 (Adaptive) | Anthropic | 7.32 | 6.95 | 6.63 | 7.83 | 7.86 | 7.88 | 5.43 | 7.29 | 21 * |
| 28 | Grok 4.1 (Reasoning) | xAI (Grok) | 7.25 | 7.25 | 6.52 | 7.58 | 7.83 | 7.64 | 6.12 | 7.48 | 41 |
| 29 | GPT-5 (Generic) | OpenAI | 7.09 | 7.32 | 6.40 | 7.60 | 7.72 | 7.24 | 5.56 | 7.44 | 25 * |
| 30 | Command R+ | Cohere | 6.78 | 6.78 | 5.60 | 6.96 | 6.88 | 6.84 | 6.55 | 6.75 | 8 * |
| 31 | Claude Sonnet 4 | Anthropic | 6.67 | 6.66 | 6.44 | 6.88 | 6.94 | 6.75 | 6.40 | 6.72 | 8 * |
| 32 | Command A | Cohere | 6.55 | 6.40 | 5.50 | 6.64 | 6.79 | 6.64 | 6.26 | 6.53 | 7 * |
| 33 | Command R | Cohere | 6.34 | 6.36 | 5.40 | 6.43 | 6.50 | 6.50 | 6.07 | 6.36 | 7 * |
| 34 | Mistral Nemo | Mistral | 6.21 | 6.05 | 5.72 | 6.50 | 6.75 | 6.50 | 5.35 | 6.30 | 10 * |
| 35 | Mistral Medium | Mistral | 6.20 | 6.00 | 6.25 | 6.42 | 6.58 | 6.25 | 5.75 | 6.17 | 6 * |
| 36 | Command R 7B | Cohere | 5.72 | 5.66 | 5.60 | 5.72 | 5.96 | 5.72 | 5.50 | 5.71 | 8 * |
| 37 | Mistral Large | Mistral | 5.26 | 5.04 | 5.29 | 5.43 | 5.57 | 5.29 | 5.00 | 5.21 | 7 * |
| 38 | Claude Sonnet 4.5 (Thinking) | Anthropic | 5.23 | 5.15 | 4.00 | 5.40 | 5.65 | 5.35 | 4.85 | 5.17 | 10 * |
| 39 | Magistral Medium | Mistral | 4.69 | 4.86 | 4.50 | 5.36 | 5.50 | 5.14 | 1.86 | 5.14 | 7 * |
| 40 | Grok 4.1 (Non-Reasoning) | xAI (Grok) | 3.90 | 3.83 | 3.67 | 4.06 | 4.22 | 4.17 | 3.44 | 3.89 | 9 * |
| 41 | Grok 4 (Non-Reasoning) | xAI (Grok) | 2.92 | 2.83 | 2.83 | 3.08 | 3.17 | 3.33 | 2.25 | 2.92 | 6 * |
| 42 | Grok Code | xAI (Grok) | 1.38 | 1.45 | 1.36 | 1.59 | 1.09 | 1.82 | 1.36 | 1.50 | 11 * |
| 43 | Magistral Small | Mistral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10 * |
* Low sample size (<30 evaluations) — ranking may shift with more data
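The ranking logic above can be sketched in a few lines. Note the assumptions: the page does not publish the Trust Score formula, so the `trust_score` below uses a hypothetical unweighted mean of the seven sub-scores (which lands near, but not exactly on, the published composites; the real weighting likely differs), and the sub-score abbreviations are kept as-is since their full names are not given here. The `<30` low-sample threshold is taken directly from the footnote.

```python
from dataclasses import dataclass
from statistics import mean

# Sub-score columns exactly as abbreviated in the table header.
SUB_SCORES = ("RC", "FA", "SC", "RF", "ST", "ED", "HL")
LOW_SAMPLE_THRESHOLD = 30  # per the footnote: fewer than 30 evaluations

@dataclass
class ModelRow:
    model: str
    provider: str
    scores: dict[str, float]  # abbreviation -> sub-score
    evals: int

    @property
    def trust_score(self) -> float:
        # Hypothetical composite: unweighted mean of the sub-scores.
        # (The published composite may use different weights.)
        return round(mean(self.scores[k] for k in SUB_SCORES), 2)

    @property
    def low_sample(self) -> bool:
        # Rows flagged with * in the table.
        return self.evals < LOW_SAMPLE_THRESHOLD

# First row of the table as an example.
row = ModelRow(
    model="Gemini 2.0 Flash",
    provider="Google Gemini",
    scores={"RC": 9.06, "FA": 8.63, "SC": 9.50, "RF": 9.39,
            "ST": 9.67, "ED": 8.00, "HL": 8.86},
    evals=9,
)
print(row.trust_score, row.low_sample)  # 9.02 True
```

With an unweighted mean, Gemini 2.0 Flash scores 9.02 versus the published 9.07, so the actual composite is close to, but not exactly, a plain average of the listed sub-scores.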
Test AI Models on Your General Questions
See which model performs best on your own queries, with real-time Trust Score evaluation.
Try Search Umbrella →