Mistral Large 3 - Europe's Multimodal Open-Weight Frontier

Paid

Mistral Large 3 (Dec 2025) is a 675B/41B MoE multimodal model with image understanding, native agentic tool use, and Apache 2.0 licensing. Trained on 3000 H200 GPUs, it delivers frontier-class performance with open-source flexibility.

Enterprises, EU-regulated industries, multimodal teams
4.5 / 5
Updated Monday, May 11, 2026

Tech Specs

Model: Mistral Large 3 (675B total / 41B active MoE, multimodal)
Pricing: Pay-as-you-go
Key Features:
  • 675B MoE / 41B Active
  • Multimodal (Text + Image)
  • Native Agentic Tool Use
  • Apache 2.0 License
  • 256k Context Window
  • 94.2% Function Calling

Overview

Mistral Large 3 (released December 2025) is Mistral AI's most capable model — a 675B total / 41B active MoE that adds multimodal image understanding and native agentic tool use to the Large family. Trained from scratch on 3,000 NVIDIA H200 GPUs, it achieves parity with the best instruction-tuned open-weight models while offering best-in-class multilingual conversation support. Available under Apache 2.0 for unrestricted commercial use.

Architecture & Model Specs

  • Architecture: Granular Mixture-of-Experts (MoE) with Grouped Sparse Attention
  • Parameters: 675B total, 41B active per token
  • Context Window: 256k tokens (base); 500k with sliding window attention
  • Multimodal: Text + image understanding — visual QA, diagram/chart interpretation
  • Training: 30T+ tokens on 3,000 H200 GPUs
  • Function Calling: 94.2% on Berkeley Function Calling Benchmark — matches GPT-5 Turbo
  • Format: NVFP4 compressed checkpoint for efficient Blackwell/A100/H100 deployment
  • License: Apache 2.0 — full commercial use without attribution
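The 675B-total / 41B-active split is what makes the MoE design attractive: every expert must sit in memory, but only a small fraction of parameters participates in each forward pass. A back-of-envelope sketch of that trade-off, using the common approximation of roughly 2 FLOPs per parameter per token (illustrative arithmetic only, not official measurements):

```python
# Rough per-token compute comparison: sparse MoE vs. an equally
# sized hypothetical dense model, based on the spec sheet above.

TOTAL_PARAMS = 675e9   # all experts resident in memory
ACTIVE_PARAMS = 41e9   # parameters actually used per token

# Forward-pass FLOPs per token ~ 2 * (active) parameters
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")        # 6.1%
print(f"Compute savings vs. dense: {flops_dense / flops_moe:.1f}x")  # 16.5x
```

In other words, the model pays dense-675B memory costs but only dense-41B compute costs per token, which is why the NVFP4 compressed checkpoint matters so much for deployment.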

API Performance

  • API Access: Mistral AI Studio, Azure Foundry, Amazon Bedrock, IBM watsonx, OpenRouter
  • Response Time: ~800ms-1.5s for standard generation
  • Pricing: $0.50/1M input tokens, $1.50/1M output tokens (Azure); $4/1M input, $12/1M output on the Mistral API
  • Tool Use: Native function calling with JSON schema — no prompt engineering needed
  • Fine-Tuning: Available for enterprise customers via Mistral platform
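Tool definitions follow the JSON-schema convention used by OpenAI-style chat completion APIs, which Mistral's API also accepts. A minimal sketch of building such a request — the model id and the `get_weather` function are illustrative placeholders, not confirmed names:

```python
import json

# Hypothetical tool described in JSON-schema form (OpenAI-style shape).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative function, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Request body for the chat completions endpoint; model id is assumed.
request_body = {
    "model": "mistral-large-3",  # assumed identifier — check the docs
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call a tool
}

print(json.dumps(request_body, indent=2))
```

No prompt engineering is involved: the schema itself tells the model what parameters exist and which are required.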

Key Features

  • Multimodal Understanding: Interprets images, diagrams, charts alongside text
  • Native Tool Use: First open-weight model with built-in function calling (94.2% success rate)
  • Consistent Behavior: Degrades less often than peer models in long multi-turn conversations and on complex inputs
  • Apache 2.0: Full commercial use, modification, and redistribution without restrictions
  • Enterprise Deployment: Available on Azure, AWS Bedrock, IBM watsonx for global reach
  • EU Data Sovereignty: Training and inference within EU borders — GDPR compliant by design
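Multimodal inputs are typically sent as content parts mixing text and a base64-encoded image, the same format popularized by OpenAI's vision APIs. A sketch of assembling such a message (the content-part field names are an assumption based on that convention; verify against Mistral's API reference):

```python
import base64

def build_image_message(image_bytes: bytes, question: str) -> dict:
    """Build a user message pairing a question with an inline PNG image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            },
        ],
    }

# Placeholder bytes stand in for a real chart screenshot.
msg = build_image_message(b"\x89PNG...", "What trend does this chart show?")
print(msg["content"][0]["text"])
```

The same message structure covers the visual QA and diagram/chart interpretation use cases listed above.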

Pricing Breakdown

Plan | Price | Features
Mistral API | $4/1M input, $12/1M output | Full Large 3 access, function calling
Azure Foundry | $0.50/1M input, $1.50/1M output | Global Standard, West US 3
AWS Bedrock | Custom | Managed deployment, regional options
Self-Hosted | Free + infra | Apache 2.0 weights, your infrastructure
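The per-million-token rates make cost estimates straightforward. A quick sketch comparing a hypothetical monthly workload (10M input + 2M output tokens) across the two metered plans:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Total cost given rates in USD per 1M tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example workload: 10M input + 2M output tokens per month.
azure = cost_usd(10_000_000, 2_000_000, 0.50, 1.50)    # $8.00
mistral = cost_usd(10_000_000, 2_000_000, 4.00, 12.00)  # $64.00

print(f"Azure: ${azure:.2f}/mo, Mistral API: ${mistral:.2f}/mo")
```

The 8x spread between providers for the same weights is a direct consequence of the open license: hosts compete on serving efficiency, not model access.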

Privacy & Safety

  • Data Residency: All data stays within EU via Mistral API — critical for regulated industries
  • GDPR Compliance: Built for European regulatory requirements
  • Open Weights: Self-hosting option means zero data leaves your infrastructure
  • Fine-Tuning Privacy: Enterprise fine-tuning data isolated and not shared

The Killer Feature

Native agentic tool use + Apache 2.0 — Large 3 is the first open-weight model with truly native function calling. Define tools in JSON schema, and it reliably calls them with correct parameters, handles errors, and chains multiple tool calls. At 94.2% on the Berkeley benchmark, it matches GPT-5 Turbo. Combined with full Apache 2.0 licensing, you get enterprise-grade agent capabilities you can self-host, fine-tune, and modify without any licensing restrictions.
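The "chains multiple tool calls" behavior implies the standard agent loop: the model emits tool calls, the caller executes them locally, and results go back as tool messages. A minimal sketch of the execution side, assuming OpenAI-style tool-call objects (the tool registry and the simulated response are placeholders you would wire to the real API):

```python
import json

# Hypothetical local tool registry; a real agent would map names to
# actual functions (database lookups, HTTP calls, etc.).
TOOLS = {"get_weather": lambda city: {"city": city, "temp_c": 18}}

def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each requested tool and package results as tool messages."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # model-supplied JSON
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(fn(**args)),
        })
    return results

# Simulated model response requesting one tool call:
fake_response = [{
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}]
tool_msgs = run_tool_calls(fake_response)
print(tool_msgs[0]["content"])  # {"city": "Paris", "temp_c": 18}
```

Appending these tool messages to the conversation and calling the model again is what lets it chain further calls or compose a final answer.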

Pros & Cons

Pros:

  • Full Apache 2.0 — zero licensing restrictions
  • Native multimodal (text + image) understanding
  • 94.2% function calling success rate
  • Consistent behavior in multi-turn conversations
  • EU data sovereignty and GDPR compliance

Cons:

  • Large memory footprint — all 675B parameters must be resident, so inference costs more than lighter models despite sparse activation
  • 256k context (500k sliding) trails DeepSeek-V4's 1M
  • Smaller ecosystem than OpenAI/Anthropic
  • Reasoning variant still forthcoming

Verdict

Mistral Large 3 is the most versatile open-weight model for enterprise use. Native tool use, multimodal understanding, and Apache 2.0 licensing make it ideal for building production agentic systems. At $0.50/1M input tokens on Azure, it's remarkably affordable for a frontier-class model.