Mistral Large 3 - Europe's Multimodal Open-Weight Frontier
Mistral Large 3 (Dec 2025) is a 675B-total / 41B-active MoE multimodal model with image understanding, native agentic tool use, and Apache 2.0 licensing. Trained on 3,000 H200 GPUs, it delivers frontier-class performance with open-weight flexibility.
Overview
Mistral Large 3 (released December 2025) is Mistral AI's most capable model — a 675B total / 41B active MoE that adds multimodal image understanding and native agentic tool use to the Large family. Trained from scratch on 3,000 NVIDIA H200 GPUs, it achieves parity with the best instruction-tuned open-weight models while offering best-in-class multilingual conversation support. Available under Apache 2.0 for unrestricted commercial use.
Architecture & Model Specs
- Architecture: Granular Mixture-of-Experts (MoE) with Grouped Sparse Attention
- Parameters: 675B total, 41B active per token
- Context Window: 256k tokens (base); 500k with sliding window attention
- Multimodal: Text + image understanding — visual QA, diagram/chart interpretation
- Training: 30T+ tokens on 3,000 H200 GPUs
- Function Calling: 94.2% on Berkeley Function Calling Benchmark — matches GPT-5 Turbo
- Format: NVFP4 compressed checkpoint for efficient Blackwell/A100/H100 deployment
- License: Apache 2.0 — full commercial use without attribution
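For teams weighing self-hosting, the NVFP4 checkpoint format above implies a rough memory budget. A back-of-envelope sketch, assuming ~4 data bits per parameter plus an assumed ~10% overhead for block scales and metadata (the real footprint depends on the packing format and runtime):

```python
# Back-of-envelope size estimate for a 675B-parameter NVFP4 checkpoint.
# BITS_PER_PARAM and SCALE_OVERHEAD are assumptions, not published figures.

PARAMS_TOTAL = 675e9     # total parameters (MoE)
BITS_PER_PARAM = 4       # NVFP4 data bits per parameter
SCALE_OVERHEAD = 1.10    # assumed ~10% for quantization scales/metadata

def checkpoint_gb() -> float:
    """Approximate checkpoint size in GB under the assumptions above."""
    bytes_total = PARAMS_TOTAL * BITS_PER_PARAM / 8 * SCALE_OVERHEAD
    return bytes_total / 1e9

print(round(checkpoint_gb()))  # ~371 GB
```

Even quantized, the full parameter set spans multiple GPUs, which is why the cons below note higher serving cost than lighter models.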
API Performance
- API Access: Mistral AI Studio, Azure Foundry, Amazon Bedrock, IBM watsonx, OpenRouter
- Response Time: ~800ms-1.5s for standard generation
- Pricing: $0.50/1M input, $1.50/1M output tokens (Azure); $12/1M output on Mistral API
- Tool Use: Native function calling with JSON schema — no prompt engineering needed
- Fine-Tuning: Available for enterprise customers via Mistral platform
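The "JSON schema" tool definition mentioned above can be sketched as follows. This builds a request body in the common OpenAI-style tool format; the model identifier `mistral-large-3` and the `get_weather` tool are assumptions for illustration, so check Mistral's API docs for the exact shapes.

```python
# Sketch: attaching a JSON-schema tool to a chat request.
# Model name and tool are hypothetical; the schema mirrors the
# widely used OpenAI-compatible format.

def make_weather_tool() -> dict:
    """Build a JSON-schema definition for a hypothetical get_weather tool."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }

def make_request_body(user_message: str) -> dict:
    """Assemble a chat-completion request body with the tool attached."""
    return {
        "model": "mistral-large-3",  # assumed model identifier
        "messages": [{"role": "user", "content": user_message}],
        "tools": [make_weather_tool()],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

body = make_request_body("What's the weather in Lyon?")
print(body["tools"][0]["function"]["name"])  # get_weather
```

Because the schema is declared up front, no prompt engineering is needed: the model emits structured tool calls that validate against the declared parameters.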
Key Features
- Multimodal Understanding: Interprets images, diagrams, charts alongside text
- Native Tool Use: First open-weight model with built-in function calling (94.2% success rate)
- Consistent Behavior: Fewer breakdowns than peers in multi-turn conversations and complex inputs
- Apache 2.0: Full commercial use, modification, and redistribution without restrictions
- Enterprise Deployment: Available on Azure, AWS Bedrock, IBM watsonx for global reach
- EU Data Sovereignty: Training and inference within EU borders — GDPR compliant by design
Pricing Breakdown
| Plan | Price | Features |
|---|---|---|
| Mistral API | $12/1M output | Full Large 3 access, function calling |
| Azure Foundry | $1.50/1M output | Global Standard, West US 3 |
| AWS Bedrock | Custom | Managed deployment, regional options |
| Self-Hosted | Free + infra | Apache 2.0 weights, your infrastructure |
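The output-token rates in the table translate to per-response costs as follows, a minimal sketch using only the listed output rates (input rates vary by platform and are omitted):

```python
# Per-response output cost at the rates listed in the pricing table.

RATES_PER_M_OUTPUT = {
    "mistral_api": 12.00,   # $/1M output tokens
    "azure_foundry": 1.50,  # $/1M output tokens
}

def output_cost(tokens: int, platform: str) -> float:
    """Cost in USD for `tokens` output tokens on the given platform."""
    return tokens / 1_000_000 * RATES_PER_M_OUTPUT[platform]

# e.g. a 2,000-token response:
print(round(output_cost(2_000, "azure_foundry"), 4))  # 0.003
print(round(output_cost(2_000, "mistral_api"), 3))    # 0.024
```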
Privacy & Safety
- Data Residency: All data stays within EU via Mistral API — critical for regulated industries
- GDPR Compliance: Built for European regulatory requirements
- Open Weights: Self-hosting option means zero data leaves your infrastructure
- Fine-Tuning Privacy: Enterprise fine-tuning data isolated and not shared
The Killer Feature
Native agentic tool use + Apache 2.0 — Large 3 is the first open-weight model with truly native function calling. Define tools in JSON schema, and it reliably calls them with correct parameters, handles errors, and chains multiple tool calls. At 94.2% on the Berkeley benchmark, it matches GPT-5 Turbo. Combined with full Apache 2.0 licensing, you get enterprise-grade agent capabilities you can self-host, fine-tune, and modify without any licensing restrictions.
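The call-execute-feed-back loop described above can be sketched client-side. The tool-call message shape here mirrors the common OpenAI-style format and the `get_weather` tool is hypothetical; verify the exact response schema against Mistral's documentation.

```python
import json

# Sketch of one turn of the agent loop: the model emits a tool call,
# the client dispatches it to a local function and builds the result
# message to send back. Response shape is an assumption.

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool implementation."""
    return {"city": city, "temp": 18, "unit": unit}

TOOLS = {"get_weather": get_weather}

def handle_tool_call(tool_call: dict) -> dict:
    """Execute one model-emitted tool call and build the result message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)  # dispatch to the matching local function
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# A tool call as the model might emit it:
call = {
    "id": "call_0",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
msg = handle_tool_call(call)
print(json.loads(msg["content"])["city"])  # Paris
```

Chained tool use is just this loop repeated: append the tool message to the conversation, call the model again, and continue until it answers in plain text.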
Pros & Cons
Pros:
- Full Apache 2.0 — zero licensing restrictions
- Native multimodal (text + image) understanding
- 94.2% function calling success rate
- Consistent behavior in multi-turn conversations
- EU data sovereignty and GDPR compliance
Cons:
- 675B-parameter MoE footprint = higher serving cost than lighter models
- 256k context (500k sliding) trails DeepSeek-V4's 1M
- Smaller ecosystem than OpenAI/Anthropic
- Reasoning variant still forthcoming
Verdict
Mistral Large 3 is the most versatile open-weight model for enterprise use. Native tool use, multimodal understanding, and Apache 2.0 licensing make it ideal for building production agentic systems. At $0.50/1M input tokens on Azure, it's remarkably affordable for a frontier-class model.