Mistral Large 3 - Europe's Multimodal Open-Weight Frontier

Paid

Mistral Large 3 (Dec 2025) is a 675B/41B MoE multimodal model with image understanding, native agentic tool use, and Apache 2.0 licensing. Trained on 3000 H200 GPUs, it delivers frontier-class performance with open-source flexibility.

Enterprises, EU-regulated industries, multimodal teams
4.5 / 5
Updated Monday, May 11, 2026

Tech Specs

Model: Mistral Large 3 (675B total / 41B active MoE, multimodal)
Pricing: Pay-as-you-go
Key Features:
  • 675B MoE / 41B Active
  • Multimodal (Text + Image)
  • Native Agentic Tool Use
  • Apache 2.0 License
  • 256k Context Window
  • 94.2% Function Calling

Overview

Mistral Large 3 (released December 2025) is Mistral AI's most capable model — a 675B total / 41B active MoE that adds multimodal image understanding and native agentic tool use to the Large family. Trained from scratch on 3,000 NVIDIA H200 GPUs, it achieves parity with the best instruction-tuned open-weight models while offering best-in-class multilingual conversation support. Available under Apache 2.0 for unrestricted commercial use.

Architecture & Model Specs

  • Architecture: Granular Mixture-of-Experts (MoE) with Grouped Sparse Attention
  • Parameters: 675B total, 41B active per token
  • Context Window: 256k tokens (base); 500k with sliding window attention
  • Multimodal: Text + image understanding — visual QA, diagram/chart interpretation
  • Training: 30T+ tokens on 3,000 H200 GPUs
  • Function Calling: 94.2% on Berkeley Function Calling Benchmark — matches GPT-5 Turbo
  • Format: NVFP4 compressed checkpoint for efficient Blackwell/A100/H100 deployment
  • License: Apache 2.0 — full commercial use without attribution
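The 675B-total / 41B-active split is what makes the MoE design attractive: every expert must sit in memory, but only a small fraction of parameters participates in each forward pass. A back-of-envelope sketch of that trade-off, using the common approximation of roughly 2 FLOPs per parameter per token (illustrative arithmetic only, not official measurements):

```python
# Rough per-token compute comparison: sparse MoE vs. an equally
# sized hypothetical dense model, based on the spec sheet above.

TOTAL_PARAMS = 675e9   # all experts resident in memory
ACTIVE_PARAMS = 41e9   # parameters actually used per token

# Forward-pass FLOPs per token ~ 2 * (active) parameters
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")        # 6.1%
print(f"Compute savings vs. dense: {flops_dense / flops_moe:.1f}x")  # 16.5x
```

In other words, the model pays dense-675B memory costs but only dense-41B compute costs per token, which is why the NVFP4 compressed checkpoint matters so much for deployment.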

API Performance

  • API Access: Mistral AI Studio, Azure Foundry, Amazon Bedrock, IBM watsonx, OpenRouter
  • Response Time: ~800ms-1.5s for standard generation
  • Pricing: $0.50/1M input tokens, $1.50/1M output tokens (Azure); $4/1M input, $12/1M output on the Mistral API
  • Tool Use: Native function calling with JSON schema — no prompt engineering needed
  • Fine-Tuning: Available for enterprise customers via Mistral platform
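Tool definitions follow the JSON-schema convention used by OpenAI-style chat completion APIs, which Mistral's API also accepts. A minimal sketch of building such a request — the model id and the `get_weather` function are illustrative placeholders, not confirmed names:

```python
import json

# Hypothetical tool described in JSON-schema form (OpenAI-style shape).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative function, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Request body for the chat completions endpoint; model id is assumed.
request_body = {
    "model": "mistral-large-3",  # assumed identifier — check the docs
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call a tool
}

print(json.dumps(request_body, indent=2))
```

No prompt engineering is involved: the schema itself tells the model what parameters exist and which are required.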

Key Features

  • Multimodal Understanding: Interprets images, diagrams, charts alongside text
  • Native Tool Use: First open-weight model with built-in function calling (94.2% success rate)
  • Consistent Behavior: Degrades less often than peer models in long multi-turn conversations and on complex inputs
  • Apache 2.0: Full commercial use, modification, and redistribution without restrictions
  • Enterprise Deployment: Available on Azure, AWS Bedrock, IBM watsonx for global reach
  • EU Data Sovereignty: Training and inference within EU borders — GDPR compliant by design
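Multimodal inputs are typically sent as content parts mixing text and a base64-encoded image, the same format popularized by OpenAI's vision APIs. A sketch of assembling such a message (the content-part field names are an assumption based on that convention; verify against Mistral's API reference):

```python
import base64

def build_image_message(image_bytes: bytes, question: str) -> dict:
    """Build a user message pairing a question with an inline PNG image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            },
        ],
    }

# Placeholder bytes stand in for a real chart screenshot.
msg = build_image_message(b"\x89PNG...", "What trend does this chart show?")
print(msg["content"][0]["text"])
```

The same message structure covers the visual QA and diagram/chart interpretation use cases listed above.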

Pricing Breakdown

Plan | Price | Features
Mistral API | $4/1M input, $12/1M output | Full Large 3 access, function calling
Azure Foundry | $0.50/1M input, $1.50/1M output | Global Standard, West US 3
AWS Bedrock | Custom | Managed deployment, regional options
Self-Hosted | Free + infra | Apache 2.0 weights, your infrastructure
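The per-million-token rates make cost estimates straightforward. A quick sketch comparing a hypothetical monthly workload (10M input + 2M output tokens) across the two metered plans:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Total cost given rates in USD per 1M tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example workload: 10M input + 2M output tokens per month.
azure = cost_usd(10_000_000, 2_000_000, 0.50, 1.50)    # $8.00
mistral = cost_usd(10_000_000, 2_000_000, 4.00, 12.00)  # $64.00

print(f"Azure: ${azure:.2f}/mo, Mistral API: ${mistral:.2f}/mo")
```

The 8x spread between providers for the same weights is a direct consequence of the open license: hosts compete on serving efficiency, not model access.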

Privacy & Safety

  • Data Residency: All data stays within EU via Mistral API — critical for regulated industries
  • GDPR Compliance: Built for European regulatory requirements
  • Open Weights: Self-hosting option means zero data leaves your infrastructure
  • Fine-Tuning Privacy: Enterprise fine-tuning data isolated and not shared

The Killer Feature

Native agentic tool use + Apache 2.0 — Large 3 is the first open-weight model with truly native function calling. Define tools in JSON schema, and it reliably calls them with correct parameters, handles errors, and chains multiple tool calls. At 94.2% on the Berkeley benchmark, it matches GPT-5 Turbo. Combined with full Apache 2.0 licensing, you get enterprise-grade agent capabilities you can self-host, fine-tune, and modify without any licensing restrictions.
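The "chains multiple tool calls" behavior implies the standard agent loop: the model emits tool calls, the caller executes them locally, and results go back as tool messages. A minimal sketch of the execution side, assuming OpenAI-style tool-call objects (the tool registry and the simulated response are placeholders you would wire to the real API):

```python
import json

# Hypothetical local tool registry; a real agent would map names to
# actual functions (database lookups, HTTP calls, etc.).
TOOLS = {"get_weather": lambda city: {"city": city, "temp_c": 18}}

def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each requested tool and package results as tool messages."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # model-supplied JSON
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(fn(**args)),
        })
    return results

# Simulated model response requesting one tool call:
fake_response = [{
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}]
tool_msgs = run_tool_calls(fake_response)
print(tool_msgs[0]["content"])  # {"city": "Paris", "temp_c": 18}
```

Appending these tool messages to the conversation and calling the model again is what lets it chain further calls or compose a final answer.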

Pros & Cons

Pros:

  • Full Apache 2.0 — zero licensing restrictions
  • Native multimodal (text + image) understanding
  • 94.2% function calling success rate
  • Consistent behavior in multi-turn conversations
  • EU data sovereignty and GDPR compliance

Cons:

  • Large memory footprint — all 675B parameters must be resident, so inference costs more than lighter models despite sparse activation
  • 256k context (500k sliding) trails DeepSeek-V4's 1M
  • Smaller ecosystem than OpenAI/Anthropic
  • Reasoning variant still forthcoming

Verdict

Mistral Large 3 is the most versatile open-weight model for enterprise use. Native tool use, multimodal understanding, and Apache 2.0 licensing make it ideal for building production agentic systems. At $0.50/1M input tokens on Azure, it's remarkably affordable for a frontier-class model.