DeepSeek-V4 - China's 1M-Context Open-Source Powerhouse
DeepSeek-V4 (April 2026) is a two-tier MoE family: V4-Pro (1.6T/49B active) and V4-Flash (284B/13B active). Both support a 1-million-token context, MIT-licensed weights, and thinking/non-thinking modes. It is the most cost-effective frontier model family available.
Overview
DeepSeek-V4 Preview launched on April 24, 2026, as two open-weight MoE checkpoints that share architecture and a one-million-token context window. V4-Pro (1.6T total / 49B active) rivals top closed-source models on reasoning and agentic coding. V4-Flash (284B total / 13B active) delivers comparable quality at ~1/7th the per-token cost. Both support three reasoning modes — non-thinking, high, and max — controlled via a single request parameter.
Architecture & Model Specs
- V4-Pro: 1.6T total params, 49B active per token, 33T pre-training tokens
- V4-Flash: 284B total params, 13B active per token, 32T pre-training tokens
- Context Window: 1,000,000 tokens (standard across all V4 services)
- Max Output: 384,000 tokens
- Attention: Token-wise compression + DSA (DeepSeek Sparse Attention)
- mHC: Manifold-Constrained Hyper-Connections preserve context integrity across 1M tokens
- Thinking Modes: non-thinking, high, max — all accessible via a single parameter (unified endpoint)
- License: MIT — fully permissive for commercial use
- Hardware: Trained on Huawei Ascend processors; runs natively on local chips for AI sovereignty
API Performance
- API Access: OpenAI-compatible and Anthropic-compatible endpoints; just update model name
- Response Time: Flash ~400-800ms; Pro ~1-2s for standard generation
- Pricing: Flash at ~$0.07/1M input tokens; Pro at competitive frontier-tier rates
- Retirement Notice: deepseek-chat and deepseek-reasoner IDs retire July 24, 2026 — migrate to deepseek-v4-pro or deepseek-v4-flash
- Integration: Native support in Claude Code, OpenClaw, and OpenCode agentic tools
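Because the endpoints are OpenAI-compatible, migrating from the retiring IDs mostly means swapping the model name. Below is a minimal sketch of building a chat-completions payload; the `thinking` parameter name is a guess, since the article only says the three reasoning modes are set "via a single request parameter" — check DeepSeek's docs for the real name before use.

```python
import json

def build_chat_request(prompt: str,
                       model: str = "deepseek-v4-flash",
                       thinking: str = "non-thinking") -> str:
    """Build an OpenAI-style chat-completions payload as a JSON string.

    `thinking` is a hypothetical parameter name for the unified
    non-thinking / high / max reasoning modes described above.
    """
    if thinking not in ("non-thinking", "high", "max"):
        raise ValueError(f"unknown thinking mode: {thinking}")
    payload = {
        "model": model,  # was: deepseek-chat / deepseek-reasoner
        "messages": [{"role": "user", "content": prompt}],
        "thinking": thinking,
    }
    return json.dumps(payload)

# Same request against Pro with max reasoning effort:
req = build_chat_request("Summarize this repo",
                         model="deepseek-v4-pro", thinking="max")
```

Because only the model ID changes, the same helper serves both tiers, which is the point of the unified endpoint.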
Key Features
- 1M Context: Industry-leading long-context — process entire codebases, books, or legal documents in one shot
- Agentic Coding SOTA: Open-source state-of-the-art on agentic coding benchmarks
- Math/STEM/Coding: Leads all open models, trails only Gemini 3.1 Pro on knowledge benchmarks
- Reasoning Modes: Switch seamlessly between non-thinking (speed-focused) and the high/max thinking modes (reasoning-heavy)
- Self-Hostable: MIT weights + optimized inference runs on consumer hardware with quantization
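The self-hosting claim can be sanity-checked with back-of-the-envelope weight-size arithmetic (parameter count × bits per weight). This is a rough sketch under my own assumptions: it covers weight storage only, ignoring KV cache, activations, and quantization overhead such as scale tensors.

```python
def weights_gb(params: float, bits: float) -> float:
    """Approximate size of the weights alone at `bits` per parameter,
    in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

# V4-Flash (284B total params) at 4-bit quantization:
flash_4bit = weights_gb(284e9, 4)   # 142 GB — multi-GPU territory
# V4-Pro (1.6T total params) at 4-bit:
pro_4bit = weights_gb(1.6e12, 4)    # 800 GB of raw weights; close to the
                                    # ~865 GB disk figure once overhead
                                    # is added
```

Since only 13B (Flash) or 49B (Pro) parameters are active per token, compute is modest, but all experts must still be resident, so total weight size is what drives the hardware requirement.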
Pricing Breakdown
| Plan | Price | Features |
|---|---|---|
| Free | $0 | V4-Flash (Instant Mode), limited generations/day |
| V4-Flash API | ~$0.07/1M input tokens | Pay-as-you-go; ultra-low-cost output pricing |
| V4-Pro API | Frontier-tier rate | Full Pro model access, 1M context |
| Self-Hosted | Free | MIT weights, your own infrastructure |
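At those rates, even full-window requests stay cheap. A quick sketch of the input-cost arithmetic using the ~$0.07/1M Flash figure from the table above (output pricing is not specified in the article, so it is excluded):

```python
FLASH_INPUT_PER_M = 0.07  # USD per 1M input tokens (from the table above)

def input_cost_usd(tokens: int, rate_per_m: float = FLASH_INPUT_PER_M) -> float:
    """Input-side cost only; output tokens are billed separately."""
    return tokens / 1_000_000 * rate_per_m

# Filling the entire 1M-token context window once on Flash:
full_window = input_cost_usd(1_000_000)  # → $0.07
```

In other words, sending a whole codebase through the full context window costs cents on Flash, which is what makes the long-context feature practical rather than a demo.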
Privacy & Safety
- Data Usage: API requests not used for training by default
- Self-Hosted: Complete data isolation — zero network calls
- Content Policy: Chinese regulatory compliance built in
- Open License: MIT license allows commercial use and modification
The Killer Feature
1 million token context at open-source pricing — no other model offers a million-token window with MIT-licensed weights. V4-Pro handles an entire codebase, all documentation, and a complex prompt in a single request. Combined with agentic coding capabilities that lead all open models, this is the most powerful self-hostable AI available. For enterprises that can't send data to OpenAI or Anthropic, DeepSeek-V4 is unmatched.
Pros & Cons
Pros:
- 1M token context is industry-leading
- GPT-5.5-level reasoning at 1/10th the cost
- MIT-licensed — fully open and self-hostable
- Excellent Chinese-English bilingual support
- Runs on Huawei Ascend (no Nvidia dependency)
Cons:
- V4 is still in Preview (production hardening ongoing)
- Weaker on languages other than Chinese and English
- Self-hosting V4-Pro requires ~865 GB disk and significant VRAM
- Safety alignment less robust than Western models
Verdict
DeepSeek-V4 is the most significant open-source model release of 2026. The 1M context window, MIT license, and frontier-level reasoning at low cost make it the default choice for any developer or enterprise that values control. V4-Flash is perfect for high-throughput, low-cost workloads; V4-Pro handles your most complex reasoning tasks.