Key Terms
- DeepSeek-V4-Pro - DeepSeek's flagship model with 1.6T total parameters and 49B active parameters (MoE). Supports 1M context, thinking and non-thinking modes. Open-source. Currently at 75% promotional discount through May 31, 2026. Source: DeepSeek – Pricing
- DeepSeek-V4-Flash - DeepSeek's fast, economical model with 284B total parameters and 13B active parameters (MoE). Supports 1M context, thinking and non-thinking modes. The successor to deepseek-chat and deepseek-reasoner. Source: DeepSeek – Pricing
- Thinking mode - Extended reasoning capability where the model produces a chain of thought before answering. Both V4 models support thinking (default) and non-thinking modes. Source: DeepSeek – Thinking Mode
- Context caching - DeepSeek offers automatic context caching. Cache hit price is 1/10 of the original launch price (reduced April 26, 2026). Cache hit for V4-Flash is $0.0028/MTok, V4-Pro is $0.003625/MTok (discounted). Source: DeepSeek – Pricing
- Anthropic API compatibility - DeepSeek provides an Anthropic-compatible endpoint at api.deepseek.com/anthropic, allowing direct use with Claude Code and other Anthropic-based tools. Source: DeepSeek – Anthropic API
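The Anthropic-compatible endpoint means any client that speaks the Anthropic Messages API can be pointed at DeepSeek by swapping the base URL. A minimal stdlib sketch, assuming the endpoint mirrors Anthropic's `/v1/messages` path and headers, and using `deepseek-v4-flash` as a placeholder model id (neither detail is confirmed by the docs quoted here):

```python
import json
import urllib.request

DEEPSEEK_ANTHROPIC_BASE = "https://api.deepseek.com/anthropic"

def build_messages_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an Anthropic Messages-shaped request aimed at DeepSeek's compat endpoint."""
    body = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{DEEPSEEK_ANTHROPIC_BASE}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,             # Anthropic-style auth header
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Claude Code users would instead point the tool's Anthropic base-URL configuration (e.g. the `ANTHROPIC_BASE_URL` environment variable) at the same endpoint.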
Latest Changes
First report for this supplier; all models, plans, and pricing below reflect the current state.
- New model: DeepSeek-V4-Pro and V4-Flash launched April 24, 2026. 1M context, 384K max output. Open-source (Apache 2.0).
- Price change (promotional): V4-Pro at a 75% discount ($0.435/$0.87 per MTok) until May 31, 2026. Reverts to $1.74/$3.48.
- Price change: Cache hit prices reduced to 1/10 of launch prices for all models (April 26, 2026).
- Deprecation (upcoming): deepseek-chat and deepseek-reasoner endpoints retired after July 24, 2026.
- Feature added: Anthropic-compatible API endpoint for direct use with Claude Code.
- Feature added: FIM completion beta (non-thinking mode only) and chat prefix completion beta.
Plans
DeepSeek is API-only with no consumer subscription plans. All usage is billed per token with no monthly minimum.
Pricing (Effective April 26, 2026)
| Model | Input (Cache Miss) | Input (Cache Hit) | Output | Context | Max Output |
|---|---|---|---|---|---|
| DeepSeek-V4-Pro | $1.74/MTok ($0.435 promotional) | $0.0145/MTok ($0.003625 promotional) | $3.48/MTok ($0.87 promotional) | 1M tokens | 384K tokens |
| DeepSeek-V4-Flash | $0.14/MTok | $0.0028/MTok | $0.28/MTok | 1M tokens | 384K tokens |
V4-Pro promotional pricing: 75% discount extended until May 31, 2026 15:59 UTC. After this date, prices revert to $1.74 input / $3.48 output per MTok.
Cache hit reduction: Effective April 26, 2026 12:15 UTC, cache hit prices reduced to 1/10 of launch prices for all models.
Legacy model retirement: deepseek-chat (maps to V4-Flash non-thinking) and deepseek-reasoner (maps to V4-Flash thinking) will be fully retired and inaccessible after July 24, 2026 15:59 UTC.
Source: DeepSeek – Pricing
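The table above translates into a simple per-request cost estimate. A sketch with prices in USD per million tokens taken from the table; the dictionary keys (including the `-promo` variant for discounted V4-Pro) are labels invented here for illustration, not official model ids:

```python
# USD per MTok from the pricing table: (input cache miss, input cache hit, output).
PRICES = {
    "deepseek-v4-pro":       (1.74,  0.0145,   3.48),   # regular pricing
    "deepseek-v4-pro-promo": (0.435, 0.003625, 0.87),   # 75% discount until May 31, 2026
    "deepseek-v4-flash":     (0.14,  0.0028,   0.28),
}

def estimate_cost(model: str, miss_tokens: int, hit_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the table's rates."""
    miss, hit, out = PRICES[model]
    return (miss_tokens * miss + hit_tokens * hit + output_tokens * out) / 1_000_000
```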
Cost Comparison at Promotional Pricing
At the current 75% discount, V4-Pro costs $0.435/$0.87 per MTok (input/output). This compares to:
- Claude Opus 4.7: $5.00/$25.00 per MTok (11.5x / 28.7x more expensive)
- Claude Sonnet 4.6: $3.00/$15.00 per MTok (6.9x / 17.2x more expensive)
- GPT-5.4: pricing undisclosed, but estimated to fall in the Opus range
Even at full price ($1.74/$3.48), V4-Pro is significantly cheaper than Western frontier models.
V4-Flash at $0.14/$0.28 per MTok is the cheapest frontier-capable model available, cheaper than Gemini Flash and Claude Haiku.
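The multipliers above are straight price ratios against V4-Pro's promotional rates, and the arithmetic can be reproduced directly:

```python
def price_ratio(competitor_in: float, competitor_out: float,
                base_in: float = 0.435, base_out: float = 0.87) -> tuple:
    """Ratio of a competitor's (input, output) per-MTok price to V4-Pro promo pricing."""
    return round(competitor_in / base_in, 1), round(competitor_out / base_out, 1)
```

`price_ratio(5.00, 25.00)` reproduces the 11.5x / 28.7x figures quoted for Claude Opus 4.7.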
Rate Limits
DeepSeek dynamically limits user concurrency based on server load. No fixed RPM/TPM limits are published. When the concurrency limit is reached, the API returns HTTP 429 immediately. Requests that have not started inference within 10 minutes are terminated by the server.
Source: DeepSeek – Rate Limit
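Because the API returns 429 immediately at the unpublished concurrency ceiling, clients are left to retry on their own. A minimal jittered-backoff sketch, where `send` is any callable and `RateLimited` is a hypothetical exception standing in for a 429 response:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 from the API."""

def with_backoff(send, max_retries: int = 5, base_delay: float = 1.0):
    """Call `send`, retrying on RateLimited with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% jitter.
            time.sleep(base_delay * 2 ** attempt * (1 + random.random() * 0.1))
```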
API Pricing
See the Pricing section above for complete per-token pricing.
Additional Features
- JSON Output: Supported on both models
- Tool Calls: Supported on both models
- FIM Completion (Fill-in-the-Middle): Beta, non-thinking mode only on both models
- Chat Prefix Completion: Beta, supported on both models
- Anthropic API format: Supported at https://api.deepseek.com/anthropic
- OpenAI API format: Supported at https://api.deepseek.com
- Agent Integrations: Claude Code, OpenClaw, OpenCode (documented guides)
Source: DeepSeek – Pricing
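FIM completion fills a gap between a known prefix and suffix, which is the shape editor autocomplete needs. A sketch of a request body under the assumption that the beta follows the usual completions-style `prompt`/`suffix` fields; the field names and the model id used below are assumptions, not documented details:

```python
def build_fim_body(model: str, prefix: str, suffix: str, max_tokens: int = 128) -> dict:
    """Build a Fill-in-the-Middle request body: complete the text between prefix and suffix."""
    return {
        "model": model,        # must run in non-thinking mode per the beta's limitation
        "prompt": prefix,      # code before the gap
        "suffix": suffix,      # code after the gap
        "max_tokens": max_tokens,
    }
```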
Model Performance / Benchmarks
DeepSeek claims "open-source SOTA on Agentic Coding benchmarks" and "rivaling top closed-source models" for V4-Pro, but does not publish detailed benchmark tables with exact scores. The full tech report is available as a PDF on Hugging Face.
V4-Pro: 1.6T total / 49B active parameters (MoE). Open-source.
V4-Flash: 284B total / 13B active parameters. Reasoning closely approaches V4-Pro.
Community comparison: at $0.14/$0.28 per MTok, V4-Flash is cited as the cheapest frontier-capable model available, undercutting Gemini Flash and Claude Haiku.
Source: DeepSeek – News260424
Latest News
DeepSeek-V4 Launch (April 24, 2026)
DeepSeek released V4-Pro and V4-Flash, a major generational upgrade:
- V4-Pro: 1.6T total / 49B active params (MoE). Open-source SOTA on Agentic Coding benchmarks. Claims to rival top closed-source models in math, STEM, and coding. Claims the richest world knowledge among open models, trailing only Gemini 3.1 Pro overall.
- V4-Flash: 284B total / 13B active params. Reasoning closely approaches V4-Pro. Performs on par with V4-Pro on simple Agent tasks. Faster and cheaper.
- 1M context standard: Novel attention architecture combining token-wise compression with DeepSeek Sparse Attention (DSA). All official DeepSeek services default to 1M context.
- Agent-first design: Seamlessly integrated with Claude Code, OpenClaw, and OpenCode. Already used for DeepSeek's internal agentic coding.
- Max output 384K tokens: The largest max output of any frontier model, critical for agentic coding tasks.
- Open weights available on Hugging Face.
- HN: 2,086 points, 1,601 comments (one of the largest AI threads of 2026).
Source: DeepSeek – News260424
Cache Hit Price Reduction (April 26, 2026)
Cache hit prices reduced to 1/10 of launch prices for all models. V4-Flash cache hit now $0.0028/MTok, V4-Pro cache hit $0.003625/MTok (promotional) or $0.0145/MTok (regular). This is significant for agentic use cases where large context windows are reused across turns.
Source: DeepSeek – Pricing
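The 1/10 cache-hit price changes the economics of multi-turn agent loops that re-send a large shared context every turn. A back-of-envelope sketch at the V4-Flash rates above, assuming the shared prefix misses the cache once and hits on every later turn (how hits accrue across turns is an assumption about the automatic caching, not a documented guarantee):

```python
def flash_input_cost(turns: int, context_tokens: int,
                     miss_price: float = 0.14, hit_price: float = 0.0028) -> tuple:
    """Input-side USD cost of an agent loop, without and with context caching."""
    no_cache = turns * context_tokens * miss_price / 1e6
    # With caching: the shared context misses once, then hits on turns 2..N.
    cached = (context_tokens * miss_price + (turns - 1) * context_tokens * hit_price) / 1e6
    return no_cache, cached
```

For a 10-turn loop over a 500K-token shared context, the cached path costs roughly a tenth of the uncached one.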
V4-Pro Promotional Pricing Extended (April 2026)
The 75% discount on V4-Pro (bringing it to $0.435/$0.87 per MTok) was extended through May 31, 2026 15:59 UTC. After this, regular pricing ($1.74/$3.48) takes effect.
Source: DeepSeek – Pricing
Community Signals
V4 Launch Reception
HN: 2,086 points, 1,601 comments. The dominant community narrative was cost-effectiveness and the 1M context window. The thread became one of the largest AI discussions on HN in 2026.
Source: News – From
Cost-Effectiveness Narrative
From the GitHub Copilot report's community section: "DeepSeek API pricing 300% difference vs Anthropic/OpenAI." This price gap narrative has been consistent since DeepSeek's inception and continued with V4.
Source: News – Item
Claude Code Integration
DeepSeek provides documented integration guides for using V4 models with Claude Code via the Anthropic-compatible endpoint. The community has been using DeepSeek as a backend for Claude Code since V3, and V4's 1M context and 384K max output make it particularly attractive for this use case.
Source: DeepSeek – Claude Code
Historical Service Issues
DeepSeek experienced major outages in January 2025 due to "large-scale malicious attacks." The service has historically been less reliable than Western providers, with dynamic rate limiting that can result in unpredictable 429 errors during peak usage.
Source: News – From
Open Source Community
DeepSeek consistently open-sources its model weights (Apache 2.0 license for V4). The V4-Pro weights were released on Hugging Face simultaneously with the API launch. This has built significant goodwill in the open-source community.
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Not offered. DeepSeek is API-only with no managed platform. |
| SSO (OIDC) | No | Not offered. |
| SCIM | No | Not offered. |
| Audit logs | No | Not offered. |
| IP indemnity | No | Not offered. |
| Data residency | No | Not offered. API endpoints are China-based. |
| HIPAA | No | Not offered. |
| Air-gapped / on-prem | Yes | Models are open-weight (Apache 2.0). Can be self-hosted on your own GPU infrastructure for full data isolation. |
| SLA | No | No SLA. History of outages during demand spikes. |
| Admin controls (RBAC) | No | No admin controls. API keys are per-account. |
Transparency Gaps
| Gap | Details | Severity |
|---|---|---|
| No fixed rate limits | Rate limits are "dynamically adjusted based on server load" with no published RPM/TPM. Users cannot plan capacity. During high-demand periods (like the V4 launch), users may experience frequent 429 errors. | High |
| Promotional pricing expiry | V4-Pro is at 75% discount ($0.435/$0.87) until May 31, 2026. After that, prices jump to $1.74/$3.48, a 4x increase. Buyers evaluating V4-Pro today cannot rely on current pricing for long-term cost planning. | High |
| Dynamic concurrency details | No documentation on what concurrency levels users can expect. The 10-minute timeout for queued requests is the only concrete number. | Medium |
| Thinking mode token billing | The documentation does not clarify whether thinking tokens are billed at input or output rates, or how to estimate thinking token costs for agentic workloads. | Medium |
| V4-Pro benchmark specifics | Claims "open-source SOTA on Agentic Coding benchmarks" and "rivaling top closed-source models" without publishing detailed benchmark tables with exact scores. The full tech report is a PDF on Hugging Face. | Low |
| FIM mode limitations | FIM completion is in beta and only available in non-thinking mode. The documentation does not explain why or when thinking mode support might arrive. | Low |