DeepSeek

Executive Summary

What it is: DeepSeek is an API-only inference provider offering open-weight models (Apache 2.0 license) with Anthropic-compatible and OpenAI-compatible endpoints. DeepSeek-V4-Pro (1.6T total/49B active parameters, MoE) and V4-Flash (284B/13B active, MoE) support 1M context windows and 384K max output. V4-Flash is priced at $0.14/$0.28 per MTok, making it the cheapest frontier-capable model across all 17 suppliers in this report by a significant margin.

What to watch out for: V4-Pro's current pricing ($0.435/$0.87 per MTok) is a 75% promotional discount expiring May 31, 2026, after which it reverts to $1.74/$3.48 per MTok (still well below Western frontier pricing). Rate limits are entirely dynamic ("adjusted based on server load") with no published RPM/TPM, and DeepSeek has a history of outages during demand spikes (January 2025). Thinking mode token billing details are not documented.

Bottom line: DeepSeek V4-Flash at $0.14/$0.28 per MTok is, by a significant margin, the cheapest frontier-capable model available, and V4-Pro even at full price ($1.74/$3.48) undercuts every Western frontier model in this report. The tradeoff is reliability: no SLAs, no fixed rate limits, and a history of outages. Use it where cost savings justify availability risk, and pair it with Claude Code via the Anthropic-compatible endpoint for a production-ready agent.

Key Terms

  • DeepSeek-V4-Pro - DeepSeek's flagship model with 1.6T total parameters and 49B active parameters (MoE). Supports 1M context, thinking and non-thinking modes. Open-source. Currently at 75% promotional discount through May 31, 2026. Source: DeepSeek – Pricing
  • DeepSeek-V4-Flash - DeepSeek's fast, economical model with 284B total parameters and 13B active parameters (MoE). Supports 1M context, thinking and non-thinking modes. The successor to deepseek-chat and deepseek-reasoner. Source: DeepSeek – Pricing
  • Thinking mode - Extended reasoning capability where the model produces a chain of thought before answering. Both V4 models support thinking (default) and non-thinking modes. Source: DeepSeek – Thinking Mode
  • Context caching - DeepSeek offers automatic context caching. Cache hit price is 1/10 of the original launch price (reduced April 26, 2026). Cache hit for V4-Flash is $0.0028/MTok, V4-Pro is $0.003625/MTok (discounted). Source: DeepSeek – Pricing
  • Anthropic API compatibility - DeepSeek provides an Anthropic-compatible endpoint at api.deepseek.com/anthropic, allowing direct use with Claude Code and other Anthropic-based tools. Source: DeepSeek – Anthropic API

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • New model: DeepSeek-V4-Pro and V4-Flash launched April 24, 2026. 1M context, 384K max output. Open-source (Apache 2.0).
  • Price change (promotional): V4-Pro at 75% discount ($0.435/$0.87 per MTok) until May 31, 2026. Reverts to $1.74/$3.48.
  • Price change: Cache hit prices reduced to 1/10 of launch prices for all models (April 26).
  • Deprecation (upcoming): deepseek-chat and deepseek-reasoner endpoints retired after July 24, 2026.
  • Feature added: Anthropic-compatible API endpoint for direct use with Claude Code.
  • Feature added: FIM completion beta (non-thinking mode only) and chat prefix completion beta.

Plans

DeepSeek is API-only with no consumer subscription plans. All usage is billed per token with no monthly minimum.

Pricing (Effective April 26, 2026)

| Model | Input (Cache Miss) | Input (Cache Hit) | Output | Context | Max Output |
|---|---|---|---|---|---|
| DeepSeek-V4-Pro | $1.74/MTok ($0.435 promotional) | $0.0145/MTok ($0.003625 promotional) | $3.48/MTok ($0.87 promotional) | 1M tokens | 384K tokens |
| DeepSeek-V4-Flash | $0.14/MTok | $0.0028/MTok | $0.28/MTok | 1M tokens | 384K tokens |

V4-Pro promotional pricing: 75% discount extended until May 31, 2026 15:59 UTC. After this date, prices revert to $1.74 input / $3.48 output per MTok.

Cache hit reduction: Effective April 26, 2026 12:15 UTC, cache hit prices reduced to 1/10 of launch prices for all models.

Legacy model retirement: deepseek-chat (maps to V4-Flash non-thinking) and deepseek-reasoner (maps to V4-Flash thinking) will be fully retired and inaccessible after July 24, 2026 15:59 UTC.

Source: DeepSeek – Pricing
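For budgeting, the table converts directly to a per-request estimate. The sketch below is a minimal calculator using only the V4-Pro rates listed above; the token counts and the cache-hit split are inputs you must measure for your own workload:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     cache_hit_tokens: int = 0, promo: bool = True) -> float:
    """Estimate one V4-Pro request's cost from the pricing table.

    Rates are USD per million tokens; cache_hit_tokens is the portion
    of input served from DeepSeek's automatic context cache.
    """
    # (cache-miss input, cache-hit input, output) in $/MTok
    rates = (0.435, 0.003625, 0.87) if promo else (1.74, 0.0145, 3.48)
    miss_rate, hit_rate, out_rate = rates
    miss_tokens = input_tokens - cache_hit_tokens
    return (miss_tokens * miss_rate
            + cache_hit_tokens * hit_rate
            + output_tokens * out_rate) / 1_000_000

# Example: 500K input (400K of it cached) plus 50K output at promo rates
cost = request_cost_usd(500_000, 50_000, cache_hit_tokens=400_000)  # ≈ $0.088
```

Note how the cache-hit rate dominates nothing: in the example, the 400K cached tokens cost well under a cent, while the 100K uncached input and 50K output split the rest almost evenly.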

Cost Comparison at Promotional Pricing

At the current 75% discount, V4-Pro costs $0.435/$0.87 per MTok (input/output). This compares to:

  • Claude Opus 4.7: $5.00/$25.00 per MTok (11.5x / 28.7x more expensive)
  • Claude Sonnet 4.6: $3.00/$15.00 per MTok (6.9x / 17.2x more expensive)
  • GPT-5.4: undisclosed but estimated similar to Opus range

Even at full price ($1.74/$3.48), V4-Pro is significantly cheaper than Western frontier models.

V4-Flash at $0.14/$0.28 per MTok is the cheapest frontier-capable model available, cheaper than Gemini Flash and Claude Haiku.
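The multiples quoted above follow directly from the listed rates; a quick arithmetic check using only numbers from this section:

```python
# Promotional V4-Pro rates vs Claude rates quoted in this section ($/MTok)
v4_pro = {"in": 0.435, "out": 0.87}
rivals = {
    "Claude Opus 4.7":   {"in": 5.00, "out": 25.00},
    "Claude Sonnet 4.6": {"in": 3.00, "out": 15.00},
}

# (input multiple, output multiple) relative to promotional V4-Pro
multiples = {
    name: (round(p["in"] / v4_pro["in"], 1), round(p["out"] / v4_pro["out"], 1))
    for name, p in rivals.items()
}
# multiples == {"Claude Opus 4.7": (11.5, 28.7), "Claude Sonnet 4.6": (6.9, 17.2)}
```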

Rate Limits

DeepSeek dynamically limits user concurrency based on server load. No fixed RPM/TPM limits are published. When the concurrency limit is reached, the API returns HTTP 429 immediately. Requests not starting inference within 10 minutes are terminated by the server.

Source: DeepSeek – Rate Limit
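Since DeepSeek publishes no RPM/TPM numbers and no retry guidance, client-side backoff is the only practical defense against the dynamic concurrency limit. A minimal sketch (the `RateLimited` exception and `send_request` callable are placeholders for whatever transport layer you use):

```python
import random
import time

class RateLimited(Exception):
    """Raised by your transport layer when the API answers HTTP 429."""

def call_with_backoff(send_request, max_retries=5, base_delay=1.0, cap=30.0):
    """Retry a request on 429 with capped exponential backoff plus jitter.

    send_request is any zero-argument callable that raises RateLimited
    when DeepSeek's dynamic concurrency limit rejects the request.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimited:
            if attempt == max_retries:
                raise
            delay = min(cap, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads retries
```

Jitter matters here: because limits tighten for everyone at once during load spikes, synchronized retries from many clients would only prolong the 429 storm.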

API Pricing

See Plans section above for complete pricing.

Additional Features

  • JSON Output: Supported on both models
  • Tool Calls: Supported on both models
  • FIM Completion (Fill-in-the-Middle): Beta, non-thinking mode only on both models
  • Chat Prefix Completion: Beta, supported on both models
  • Anthropic API format: Supported at https://api.deepseek.com/anthropic
  • OpenAI API format: Supported at https://api.deepseek.com
  • Agent Integrations: Claude Code, OpenClaw, OpenCode (documented guides)

Source: DeepSeek – Pricing
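Both formats are plain JSON over HTTPS, so switching between them is only a matter of request shape. A stdlib-only sketch of the two bodies (the model ID and the `/v1/messages` path on the Anthropic-compatible endpoint are assumptions for illustration; check DeepSeek's model list for the exact V4 identifiers):

```python
import json

# Endpoint roots from this section; paths beyond them are assumptions.
OPENAI_URL = "https://api.deepseek.com/chat/completions"
ANTHROPIC_URL = "https://api.deepseek.com/anthropic/v1/messages"

def openai_style_body(prompt: str) -> str:
    """OpenAI-format request body (DeepSeek's default API shape)."""
    return json.dumps({
        "model": "deepseek-chat",  # legacy alias; retired after July 24, 2026
        "messages": [{"role": "user", "content": prompt}],
    })

def anthropic_style_body(prompt: str) -> str:
    """Anthropic-format body for the /anthropic endpoint (Claude Code, etc.)."""
    return json.dumps({
        "model": "deepseek-chat",  # assumption: same alias accepted here
        "max_tokens": 8192,        # required field in the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    })
```

The practical consequence: any tool that speaks either format can target DeepSeek by overriding its base URL, which is exactly how the documented Claude Code integration works.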

Model Performance / Benchmarks

DeepSeek claims "open-source SOTA on Agentic Coding benchmarks" and "rivaling top closed-source models" for V4-Pro, but does not publish detailed benchmark tables with exact scores. The full tech report is available as a PDF on Hugging Face.

V4-Pro: 1.6T total / 49B active parameters (MoE). Open-source.

V4-Flash: 284B total / 13B active parameters. Reasoning closely approaches V4-Pro.


Source: DeepSeek – News260424

Latest News

DeepSeek-V4 Launch (April 24, 2026)

DeepSeek released V4-Pro and V4-Flash, a major generational upgrade:

  • V4-Pro: 1.6T total / 49B active params (MoE). Open-source SOTA on Agentic Coding benchmarks. Claims to rival top closed-source models in math, STEM, and coding. Rich world knowledge: best among open models, trailing only Gemini 3.1 Pro overall.
  • V4-Flash: 284B total / 13B active params. Reasoning closely approaches V4-Pro. Performs on par with V4-Pro on simple Agent tasks. Faster and cheaper.
  • 1M context standard: Novel attention architecture combining token-wise compression with DeepSeek Sparse Attention (DSA). All official DeepSeek services default to 1M context.
  • Agent-first design: Seamlessly integrated with Claude Code, OpenClaw, and OpenCode. Already used for DeepSeek's internal agentic coding.
  • Max output 384K tokens: The largest max output of any frontier model, critical for agentic coding tasks.
  • Open weights available on Hugging Face.
  • HN: 2,086 points, 1,601 comments (one of the largest AI threads of 2026).

Source: DeepSeek – News260424

Cache Hit Price Reduction (April 26, 2026)

Cache hit prices reduced to 1/10 of launch prices for all models. V4-Flash cache hit now $0.0028/MTok, V4-Pro cache hit $0.003625/MTok (promotional) or $0.0145/MTok (regular). This is significant for agentic use cases where large context windows are reused across turns.

Source: DeepSeek – Pricing
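To see why this matters for agents, consider a multi-turn session that resends a growing context every turn. A rough model at V4-Flash rates, assuming everything except the newly appended tokens hits the automatic cache (actual hit rates depend on DeepSeek's caching behavior):

```python
# V4-Flash rates in $/MTok: cache-miss input, cache-hit input, output
MISS, HIT, OUT = 0.14, 0.0028, 0.28

def session_cost(turns=10, context=100_000, growth=10_000, out=2_000, cached=True):
    """Total cost when each turn resends the full (growing) context.

    With caching, only the tokens appended since the previous turn
    miss the cache; without it, every turn pays the full miss rate.
    """
    total = 0.0
    for t in range(turns):
        size = context + t * growth
        if cached and t > 0:
            hit, miss = size - growth, growth
        else:
            hit, miss = 0, size
        total += (miss * MISS + hit * HIT + out * OUT) / 1e6
    return total

saving = session_cost(cached=False) / session_cost(cached=True)  # roughly 5.8x
```

Under these assumptions, a ten-turn session starting at 100K context costs about $0.21 uncached versus about $0.036 cached, so the 1/10 cache-hit price is where most of the agentic savings actually come from.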

V4-Pro Promotional Pricing Extended (April 2026)

The 75% discount on V4-Pro (bringing it to $0.435/$0.87 per MTok) was extended through May 31, 2026 15:59 UTC. After this, regular pricing ($1.74/$3.48) takes effect.

Source: DeepSeek – Pricing

Community Signals

V4 Launch Reception

HN: 2,086 points, 1,601 comments. The dominant community narrative was cost-effectiveness and the 1M context window. The thread became one of the largest AI discussions on HN in 2026.

Source: News – From

Cost-Effectiveness Narrative

From the GitHub Copilot report's community section: "DeepSeek API pricing 300% difference vs Anthropic/OpenAI." This price gap narrative has been consistent since DeepSeek's inception and continued with V4.

Source: News – Item

Claude Code Integration

DeepSeek provides documented integration guides for using V4 models with Claude Code via the Anthropic-compatible endpoint. The community has been using DeepSeek as a backend for Claude Code since V3, and V4's 1M context and 384K max output make it particularly attractive for this use case.

Source: DeepSeek – Claude Code

Historical Service Issues

DeepSeek experienced major outages in January 2025 due to "large-scale malicious attacks." The service has historically been less reliable than Western providers, with dynamic rate limiting that can result in unpredictable 429 errors during peak usage.

Source: News – From

Open Source Community

DeepSeek consistently open-sources its model weights (Apache 2.0 license for V4). The V4-Pro weights were released on Hugging Face simultaneously with the API launch. This has built significant goodwill in the open-source community.

Enterprise Readiness

| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Not offered. DeepSeek is API-only with no managed platform. |
| SSO (OIDC) | No | Not offered. |
| SCIM | No | Not offered. |
| Audit logs | No | Not offered. |
| IP indemnity | No | Not offered. |
| Data residency | No | Not offered. API endpoints are China-based. |
| HIPAA | No | Not offered. |
| Air-gapped / on-prem | Yes | Models are open-weight (Apache 2.0). Can be self-hosted on your own GPU infrastructure for full data isolation. |
| SLA | No | No SLA. History of outages during demand spikes. |
| Admin controls (RBAC) | No | No admin controls. API keys are per-account. |

Transparency Gaps

| Gap | Details | Severity |
|---|---|---|
| No fixed rate limits | Rate limits are "dynamically adjusted based on server load" with no published RPM/TPM. Users cannot plan capacity. During high-demand periods (like the V4 launch), users may experience frequent 429 errors. | High |
| Promotional pricing expiry | V4-Pro is at 75% discount ($0.435/$0.87) until May 31, 2026. After that, prices jump to $1.74/$3.48, a 4x increase. Buyers evaluating V4-Pro today cannot rely on current pricing for long-term cost planning. | High |
| Dynamic concurrency details | No documentation on what concurrency levels users can expect. The 10-minute timeout for queued requests is the only concrete number. | Medium |
| Thinking mode token billing | The documentation does not clarify whether thinking tokens are billed at input or output rates, or how to estimate thinking token costs for agentic workloads. | Medium |
| V4-Pro benchmark specifics | Claims "open-source SOTA on Agentic Coding benchmarks" and "rivaling top closed-source models" without publishing detailed benchmark tables with exact scores. The full tech report is a PDF on Hugging Face. | Low |
| FIM mode limitations | FIM completion is in beta and only available in non-thinking mode. The documentation does not explain why or when thinking mode support might arrive. | Low |