DeepSeek

Executive Summary

What it is: DeepSeek is an API-only LLM provider based in Hangzhou, China, offering two open-weight models (DeepSeek-V4-Pro and DeepSeek-V4-Flash) with a 1M-token context window. It is not a coding agent itself; its models are used through third-party harnesses like OpenCode, Claude Code, and Pi. Pricing is pay-per-token with no subscription plans, ranging from $0.14/$0.28 per MTok (Flash) to $0.435/$0.87 per MTok (Pro, permanent discount). Source: https://api-docs.deepseek.com/quick_start/pricing

What to watch out for: DeepSeek does not offer an opt-out for training on API data, has no enterprise features (SSO, SCIM, audit logs, IP indemnity, SLA), and its data processing is subject to China's National Intelligence Law. The deepseek-chat and deepseek-reasoner model aliases will be fully retired on 2026-07-24. The permanent 75% discount on V4-Pro is aggressive pricing that may not be sustainable long-term.

Bottom line: DeepSeek V4-Pro offers frontier-level coding performance at roughly one-quarter the price of comparable Western models. For non-sensitive workloads, it is the most cost-effective API option available. For enterprises with data residency, compliance, or IP protection requirements, it is not viable as a primary provider.

Key Terms

  • DSA (DeepSeek Sparse Attention) — a novel attention mechanism in V4 that uses token-wise compression to handle 1M-token contexts without linearly scaling compute. It is the architectural innovation behind V4's cost efficiency. Source: Hugging Face – Deepseek V4.Pdf
  • MLA (Multi-Head Latent Attention) — DeepSeek's compression technique that reduces KV cache size drastically compared to standard attention, enabling cheaper inference and larger batch sizes. Source: DeepSeek – News260424
  • Thinking / Non-Thinking mode — V4 models support dual modes. Thinking mode (chain-of-thought) produces extended reasoning before the answer; non-thinking mode provides direct responses. Both modes are available via a single model name. Source: DeepSeek – Thinking Mode
  • Anthropic-compatible API — DeepSeek exposes an endpoint at api.deepseek.com/anthropic that accepts Anthropic-format requests, allowing direct use with Claude Code and other Anthropic-SDK-based tools. Source: DeepSeek – Anthropic Api
  • Context Caching (KV Cache) — DeepSeek caches previously processed tokens on disk, reducing cost for repeated prompts. Cache hits are billed at 1/50th the cache-miss rate for V4-Pro. Source: DeepSeek – Kv Cache

Latest Changes

Changes since the 2026-04 report.

  • New model: DeepSeek-V4-Pro (1.6T total / 49B active params) launched 2026-04-24. Open-source SOTA on Agentic Coding benchmarks. $0.435/$0.87 per MTok input/output at permanent discount pricing. See API Pricing.
  • New model: DeepSeek-V4-Flash (284B total / 13B active params) launched 2026-04-24. Fast, efficient model at $0.14/$0.28 per MTok. See API Pricing.
  • Price change: DeepSeek-V4-Pro launched with a 75% promotional discount. On 2026-05-22, DeepSeek announced the discount is now permanent. Post-discount pricing: $0.435/$0.87 per MTok (input cache miss / output). Source: X – Status
  • Price change: Input cache hit prices reduced to 1/10 of launch price across all models, effective 2026-04-26. Source: DeepSeek – Pricing
  • Deprecation: deepseek-chat and deepseek-reasoner model names will be fully retired on 2026-07-24 15:59 UTC. Currently they route to V4-Flash non-thinking and thinking modes respectively. Source: DeepSeek – Updates
  • Feature added: Anthropic-compatible API endpoint at api.deepseek.com/anthropic, enabling direct integration with Claude Code. Source: DeepSeek – Anthropic Api
  • Feature added: Both V4 models support 1M-token context window and 384K max output. Source: DeepSeek – Pricing

Plans

DeepSeek is an API-only provider. There are no subscription plans, tiers, or bundled coding agents. Users top up a balance and are billed per token consumed.

PlanPriceIncluded UsageKey Details
API (pay-per-token)Pay as you goNo limits; concurrency limits apply (see below)Balance must be topped up manually; no monthly commitment

Concurrency limits:

ModelConcurrency Limit
deepseek-v4-pro500
deepseek-v4-flash2,500

Higher concurrency is available at no additional cost via a capacity expansion request. Source: DeepSeek – Rate Limit

Terms explained:

  • Concurrency limit — the number of simultaneous requests an account can have in-flight. Exceeding it returns HTTP 429. This is not a rate limit per minute or per day. Source: DeepSeek – Rate Limit

API Pricing

All prices per 1M tokens.

ModelInput (Cache Hit)Input (Cache Miss)OutputContextMax Output
deepseek-v4-flash$0.0028$0.14$0.281M384K
deepseek-v4-pro$0.003625$0.435$0.871M384K
deepseek-v4-pro (pre-discount, shown as strikethrough on pricing page)$0.0145$1.74$3.481M384K

Notes:

  • V4-Pro prices reflect the permanent 75% discount announced 2026-05-22. The original prices ($1.74/$3.48) are shown with strikethrough on the pricing page.
  • Cache hit prices were reduced to 1/10 of the launch price effective 2026-04-26.
  • Both models support thinking and non-thinking modes at the same price. Thinking mode may consume more tokens.
  • FIM completion is available in non-thinking mode only for both models.
  • Deduction: expense = tokens x price. Fees deducted from topped-up or granted balance, with granted balance consumed first.

Source: DeepSeek – Pricing

Model Performance / Benchmarks

DeepSeek reports the following in their V4 launch announcement. Independent verification from NIST CAISI evaluation is also noted.

ModelBenchmarkScoreNotes
DeepSeek-V4-ProAgentic Coding (open-source SOTA)Undisclosed exact scoreBeats all open-source models; rivals top closed-source models
DeepSeek-V4-ProWorld KnowledgeUndisclosed exact scoreLeads all open models; trails only Gemini-3.1-Pro among closed-source
DeepSeek-V4-ProMath / STEM / CodingUndisclosed exact scoreBeats all open models; rivaling top closed-source
DeepSeek-V4-FlashReasoningClose to V4-ProSmaller parameter size, faster response
DeepSeek-V4-ProNIST CAISI Evaluation"On par with GPT-5"Per NIST evaluation. Source: Nist – Caisi Evaluation Deepseek V4 Pro

DeepSeek's launch page includes benchmark images but does not publish exact numerical scores in text form. The full tech report is at: Hugging Face – Deepseek V4.Pdf

Community assessment places V4-Pro quality "close to Opus 4.5" for coding tasks, below Opus 4.6 with thinking enabled. Source: News – Item

Latest News

  1. 2026-04-24: DeepSeek-V4 Preview Release. V4-Pro (1.6T/49B active) and V4-Flash (284T/13B active) launched as open-weight models. 1M context, dual thinking/non-thinking modes, both OpenAI and Anthropic API compatible. Trained on Huawei Ascend 950 and Cambricon hardware. Source: DeepSeek – News260424
  1. 2026-04-26: Cache hit price reduction. Input cache hit prices reduced to 1/10 of launch price for all models. Source: DeepSeek – Pricing
  1. 2026-04-27: 75% promotional discount on V4-Pro. Initially set to expire 2026-05-05. The discount brought V4-Pro from $1.74/$3.48 to $0.435/$0.87 per MTok. Source: Thenextweb – Deepseek V4 Pro Price Cut 75 Percent
  1. 2026-05-05: Discount extended to 2026-05-31. The promotional end date was pushed from May 5 to May 31 15:59 UTC. Source: DeepSeek – Pricing
  1. 2026-05-22: 75% discount made permanent. DeepSeek's official X account announced that V4-Pro pricing will remain at 1/4 of the original price after the promotion ends on 2026-05-31 15:59 UTC. The pricing page was updated to note: "The deepseek-v4-pro model API pricing will be officially adjusted to 1/4 of the original price after the 75% discount promotion ends on 2026/05/31 15:59 UTC." Source: X – Status ; DeepSeek – Pricing
  1. 2026-05-03: NIST CAISI evaluation. The US National Institute of Standards and Technology's CAISI evaluation found DeepSeek V4 Pro to be "on par with GPT-5." Source: Nist – Caisi Evaluation Deepseek V4 Pro

Community Signals

Permanent pricing confirmed (HN, May 22). The story "DeepSeek makes the V4 Pro price discount permanent" (620 points, 549 comments) dominated HN. The top-line reaction: the post-discount V4-Pro pricing of $0.435/$0.87 per MTok is "less than half of Haiku" (per user jimmydoe on HN), making it the cheapest frontier-tier API by a wide margin. Community debate centered on whether this is sustainable or subsidized. User hedora argued that DeepSeek's architecture (MLA + DSA) is genuinely more cost-efficient, not just subsidized: "the companies focused on price performance will end up with more economic resources." Others flagged the geopolitical angle, with multiple users noting China's infrastructure advantages in power cost and chip independence. Source: News – Item

DeepClaude: using DeepSeek V4 Pro with Claude Code (HN, May 3). The project "DeepClaude" (678 points, 281 comments) showed developers how to point Claude Code at DeepSeek's Anthropic-compatible endpoint via environment variables. Multiple commenters reported that V4-Pro "feels close to Opus 4.5" for coding tasks, with user 63stack saying "It's close to Opus 4.5 for me." User rapind reported running it for a week on non-confidential projects: "I honestly can't tell the difference." A key concern raised: DeepSeek's API does not let you opt out of training. Users recommended OpenRouter with ZDR enabled or third-party providers like DeepInfra as alternatives. Source: News – Item

V4-Pro 75% off thread (HN, May 6). The "DeepSeek V4 Pro at 75% off until 31 May" thread (87 points, 86 comments) focused on the economics. User samdhar ran the self-hosting math: "at this rate you'd need to push roughly 1B output tokens/day to break even against an 8xH100 fleet." User EEnsw3r noted that "the cache-hit rate is extremely high, and the cache invalidation window is much longer than every other provider's," suggesting genuine efficiency gains rather than pure subsidy. Privacy concerns were a recurring theme, with multiple users saying they would only use DeepSeek for non-sensitive projects. Source: News – Item

V4-Flash praised as exceptional value. Multiple community members called V4-Flash a "sleeper pick." User speu on HN: "I've been trying deepseek-v4-flash and I'm blown away. It's no Opus, but it's as cheap as tap water and punches far above its weight as a Flash model." The Pro model was recommended for complex agent tasks; Flash for simpler sub-agent work. Source: News – Item

Harness integration. Community members reported success using DeepSeek V4 with OpenCode, Pi, Claude Code (via Anthropic endpoint), and KiloCode. OpenCode was called "a 1:1 replacement for Claude Code" by one user. Claude Code integration works because DeepSeek exposes an Anthropic-compatible API. Source: News – Item

Enterprise Readiness

FeatureAvailable?Details
SSO (SAML/OIDC)NoNot mentioned in any documentation. API uses key-based auth only.
SCIMNoNo user provisioning features.
Audit logsNoNo audit logging mentioned.
IP indemnityNoNo IP indemnity or copyright commitment.
Data residencyNoData processed in China. No regional processing options. Subject to China's National Intelligence Law.
HIPAANoNo HIPAA compliance mentioned.
Air-gapped / On-premPartialModels are open-weight; self-hosting is possible but requires significant GPU resources (1.6T params for V4-Pro).
SLANoNo SLA documented.
Admin controls (RBAC)NoNo admin controls. Per-account API keys only. user_id parameter provides basic multi-user isolation.

Source: DeepSeek – Rate Limit ; DeepSeek

Transparency Gaps

  1. No opt-out from training on API data. DeepSeek's API terms do not provide a mechanism to opt out of training on submitted data. This was confirmed by multiple HN commenters. OpenRouter with ZDR or third-party providers are workarounds, not DeepSeek's own feature.
  2. Exact benchmark scores not published in text. V4-Pro's launch page shows benchmark results only in images, not as machine-readable numbers. The tech report (PDF on HuggingFace) contains details, but the marketing page deliberately avoids extractable scores.
  3. Sustainability of V4-Pro pricing is undisclosed. The 75% discount was initially promotional (expiring May 5, then May 31). Now permanent, but DeepSeek has not disclosed whether these prices are above or below inference cost. Community speculation is divided.
  4. No published rate limits beyond concurrency. There are no documented RPM (requests per minute) or TPM (tokens per minute) limits. Only concurrency limits (500 for Pro, 2,500 for Flash) are listed.
  5. Data retention policy is vague. DeepSeek's privacy policy references "Parties with Other Legal Rights" as a category of data recipients, which community members have flagged as concerning given China's National Intelligence Law. Source: Cdn – Deepseek Privacy Policy
  6. Model deprecation timeline is aggressive. deepseek-chat and deepseek-reasoner will be retired on 2026-07-24, just 3 months after V4 launch. Users who have not migrated model names will experience breakage.