Key Terms
- Bailian (Model Studio) — Alibaba Cloud's managed LLM platform (help.aliyun.com/zh/model-studio). Provides API access, token billing, and subscription plans for Qwen and third-party models.
- Coding Plan — A flat-rate monthly subscription (¥200/month Pro tier) that provides request-based access to coding models through CLI tools. Billed per request, not per token. Not available for API/automated use. Source: Aliyun – Coding Plan
- Token Plan (Team Edition) — A per-seat monthly subscription with Credits-based billing. Three tiers: Standard (¥198/seat/month, 25K Credits), Advanced (¥698, 100K Credits), Premium (¥1,398, 250K Credits). Credits are consumed based on model, tokens, and thinking mode. Source: Aliyun – Token Plan Overview
- Credits — Token Plan's billing unit. Consumption depends on model type, token count, thinking mode, and tool calls. For example, a qwen3.6-plus request with ~8K input tokens, ~41K cached tokens, and ~573 output tokens consumes approximately 3.18 Credits.
- Context caching — Reusing previously cached input tokens at a discount. Supported on qwen3.7-max, qwen3.6-max-preview, and several other models. Reduces input token cost. Source: Aliyun – Context Cache
- Batch API — Asynchronous inference at 50% of real-time pricing. Supported on qwen3.7-max, qwen3.6-plus, and other models. Source: Aliyun – Batch Inference
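- Tiered pricing — The per-token price depends on how many input tokens a request contains, not on the model's maximum context window. For example, qwen3.6-plus costs ¥2/MTok input and ¥12/MTok output for requests with up to 256K input tokens, rising to ¥8/MTok input and ¥48/MTok output for requests with 256K to 1M input tokens.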
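A minimal Python sketch of how a single qwen3.6-plus request might be priced under these rules, using the list prices from the API Pricing section and the 50% Batch API rate. It rests on assumptions the source does not confirm: the tier is chosen by the request's input length and applies to the whole request (not marginally, like tax brackets), "256K" is treated as 256,000 tokens, and cached-token discounts are ignored. `request_cost_cny` is a hypothetical helper, not part of any Alibaba SDK.

```python
# Hypothetical pricing helper for qwen3.6-plus (list prices in CNY per million tokens).
# Assumptions (not confirmed by Alibaba docs):
#   - the tier is selected by the request's input length and applies to the whole request
#   - "256K" is taken as 256,000 tokens
#   - cached-token discounts are ignored; thinking and non-thinking output cost the same
QWEN36_PLUS_TIERS = [
    # (max input tokens for this tier, input ¥/MTok, output ¥/MTok)
    (256_000, 2.0, 12.0),
    (1_000_000, 8.0, 48.0),
]
BATCH_DISCOUNT = 0.5  # Batch API is billed at 50% of real-time pricing


def request_cost_cny(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the CNY cost of one qwen3.6-plus request."""
    for max_input, in_price, out_price in QWEN36_PLUS_TIERS:
        if input_tokens <= max_input:
            cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
            return cost * BATCH_DISCOUNT if batch else cost
    raise ValueError("input exceeds the 1M-token context window")


if __name__ == "__main__":
    # 100K input + 2K output stays in the 0-256K tier
    print(round(request_cost_cny(100_000, 2_000), 4))
    # 300K input + 2K output falls in the 256K-1M tier
    print(round(request_cost_cny(300_000, 2_000), 4))
    # Same request submitted through the Batch API
    print(round(request_cost_cny(300_000, 2_000, batch=True), 4))
```

Under the whole-request assumption, crossing the 256K boundary roughly quadruples the input cost: 256,000 input tokens cost ¥0.512, while 256,001 cost about ¥2.048.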
Latest Changes
Changes since the 2026-04 report.
- New model: qwen3.7-max launched 2026-05-21 as the new flagship. 1M context, 64K max output, 256K thinking budget. Pricing: ¥12/MTok input, ¥36/MTok output. Thinking mode enabled by default. Supports batch (50%) and context caching. Replaces qwen3.6-max-preview as the top-tier model. Source: Aliyun – Newly Released Models
- New model: qwen3.7-max-preview launched 2026-05-25. Same specs as qwen3.7-max but thinking-only mode. Snapshot: qwen3.7-max-2026-05-17.
- New model snapshot: qwen3.7-max-2026-05-20 — pinned snapshot of qwen3.7-max for reproducibility.
- New third-party model: xiaomi/mimo-v2.5-pro added 2026-05-19, 1M context, available via API.
- New third-party model: ZHIPU/GLM-5.1 and GLM-5 added 2026-05-19. GLM-5.1 has 198K context.
- New third-party model: stepfun/step-3.7-flash added 2026-05-29.
- New third-party model: vanchin/deepseek-v4-pro added 2026-05-29 (Kuaishou/Vanchin-hosted DeepSeek).
- Token Plan promo: qwen3.7-max Credits consumption halved until 2026-06-22. On Token Plan, qwen3.7-max also supports implicit caching. Source: Aliyun – Token Plan Overview
- HLD update needed: HLD.md lists "Qwen3.6-Max-Preview" as the latest LLM but the current flagship is now qwen3.7-max. The Latest LLM column should be updated.
Plans
| Plan | Price | Billing | Usage Limits | Data Used for Training? | Key Models |
|---|---|---|---|---|---|
| Pay-as-you-go | Per-token | CNY per MTok | Undisclosed rate limits (see rate-limit page) | No | All Qwen + third-party models |
| Coding Plan Pro | ¥200/month | Per-request | 6,000 req/5hr, 45,000/week, 90,000/month | Yes (explicitly stated in terms) | qwen3.6-plus, qwen3.5-plus, qwen3-coder-next, qwen3-coder-plus, kimi-k2.5, glm-5, glm-4.7, MiniMax-M2.5, qwen3-max-2026-01-23 |
| Coding Plan Lite | Discontinued (March 2026) | Per-request | N/A | N/A | N/A |
| Token Plan Standard | ¥198/seat/month | Credits (25,000/seat/month) | Undisclosed | No (explicitly promised) | qwen3.7-max, qwen3.6-plus, qwen3.6-flash, deepseek-v4-pro/flash/v3.2, kimi-k2.6/k2.5, glm-5.1/5, MiniMax-M2.5, qwen-image-2.0/pro, wan2.7-image/pro |
| Token Plan Advanced | ¥698/seat/month | Credits (100,000/seat/month) | Undisclosed | No | Same as Standard |
| Token Plan Premium | ¥1,398/seat/month | Credits (250,000/seat/month) | Undisclosed | No | Same as Standard |
| Token Plan Shared Pack | ¥5,000/pack | Credits (625,000/pack) | 1-month expiry | No | Same as Standard |
| Free tier | Free | 1M input + 1M output tokens per model | One-time, 90-day expiry | No | Most Qwen models |
Terms explained:
- Credits — Token Plan's billing unit. A single qwen3.6-plus request with ~8K input tokens, ~41K cached tokens, and ~573 output tokens consumes roughly 3.18 Credits. Actual consumption varies by model, thinking mode, and tool usage. Source: Aliyun – Token Plan Overview
- Data used for training — Coding Plan explicitly states that model inputs and outputs will be used for service improvement and model optimization during the subscription period. Token Plan explicitly states it does not use conversation data for model training. This distinction is critical for enterprise buyers.
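Because each Token Plan tier bundles a different number of Credits, the effective price per Credit differs. The sketch below divides each tier's list price by its Credits and applies that rate to the documented ~3.18-Credit qwen3.6-plus request. It assumes every Credit is used within its validity period; rollover and pooling rules are not modeled, and the resulting request counts are only indicative because real consumption varies by model, thinking mode, and tool calls.

```python
# Token Plan tiers from the Plans table: (name, CNY price, Credits included).
# Assumption: all Credits are consumed before they expire (best case).
TOKEN_PLAN_TIERS = [
    ("Standard", 198, 25_000),
    ("Advanced", 698, 100_000),
    ("Premium", 1_398, 250_000),
    ("Shared Pack", 5_000, 625_000),
]

EXAMPLE_REQUEST_CREDITS = 3.18  # the documented qwen3.6-plus example request


def cny_per_credit(price_cny: float, credits: int) -> float:
    """Effective CNY price of one Credit at full utilization."""
    return price_cny / credits


if __name__ == "__main__":
    for name, price, credits in TOKEN_PLAN_TIERS:
        rate = cny_per_credit(price, credits)
        n_requests = int(credits // EXAMPLE_REQUEST_CREDITS)
        print(f"{name:12s} ¥{rate:.5f}/Credit  "
              f"example request ≈ ¥{rate * EXAMPLE_REQUEST_CREDITS:.4f}  "
              f"≈ {n_requests:,} such requests")
```

At list price, Premium has the lowest cost per Credit (about ¥0.0056) and the Shared Pack the highest (¥0.008).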
API Pricing
All prices are in CNY (Chinese Yuan) per million tokens for the China mainland (Beijing) region. USD equivalents are approximate, at $1 ≈ ¥7.2.
Qwen3.7 Max (flagship, launched 2026-05-21)
| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
|---|---|---|---|---|
| qwen3.7-max | Non-thinking + Thinking | 0-1M | 12 | 36 |
| qwen3.7-max-2026-05-20 | Non-thinking + Thinking | 0-1M | 12 | 36 |
| qwen3.7-max-preview | Thinking only | 0-1M | 12 | 36 |
| qwen3.7-max-2026-05-17 | Thinking only | 0-1M | 12 | 36 |
Features: Batch API (50% discount), context caching (discount on cached input tokens). Free tier: 1M tokens each input/output, 90-day expiry.
Qwen3.6 Max Preview
| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
|---|---|---|---|---|
| qwen3.6-max-preview | Non-thinking + Thinking | 0-128K | 9 | 54 |
| qwen3.6-max-preview | Non-thinking + Thinking | 128K-256K | 15 | 90 |
Features: Context caching supported. 256K context window.
Qwen3.6 Plus
| Model | Mode | Input Tier | Input (¥/MTok) | Output: Non-thinking (¥/MTok) | Output: Thinking (¥/MTok) |
|---|---|---|---|---|---|
| qwen3.6-plus | Both | 0-256K | 2 | 12 | 12 |
| qwen3.6-plus | Both | 256K-1M | 8 | 48 | 48 |
| qwen3.6-plus-2026-04-02 | Both | 0-256K | 2 | 12 | 12 |
| qwen3.6-plus-2026-04-02 | Both | 256K-1M | 8 | 48 | 48 |
Features: Batch API (50%), 1M context window. Supports Token Plan and Coding Plan. Free tier: 1M tokens each.
Qwen3.6 Flash
| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
|---|---|---|---|---|
| qwen3.6-flash | Non-thinking + Thinking | 0-256K | 1.2 | 7.2 |
| qwen3.6-flash | Non-thinking + Thinking | 256K-1M | 4.8 | 28.8 |
| qwen3.6-flash-2026-04-16 | Non-thinking + Thinking | 0-256K | 1.2 | 7.2 |
| qwen3.6-flash-2026-04-16 | Non-thinking + Thinking | 256K-1M | 4.8 | 28.8 |
Features: Batch API (50%), context caching. 1M context window. Supports Token Plan. Free tier: 1M tokens each.
Approximate USD Equivalent (at $1 = ¥7.2)
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| qwen3.7-max | ~$1.67 | ~$5.00 |
| qwen3.6-max-preview (≤128K) | ~$1.25 | ~$7.50 |
| qwen3.6-plus (≤256K) | ~$0.28 | ~$1.67 |
| qwen3.6-flash (≤256K) | ~$0.17 | ~$1.00 |
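To see how these list prices translate into a monthly bill, the sketch below prices one hypothetical workload (30,000 requests per month, 20K input and 2K output tokens each) on each model's lowest input tier and converts it at the document's ¥7.2/USD rate. The workload figures are illustrative assumptions, and batch, caching, and free-tier discounts are not modeled.

```python
# Lowest-tier list prices (CNY per million tokens) from the tables above.
# qwen3.6-max-preview uses its <=128K tier; the others use their 0-256K or 0-1M tier.
PRICES_CNY = {
    "qwen3.7-max": (12.0, 36.0),
    "qwen3.6-max-preview": (9.0, 54.0),
    "qwen3.6-plus": (2.0, 12.0),
    "qwen3.6-flash": (1.2, 7.2),
}
CNY_PER_USD = 7.2  # the document's approximate exchange rate


def monthly_cost_cny(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of `requests` identical requests, ignoring all discounts."""
    in_price, out_price = PRICES_CNY[model]
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests


if __name__ == "__main__":
    # Hypothetical workload: 30,000 requests/month, 20K input + 2K output tokens each
    for model in PRICES_CNY:
        cny = monthly_cost_cny(model, 30_000, 20_000, 2_000)
        print(f"{model:20s} ¥{cny:>8,.0f}  ≈ ${cny / CNY_PER_USD:>7,.0f}")
```

For this input-heavy (10:1) workload, qwen3.6-max-preview comes out slightly cheaper than qwen3.7-max (about ¥8,640 vs. ¥9,360 per month), because its lower input price outweighs its higher output price at that ratio.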
Model Performance / Benchmarks
Qwen3-Coder-480B-A35B-Instruct (open-source, released July 2025) benchmarks from the official blog post:
| Benchmark | Reported Result | Comparison |
|---|---|---|
| SWE-Bench Verified | State-of-the-art among open models | Comparable to Claude Sonnet 4 (as of July 2025) |
| Agentic Coding (multi-benchmark) | State-of-the-art among open models | Comparable to Claude Sonnet 4 |
| Agentic Browser-Use | State-of-the-art among open models | Comparable to Claude Sonnet 4 |
| Agentic Tool-Use | State-of-the-art among open models | Comparable to Claude Sonnet 4 |
No published coding benchmarks for the closed-source qwen3.7-max, qwen3.6-plus, or qwen3.6-flash models as of May 2026. The model release notes describe qwen3.7-max as excelling in "programming, office productivity, and long-horizon autonomous execution" but do not provide specific benchmark scores. Source: Aliyun – Newly Released Models
Latest News
qwen3.7-max launched as new flagship (2026-05-21). Alibaba released qwen3.7-max, the newest model in the Qwen Max series, with a 1M-token context window, 64K max output, and a 256K thinking budget. Thinking mode is enabled by default. Alibaba describes the model as excelling in programming, office productivity, and long-horizon autonomous tasks. At ¥12/MTok input and ¥36/MTok output, its input price is 33% higher than the previous qwen3.6-max-preview (¥9/MTok at ≤128K tokens), but its output price is 33% lower (¥36 vs. ¥54/MTok), and its context window is much larger (1M vs. 256K). For requests above 128K input tokens, where qwen3.6-max-preview charges ¥15/¥90, qwen3.7-max is cheaper on both input and output. It supports the batch API at a 50% discount and context caching. Source: Aliyun – Newly Released Models
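A quick break-even check on that input/output trade-off, using the list prices from the API Pricing tables and assuming a request stays within qwen3.6-max-preview's ≤128K tier with no batch, caching, or free-tier discounts. Let $I$ and $O$ be a request's input and output tokens in millions, and $C$ its cost in CNY:

```latex
\begin{aligned}
C_{\text{3.7-max}} &= 12\,I + 36\,O \\
C_{\text{3.6-max-preview}} &= 9\,I + 54\,O \\
C_{\text{3.7-max}} < C_{\text{3.6-max-preview}}
  &\iff 12I + 36O < 9I + 54O
  \iff 3I < 18O
  \iff I < 6O
\end{aligned}
```

So at list price, qwen3.7-max is cheaper whenever a request has fewer than six input tokens per output token; more input-heavy requests favor qwen3.6-max-preview.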
qwen3.7-max-preview released (2026-05-25). A preview variant of qwen3.7-max with thinking-only mode, same pricing. Snapshot: qwen3.7-max-2026-05-17. Source: Aliyun – Newly Released Models
Token Plan promo: qwen3.7-max at half Credits consumption (until 2026-06-22). During the promotional period, qwen3.7-max on Token Plan consumes Credits at 50% of the normal rate and also supports implicit caching. This doubles the number of qwen3.7-max requests a seat's monthly Credits can cover while the promotion runs. Source: Aliyun – Token Plan Overview
Multiple third-party models added (May 2026). xiaomi/mimo-v2.5-pro (May 19), ZHIPU/GLM-5.1 and GLM-5 (May 19), stepfun/step-3.7-flash (May 29), and vanchin/deepseek-v4-pro (May 29, Kuaishou-hosted). These expand the model marketplace on Bailian but are not Qwen models. Source: Aliyun – Newly Released Models
Qwen Code CLI now available via Alibaba Cloud Model Studio. Qwen Code is a CLI coding agent (forked from Gemini CLI) with a VS Code plugin ("Qwen Code Companion"). It supports three billing modes: Token Plan, Coding Plan, and pay-as-you-go. Compatible with OpenClaw, Hermes Agent, Claude Code, OpenCode, Cursor, Codex, Cline, Qoder, Kilo CLI, and other tools. Source: Aliyun – Qwen Code
Qwen blog migrated to qwen.ai. The old blog at qwenlm.github.io/blog/ now redirects to qwen.ai/research. The last substantive post on the old blog was "Qwen3Guard" (September 23, 2025). No May 2026 posts could be confirmed at the new location: qwen.ai appears to be heavily JavaScript-rendered, and its content was not accessible via standard fetch or browser snapshot.
Community Signals
Qwen3.6-35B-A3B open-source release generated massive interest. A HackerNews post titled "Qwen3.6-35B-A3B: Agentic coding power, now open to all" (April 16, 2026) received 1,274 upvotes, indicating strong developer interest in locally runnable Qwen coding models. The model is a 35B-parameter mixture-of-experts (MoE) model with only about 3B parameters active per token, which keeps per-token compute low enough for consumer hardware; all 35B weights must still fit in memory, so local users typically run quantized builds. Source: Hn – Search
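MoE routing reduces how many weights are used per token, but every expert still has to be resident in memory. The sketch below estimates weight memory for the nominal 35B total and 3B active parameters at common bit widths. It counts weights only and ignores the KV cache, activations, runtime overhead, and the per-block scale factors that 4-bit formats such as NVFP4 store alongside the weights, so real footprints are somewhat higher.

```python
# Rough weight-memory estimate for a ~35B-parameter MoE model.
# Weights only: KV cache, activations, runtime overhead, and quantization
# scale factors are deliberately ignored.
TOTAL_PARAMS = 35e9   # nominal total parameter count from the model name
ACTIVE_PARAMS = 3e9   # nominal active parameters per token (the "A3B" suffix)


def weight_gb(params: float, bits_per_param: float) -> float:
    """Decimal gigabytes needed to hold `params` weights at the given bit width."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:6s} all weights ≈ {weight_gb(TOTAL_PARAMS, bits):5.1f} GB, "
              f"active per token ≈ {weight_gb(ACTIVE_PARAMS, bits):4.1f} GB")
```

At 4-bit the full weights come to roughly 17.5 GB versus about 70 GB at BF16, which helps explain why quantized builds draw so much attention from people running the model locally.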
Active local coding agent experiments with Qwen 3.6. On Reddit r/LocalLLaMA, a post "Running Qwen 3.6 35b MoE With Zoo Code On M1 Max is Amazing!" (8 upvotes, 14 comments) described successful local coding agent usage on Apple Silicon. Another thread "Qwen 3.6 coding choice: 27B vs 35B quants" (6 upvotes, 84 comments) showed active community debate about optimal quantization for coding tasks. Source: Reddit – Search
NVIDIA quantized release of Qwen3.6-35B drew significant attention. "nvidia/Qwen3.6-35B-A3B-NVFP4" on Hugging Face was shared on r/LocalLLaMA and received 198 upvotes and 36 comments, indicating strong demand for optimized inference of Qwen coding models. Source: Reddit – 1Ts6J6J
Local benchmark comparisons emerging. A HackerNews post "Local Harness Benchmark: Pi Coding Agent vs. OpenCode with Qwen3.6 35B A3B" (May 4, 2026) compared two coding agent frameworks using the open-source Qwen3.6 model, suggesting growing ecosystem maturity around local Qwen-based coding agents. Source: Hn – Search By Date
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML/OIDC) | Partial | Alibaba Cloud RAM (Resource Access Management) supports SAML SSO for console access. No native OIDC integration documented for API access. |
| SCIM | No | No SCIM-based user provisioning documented. Teams must manually add members via the Token Plan management console. |
| Audit logs | Yes | Billing and usage logs available through Alibaba Cloud Billing Center. Token Plan provides per-member usage analytics. |
| IP indemnity | No | No IP indemnity commitment documented for Qwen models or Bailian platform. |
| Data residency | Yes | China mainland (Beijing), Singapore, US (Virginia), and EU (Frankfurt) regions available for pay-as-you-go. Token Plan and Coding Plan currently limited to Beijing (China mainland). Source: Aliyun – Regions |
| HIPAA | No | No HIPAA compliance documented. |
| Air-gapped/on-prem | Partial | Model Studio supports importing and deploying custom models on dedicated instances, but no fully air-gapped deployment for Qwen hosted models is documented. |
| SLA | Undisclosed | No specific SLA for model availability documented in the public pricing or service pages. |
| Admin controls (RBAC) | Yes | Alibaba Cloud RAM supports role-based access control. Token Plan provides workspace-level permission management with admin and member roles. |
Terms explained:
- Data residency — Alibaba Cloud offers multiple deployment regions. However, the subscription plans (Coding Plan, Token Plan) are currently limited to the Beijing region, which may be a constraint for teams requiring data processing outside China. Source: Aliyun – Regions
- IP indemnity — No intellectual property indemnification is offered for Qwen model outputs, unlike Microsoft's Copilot Copyright Commitment or Anthropic's similar programs.
Transparency Gaps
- Rate limits unverified. Pay-as-you-go API rate limits (RPM/TPM) are not published on the pricing page. A separate rate-limit page exists but could not be fully accessed, so the actual limits could not be confirmed. Without confirmed limits, teams cannot plan capacity.
- Coding Plan per-request consumption vague. The Coding Plan documentation states "simple tasks consume about 5-10 requests, complex tasks about 10-30+ requests" without defining what constitutes a "simple" or "complex" task. No per-model breakdown is provided.
- Token Plan Credits formula not fully disclosed. While an example is given (qwen3.6-plus consuming ~3.18 Credits for a specific request), the actual per-model Credit rates and the formula for thinking mode, tool calls, and cached tokens are not published. Users must rely on the billing dashboard for actual consumption.
- Closed-model benchmarks not published. Unlike the open-source Qwen3-Coder model, Alibaba has not published benchmark scores for the closed-source qwen3.7-max, qwen3.6-plus, or qwen3.6-flash models. The release notes describe capabilities qualitatively ("excels in programming, office productivity, and long-horizon autonomous execution") without specific scores.
- Coding Plan data-use terms. The Coding Plan explicitly grants Alibaba a license to use model inputs and outputs for training. While disclosed, the scope of this usage (which models it trains, whether it extends beyond Qwen, retention period) is not specified beyond a reference to the Alibaba Cloud Bailian Service Agreement Section 5.2.
- qwen.ai blog content inaccessible. The new Qwen blog at qwen.ai is heavily JavaScript-rendered, so its posts could not be extracted with standard web-fetching tools. Because the English-language blog was a primary channel for Qwen announcements, this reduces transparency for non-Chinese-speaking audiences tracking Qwen developments.
- SLA not documented. No service-level agreement for model availability, latency, or error rates was found in the public documentation.
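Because the per-task request figures are only ranges, any Coding Plan capacity estimate is a range too. The sketch below divides the Coding Plan Pro quotas from the Plans table by the quoted 5-10 (simple) and 10-30+ (complex) requests per task. The open-ended "30+" is treated as 30, so the low end of the complex-task range is optimistic; the task categories themselves are undefined in the source.

```python
# Coding Plan Pro quotas and price from the Plans table.
QUOTAS = {"5-hour window": 6_000, "week": 45_000, "month": 90_000}
MONTHLY_PRICE_CNY = 200

# Requests-per-task ranges quoted in the Coding Plan docs. "30+" is open-ended,
# so using 30 overstates how many of the heaviest tasks actually fit.
REQUESTS_PER_TASK = {"simple": (5, 10), "complex": (10, 30)}


def tasks_per_window(quota: int, requests_per_task: int) -> int:
    """Whole tasks that fit in a quota at a fixed requests-per-task cost."""
    return quota // requests_per_task


if __name__ == "__main__":
    for window, quota in QUOTAS.items():
        for kind, (low, high) in REQUESTS_PER_TASK.items():
            print(f"{window:14s} {kind:8s} "
                  f"{tasks_per_window(quota, high):>6,}-{tasks_per_window(quota, low):>6,} tasks")
    # Lowest possible price per request, reached only if every monthly request is used
    print(f"¥{MONTHLY_PRICE_CNY / QUOTAS['month']:.5f} per request at full utilization")
```

Note that two full weeks at the 45,000 weekly cap already exhaust the 90,000 monthly quota, so heavy users hit the monthly limit well before the month ends.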