Key Terms
- Model Studio (Bailian) - Alibaba Cloud's platform for accessing Qwen and third-party LLMs via API. All Qwen models are accessed through this platform. Source: Aliyun – Models
- Tiered pricing - Qwen models are priced by input token count: requests with larger inputs are billed at higher per-million-token rates. For example, Qwen3.6-Plus input costs 2 RMB/MTok for inputs under 256K tokens, but 8 RMB/MTok for inputs between 256K and 1M tokens. Source: Aliyun – Models
- Qwen Code - A CLI tool for agentic coding, adapted from Gemini CLI with customized prompts for Qwen Coder models. Installable via `npm i -g @qwen-code/qwen-code`. Also works with Claude Code via proxy. Source: Alibaba – Qwen3 Coder
- Context caching - Qwen supports implicit (automatic, billed at 20% of the input price) and explicit (user-managed, billed at 10% of the input price) context caching. Source: Aliyun – Context Cache
- Batch API - Asynchronous processing at 50% discount. Not subject to real-time rate limits. Source: Aliyun – Batch Interfaces Compatible With Openai
- Thinking mode - Extended reasoning capability available on Qwen3.6-Max-Preview, Qwen3.6-Plus, and Qwen3-Coder models, enabled by default. Output includes both the thinking chain and the final response, and both are billed at the output rate. Source: Aliyun – Deep Thinking
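Because the thinking chain is billed as output, reasoning-heavy responses can dominate a request's cost. A back-of-envelope sketch using Qwen3.6-Max-Preview's 54 RMB/MTok output rate from the pricing tables below; the split between thinking and response tokens is illustrative, not from the source:

```python
OUTPUT_RATE = 54  # Qwen3.6-Max-Preview output price, RMB per million tokens

def output_cost_rmb(thinking_tokens, response_tokens, rate=OUTPUT_RATE):
    """Thinking chain and final response are both billed at the output rate."""
    return (thinking_tokens + response_tokens) * rate / 1_000_000

# A hypothetical response: 8K thinking tokens plus 1K of visible answer.
# The user pays for all 9K tokens at the output rate.
cost = output_cost_rmb(8_000, 1_000)
```

Here roughly 89% of the output bill goes to tokens the end user may never read, which matters when choosing whether to leave thinking mode at its default.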
Latest Changes
First report for this supplier. All models, plans, and pricing are listed as current state.
- New model: Qwen3.6-Plus released April 2. 1M context, thinking mode. 2/12 RMB per MTok for inputs under 256K.
- New model: Qwen3.6-Flash released April 16. 1M context. Flat pricing at 1.2/7.2 RMB per MTok.
- New model: Qwen3.6-Max-Preview available. 262K context, thinking mode. 9/54 RMB per MTok. Preview with 600 RPM limit.
- Feature added: Qwen3-Coder Claude Code integration available via proxy API endpoint.
Plans
Alibaba Cloud Model Studio is API-only with no consumer subscription plans. Usage is billed per token with no monthly minimum. All new accounts receive free quotas.
Free Quota (New Accounts)
Each model provides 1 million tokens (input + output combined) free for the first 90 days after signup. The free quota applies only in the China mainland region and does not cover the Batch API, context caching, or model fine-tuning.
Source: Aliyun – New Free Quota
Billing Regions
Pricing varies by deployment region:
- China mainland (Beijing): cheapest, prices in RMB
- Global (Virginia, USA): mid-tier, global dynamic scheduling
- International (Singapore): excludes China mainland, higher prices
- United States (Virginia): data residency in US only, limited model selection
Source: Aliyun – Regions
Rate Limits (China Mainland, Stable Versions)
| Model | RPM | TPM |
|---|---|---|
| Qwen3.6-Max-Preview | 600 | 1,000,000 |
| Qwen3.6-Plus | 30,000 | 5,000,000 |
| Qwen3.6-Flash | 30,000 | 10,000,000 |
| Qwen3-Coder-Plus | 5,000 | 5,000,000 |
| Qwen3-Coder-Flash | 5,000 | 5,000,000 |
Snapshot versions have significantly lower limits (60 RPM, 100K TPM). Users can request temporary TPM increases (30-day validity) via the console.
Source: Aliyun – Rate Limit
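To stay under the published RPM ceilings (for example, 600 RPM on Qwen3.6-Max-Preview versus 30,000 on Plus), a client can space request starts with a simple limiter. A minimal sketch, client-side only; the limits come from the table above, everything else is illustrative:

```python
import time

class RpmLimiter:
    """Spaces calls so that at most `rpm` requests start per minute."""
    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # seconds between consecutive request starts
        self.clock = clock          # injectable for testing
        self.sleep = sleep
        self._next_ok = 0.0

    def wait(self):
        """Block until the next request is allowed to start."""
        now = self.clock()
        if now < self._next_ok:
            self.sleep(self._next_ok - now)
            now = self._next_ok
        self._next_ok = now + self.interval

# 600 RPM (Qwen3.6-Max-Preview) -> one request start every 0.1 s
limiter = RpmLimiter(rpm=600)
```

Call `limiter.wait()` before each API request. Note this throttles RPM only; TPM limits would need separate accounting of token usage per minute.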
API Pricing
All prices below are for the China mainland region, in RMB per million tokens. Approximate USD figures assume 1 USD ≈ 7.2 RMB.
General-Purpose Models
Qwen3.6-Max-Preview (262K context, thinking mode)
| Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output) |
|---|---|---|---|
| 0 - 128K | 9 | 54 | $1.25 / $7.50 |
| 128K - 256K | 15 | 90 | $2.08 / $12.50 |
Qwen3.6-Plus (1M context, thinking mode, released April 2, 2026)
| Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output) |
|---|---|---|---|
| 0 - 256K | 2 | 12 | $0.28 / $1.67 |
| 256K - 1M | 8 | 48 | $1.11 / $6.67 |
Qwen3.6-Flash (1M context, released April 16, 2026)
| Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output) |
|---|---|---|---|
| All tiers | 1.2 | 7.2 | $0.17 / $1.00 |
Code-Specific Models
Qwen3-Coder-Plus (1M context, 480B MoE with 35B active parameters)
| Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output) |
|---|---|---|---|
| 0 - 32K | 4 | 16 | $0.56 / $2.22 |
| 32K - 128K | 6 | 24 | $0.83 / $3.33 |
| 128K - 256K | 10 | 40 | $1.39 / $5.56 |
| 256K - 1M | 20 | 200 | $2.78 / $27.78 |
Qwen3-Coder-Flash (1M context)
| Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output) |
|---|---|---|---|
| 0 - 32K | 1 | 4 | $0.14 / $0.56 |
| 32K - 128K | 1.5 | 6 | $0.21 / $0.83 |
| 128K - 256K | 2.5 | 10 | $0.35 / $1.39 |
| 256K - 1M | 5 | 20 | $0.69 / $2.78 |
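Because the billing tier is selected by each request's input size, the tables above translate into a simple per-request cost function. A sketch using the Qwen3-Coder-Plus tiers; it assumes the entire request is billed at the rate of the tier its input count falls into, which is how the pricing page describes tiered pricing:

```python
# Qwen3-Coder-Plus tiers from the table above:
# (tier ceiling in input tokens, input RMB/MTok, output RMB/MTok)
CODER_PLUS_TIERS = [
    (32_000, 4, 16),
    (128_000, 6, 24),
    (256_000, 10, 40),
    (1_000_000, 20, 200),
]

def request_cost_rmb(input_tokens, output_tokens, tiers=CODER_PLUS_TIERS):
    """Cost in RMB for one request, billed at the rate of the tier
    that the request's total input token count falls into."""
    for ceiling, in_price, out_price in tiers:
        if input_tokens <= ceiling:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the 1M-token context window")

# A small request vs. a long-context one, same output size:
small = request_cost_rmb(30_000, 2_000)    # billed at 4 in / 16 out
large = request_cost_rmb(300_000, 2_000)   # billed at 20 in / 200 out
```

This makes the agentic-coding cost cliff concrete: the 300K-input request costs roughly 42x the 30K one despite only 10x the input, because both the input and output rates jump at the tier boundary.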
Discounts
- Batch API: 50% discount on all models (async, results within hours)
- Context caching (implicit): automatic; cache-hit input tokens billed at 20% of the input price
- Context caching (explicit): user-managed; cache-hit input tokens billed at 10% of the input price
Source: Aliyun – Models
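The effective input rate under these discounts can be sketched as follows, using the Qwen3.6-Plus sub-256K input rate of 2 RMB/MTok. Whether the batch discount stacks with caching is not stated in the source, so each is computed separately; the cache-hit ratio is an assumption for illustration:

```python
BASE_INPUT = 2.0  # Qwen3.6-Plus input, inputs under 256K (RMB/MTok)

batch_rate = BASE_INPUT * 0.5        # Batch API: 50% discount
implicit_cached = BASE_INPUT * 0.2   # implicit cache hit: 20% of input price
explicit_cached = BASE_INPUT * 0.1   # explicit cache hit: 10% of input price

def blended_input_rate(hit_ratio, cached_rate=implicit_cached):
    """Average RMB/MTok when `hit_ratio` of input tokens hit the cache."""
    return hit_ratio * cached_rate + (1 - hit_ratio) * BASE_INPUT
```

For a chat-style workload where, say, 80% of input tokens are repeated prefix, implicit caching alone brings the blended input rate well below the batch rate, without giving up real-time responses.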
Model Performance / Benchmarks
No official benchmark results found for Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash. The Qwen blog has migrated to qwen.ai with low discoverability; the old blog (qwenlm.github.io) has not been updated since July 2025.
Qwen3-Coder (480B MoE, 35B active) was received positively at launch:
- HN: 765 points, 366 comments. Described as "comparable to Claude Sonnet 4" for coding and agentic tasks.
- Used by: Cursor Composer 1, Cerebras Code, Qwen Code CLI, Claude Code (via proxy), Cline.
Source: News – From
Latest News
Qwen3.6-Plus Released (April 2, 2026)
Qwen3.6-Plus (qwen3.6-plus-2026-04-02) released as the balanced tier model. Features 1M context window, thinking and non-thinking modes, supports text/image/video input. Default thinking mode enabled. Priced at 2/12 RMB per MTok for inputs under 256K tokens, 8/48 RMB for 256K-1M tokens. Replaces Qwen3.5-Plus at the same price point.
Source: Aliyun – Models
Qwen3.6-Flash Released (April 16, 2026)
Qwen3.6-Flash (qwen3.6-flash-2026-04-16) released as the fast, low-cost tier. Features 1M context window. Flat pricing at 1.2/7.2 RMB per MTok regardless of input size.
Source: Aliyun – Models
Qwen3.6-Max-Preview Available (April 2026)
Qwen3.6-Max-Preview introduced as the flagship model with 262K context and thinking mode. Priced at 9/54 RMB per MTok. Marked as "preview" with 600 RPM limit, compared to 30,000 RPM for Plus stable. No official blog post found on the new qwen.ai blog.
Source: Aliyun – Models
Qwen3-Coder Claude Code Integration (July 2025, ongoing)
Qwen3-Coder can be used with Claude Code via a proxy API endpoint (dashscope-intl.aliyuncs.com/api/v2/apps/claude-code-proxy). This allows developers to use Qwen3-Coder as the backend model for Claude Code's agentic coding features.
Source: Alibaba – Qwen3 Coder
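Wiring Claude Code to this proxy might look like the following shell setup. This is a sketch: `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` are standard Claude Code overrides, but the key variable name is illustrative and the exact configuration should be checked against the source above.

```shell
# Point Claude Code at the Qwen proxy endpoint (from the source above).
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/api/v2/apps/claude-code-proxy"
# Authenticate with a Model Studio API key ($DASHSCOPE_API_KEY is illustrative).
export ANTHROPIC_AUTH_TOKEN="$DASHSCOPE_API_KEY"
# Start Claude Code; requests are now served by Qwen3-Coder via the proxy.
claude
```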
Community Signals
Qwen3-Coder Launch Reception (HN, July 2025)
HN: 765 points, 366 comments. Strong positive reception for the 480B MoE model's coding and agentic capabilities, described as "comparable to Claude Sonnet 4." The model was noted for its tool-calling and browser-use capabilities.
Source: News – From
Qwen3 Launch Reception (HN, April 2025)
HN: 869 points, 388 comments. The original Qwen3 announcement drew significant attention for its open-source MoE architecture and competitive benchmark results.
Source: News – From
Qwen3-Coder Used by Major Coding Agents
- Qwen Code CLI (official, forked from Gemini CLI)
- Claude Code (via proxy integration)
- Cline (via OpenAI-compatible API)
- Cursor Composer 1 was reportedly based on a Qwen model
- Cerebras Code uses Qwen3 Coder 480B as an inference provider
Low Visibility for Qwen3.6
No significant HN, Reddit, or social media discussion found specifically about Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash launches in April 2026. The Qwen blog has migrated to qwen.ai, which has low discoverability. The old blog (qwenlm.github.io) has not been updated since July 2025.
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Alibaba Cloud Model Studio is API-only with no SSO integration mentioned. Enterprise accounts managed through Alibaba Cloud IAM. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned for Model Studio. Available in broader Alibaba Cloud infrastructure. |
| IP indemnity | No | Not mentioned. |
| Data residency | Yes | Multiple billing regions: China mainland (Beijing), Global (Virginia), International (Singapore), United States (Virginia). Pricing varies by region. Source: Aliyun – Regions |
| HIPAA | No | Not mentioned for Model Studio. |
| Air-gapped / on-prem | No | Not available for Model Studio. |
| SLA | No | No published SLA for Model Studio. |
| Admin controls (RBAC) | Partial | Alibaba Cloud account-level access controls apply. No model-studio-specific RBAC documented. |
Transparency Gaps
| Gap | Details | Severity |
|---|---|---|
| Pricing in RMB only | International pricing is listed in RMB without USD conversion. Buyers must manually convert, and exchange rates fluctuate. | Medium |
| Tiered pricing complexity | Per-request cost varies with input token count, making cost estimation difficult for agentic use cases where context grows over time. On Qwen3-Coder-Plus, a 256K+ input request is billed at 5x the input rate (20 vs 4 RMB/MTok) and 12.5x the output rate (200 vs 16 RMB/MTok) of a sub-32K request. | High |
| Qwen3.6 benchmark claims absent | No official blog post or benchmark results found for Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash. The model's capabilities relative to competitors are unknown beyond the pricing page's "suitable for complex tasks" description. | High |
| Coder model at 256K-1M context | Qwen3-Coder-Plus output pricing jumps from 16 RMB/MTok (0-32K) to 200 RMB/MTok (256K-1M), a 12.5x increase. This is not prominently disclosed and could surprise users with large codebases. | High |
| No consumer product | Qwen Code is a CLI tool only. There is no IDE extension, web interface, or subscription plan. Developers must manage API keys, billing, and infrastructure themselves. | Medium |
| Snapshot version rate limits | Snapshot versions have drastically lower rate limits (60 RPM vs 30,000 RPM). Users who pin to a specific version for reproducibility are penalized. | Low |
| Blog migration | The blog moved from qwenlm.github.io to qwen.ai, but the new site has poor web accessibility and no apparent April 2026 content. News about Qwen3.6 is effectively invisible. | Medium |