Alibaba

Executive Summary

What it is: Alibaba's Qwen is an API-only model platform (no IDE or consumer product) accessed through Alibaba Cloud Model Studio (Bailian). It offers code-specific models (Qwen3-Coder-Plus, Qwen3-Coder-Flash) and general-purpose models (Qwen3.6-Max-Preview, Qwen3.6-Plus, Qwen3.6-Flash) with tiered pricing based on input token count. Most models support context windows up to 1M tokens (Qwen3.6-Max-Preview tops out at 262K), and new accounts receive 1 million free tokens for the first 90 days.

What to watch out for: Pricing is in RMB with no USD conversion on the platform. Qwen3-Coder-Plus output pricing jumps 12.5x (from 16 to 200 RMB/MTok) when inputs exceed 256K tokens, which is not prominently disclosed and can be costly for large codebases. No Qwen3.6 benchmark results have been published, and the Qwen blog has migrated to a new URL (qwen.ai) with low discoverability. The old blog (qwenlm.github.io) has not been updated since July 2025.

Bottom line: Qwen3-Coder-Plus at approximately $0.56/$2.22 per MTok (for inputs under 32K tokens) is one of the cheapest code-specific models available, but the tiered pricing severely penalizes large-context agentic workflows. The lack of a first-party IDE product means you must pair Qwen with a third-party agent (Claude Code via proxy, Qwen Code CLI, or Cline). Best suited for teams already using Alibaba Cloud infrastructure who want low-cost coding inference.

Key Terms

  • Model Studio (Bailian) - Alibaba Cloud's platform for accessing Qwen and third-party LLMs via API. All Qwen models are accessed through this platform. Source: Aliyun – Models
  • Tiered pricing - Qwen models use input-token-count-based pricing. Requests with more input tokens cost more per million tokens. For example, Qwen3.6-Plus costs 2 RMB/MTok for inputs under 256K tokens, but 8 RMB/MTok for inputs between 256K and 1M tokens. Source: Aliyun – Models
  • Qwen Code - A CLI tool for agentic coding, adapted from Gemini CLI with prompts customized for Qwen Coder models. Installable via npm i -g @qwen-code/qwen-code. The Coder models also work with Claude Code via a proxy endpoint. Source: Alibaba – Qwen3 Coder
  • Context caching - Qwen supports implicit (automatic, 20% of input price) and explicit (user-managed, 10% of input price) context caching. Source: Aliyun – Context Cache
  • Batch API - Asynchronous processing at 50% discount. Not subject to real-time rate limits. Source: Aliyun – Batch Interfaces Compatible With Openai
  • Thinking mode - Extended reasoning capability available on Qwen3.6-Max-Preview, Qwen3.6-Plus, and Qwen3-Coder models. Default is enabled. Output includes both thinking chain and final response, both billed at the output rate. Source: Aliyun – Deep Thinking
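
The tiered pricing and thinking-mode billing above can be combined into a small cost estimator. This is an illustrative sketch, not part of any official SDK: the rates are Qwen3.6-Plus's China-mainland prices from this report, and the function name is hypothetical.

```python
def qwen36_plus_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in RMB for one Qwen3.6-Plus request.

    The tier is chosen by *input* token count; output tokens (including any
    thinking chain, which is billed at the output rate) are charged at that
    tier's output price.
    """
    if input_tokens <= 256_000:
        input_rate, output_rate = 2.0, 12.0   # RMB per million tokens
    else:
        input_rate, output_rate = 8.0, 48.0   # 256K - 1M tier
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 200K-token prompt with 4K of output (thinking + answer) stays in the cheap tier:
print(round(qwen36_plus_cost_rmb(200_000, 4_000), 4))   # 0.448
# Crossing 256K of input quadruples both rates:
print(round(qwen36_plus_cost_rmb(300_000, 4_000), 4))   # 2.592
```

Note that because thinking tokens bill at the output rate, enabling thinking mode raises effective cost even when the final answer is short.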

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • New model: Qwen3.6-Plus released April 2, 2026. 1M context, thinking mode. 2/12 RMB per MTok for inputs under 256K.
  • New model: Qwen3.6-Flash released April 16, 2026. 1M context. Flat pricing at 1.2/7.2 RMB per MTok.
  • New model: Qwen3.6-Max-Preview available. 262K context, thinking mode. 9/54 RMB per MTok. Preview with 600 RPM limit.
  • Feature added: Qwen3-Coder Claude Code integration available via proxy API endpoint.

Plans

Alibaba Cloud Model Studio is API-only with no consumer subscription plans. Usage is billed per token with no monthly minimum. All new accounts receive free quotas.

Free Quota (New Accounts)

Each model provides 1 million tokens (input + output combined) free for the first 90 days after signup. Free quota only applies to the China mainland region. Does not apply to Batch API, context caching, or model fine-tuning.

Source: Aliyun – New Free Quota

Billing Regions

Pricing varies by deployment region:

  • China mainland (Beijing): cheapest, prices in RMB
  • Global (Virginia, USA): mid-tier, global dynamic scheduling
  • International (Singapore): excludes China mainland, higher prices
  • United States (Virginia): data residency in US only, limited model selection

Source: Aliyun – Regions

Rate Limits (China Mainland, Stable Versions)

Model               | RPM    | TPM
Qwen3.6-Max-Preview | 600    | 1,000,000
Qwen3.6-Plus        | 30,000 | 5,000,000
Qwen3.6-Flash       | 30,000 | 10,000,000
Qwen3-Coder-Plus    | 5,000  | 5,000,000
Qwen3-Coder-Flash   | 5,000  | 5,000,000

Snapshot versions have significantly lower limits (60 RPM, 100K TPM). Users can request temporary TPM increases (30-day validity) via the console.

Source: Aliyun – Rate Limit
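
Clients running near these caps typically need a client-side throttle. A minimal sliding-window limiter might look like the following; this is an illustration, not an official SDK feature.

```python
import time
from collections import deque

class RpmThrottle:
    """Block callers so that at most `rpm` requests start in any 60s window."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls: deque = deque()   # monotonic timestamps of recent requests

    def acquire(self) -> None:
        """Block until a request may be sent without exceeding the RPM cap."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest call in the window expires, then retry.
            time.sleep(60.0 - (now - self.calls[0]))
            return self.acquire()
        self.calls.append(now)

throttle = RpmThrottle(rpm=600)  # Qwen3.6-Max-Preview preview limit
# Call throttle.acquire() before each API request; a pinned snapshot
# version would use rpm=60 per the limits above.
```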

API Pricing

All prices below are for the China mainland region, quoted in RMB per million tokens. The approximate USD figures assume 1 USD ≈ 7.2 RMB.

General-Purpose Models

Qwen3.6-Max-Preview (262K context, thinking mode)

Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output)
0 - 128K          | 9                | 54                | $1.25 / $7.50
128K - 256K       | 15               | 90                | $2.08 / $12.50

Qwen3.6-Plus (1M context, thinking mode, released April 2, 2026)

Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output)
0 - 256K          | 2                | 12                | $0.28 / $1.67
256K - 1M         | 8                | 48                | $1.11 / $6.67

Qwen3.6-Flash (1M context, released April 16, 2026)

Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output)
All tiers         | 1.2              | 7.2               | $0.17 / $1.00

Code-Specific Models

Qwen3-Coder-Plus (1M context, 480B MoE with 35B active parameters)

Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output)
0 - 32K           | 4                | 16                | $0.56 / $2.22
32K - 128K        | 6                | 24                | $0.83 / $3.33
128K - 256K       | 10               | 40                | $1.39 / $5.56
256K - 1M         | 20               | 200               | $2.78 / $27.78

Qwen3-Coder-Flash (1M context)

Input Token Range | Input (RMB/MTok) | Output (RMB/MTok) | Approx. USD (Input/Output)
0 - 32K           | 1                | 4                 | $0.14 / $0.56
32K - 128K        | 1.5              | 6                 | $0.21 / $0.83
128K - 256K       | 2.5              | 10                | $0.35 / $1.39
256K - 1M         | 5                | 20                | $0.69 / $2.78

Discounts

  • Batch API: 50% discount on all models (async, results within hours)
  • Context caching (implicit): 20% of input price (automatic)
  • Context caching (explicit): 10% of input price (user-managed)

Source: Aliyun – Models
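
Putting the Qwen3-Coder-Plus tiers and discounts together gives a rough per-request cost sketch. The rates come from the tables above; the helper is illustrative (not an Alibaba Cloud API), and whether the batch discount stacks with cache pricing is an assumption to verify.

```python
CODER_PLUS_TIERS = [          # (max input tokens, input RMB/MTok, output RMB/MTok)
    (32_000, 4, 16),
    (128_000, 6, 24),
    (256_000, 10, 40),
    (1_000_000, 20, 200),
]

def coder_plus_cost_rmb(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Cost in RMB. Cached input tokens bill at 20% of the tier's input rate
    (implicit caching); batch requests assumed to take a further 50% off."""
    for max_in, in_rate, out_rate in CODER_PLUS_TIERS:
        if input_tokens <= max_in:
            break   # inputs above 1M fall through to the top tier
    uncached = input_tokens - cached_tokens
    cost = (uncached * in_rate + cached_tokens * in_rate * 0.2
            + output_tokens * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost

# The 256K boundary is where large-repo agents get hurt: output goes 40 -> 200.
print(coder_plus_cost_rmb(250_000, 10_000))  # 2.9 RMB
print(coder_plus_cost_rmb(260_000, 10_000))  # 7.2 RMB
```

A 10K-token difference in input size more than doubles the bill here, which is why the tier boundaries matter more than the headline rates.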

Model Performance / Benchmarks

No official benchmark results found for Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash. The Qwen blog has migrated to qwen.ai with low discoverability; the old blog (qwenlm.github.io) has not been updated since July 2025.

Qwen3-Coder (480B MoE, 35B active) was received positively at launch:

  • HN: 765 points, 366 comments. Described as "comparable to Claude Sonnet 4" for coding and agentic tasks.
  • Used by: Cursor Composer 1, Cerebras Code, Qwen Code CLI, Claude Code (via proxy), Cline.

Source: News – From

Latest News

Qwen3.6-Plus Released (April 2, 2026)

Qwen3.6-Plus (qwen3.6-plus-2026-04-02) released as the balanced tier model. Features 1M context window, thinking and non-thinking modes, supports text/image/video input. Default thinking mode enabled. Priced at 2/12 RMB per MTok for inputs under 256K tokens, 8/48 RMB for 256K-1M tokens. Replaces Qwen3.5-Plus at the same price point.

Source: Aliyun – Models

Qwen3.6-Flash Released (April 16, 2026)

Qwen3.6-Flash (qwen3.6-flash-2026-04-16) released as the fast, low-cost tier. Features 1M context window. Flat pricing at 1.2/7.2 RMB per MTok regardless of input size.

Source: Aliyun – Models

Qwen3.6-Max-Preview Available (April 2026)

Qwen3.6-Max-Preview introduced as the flagship model with 262K context and thinking mode. Priced at 9/54 RMB per MTok. Marked as "preview" with 600 RPM limit, compared to 30,000 RPM for Plus stable. No official blog post found on the new qwen.ai blog.

Source: Aliyun – Models

Qwen3-Coder Claude Code Integration (July 2025, ongoing)

Qwen3-Coder can be used with Claude Code via a proxy API endpoint (dashscope-intl.aliyuncs.com/api/v2/apps/claude-code-proxy). This allows developers to use Qwen3-Coder as the backend model for Claude Code's agentic coding features.

Source: Alibaba – Qwen3 Coder
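
Assuming Claude Code's standard endpoint-override environment variables apply here (verify the exact variable names and key format against Alibaba's integration guide), the setup might look like:

```shell
# Hypothetical setup: point Claude Code at the Qwen proxy endpoint named above.
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/api/v2/apps/claude-code-proxy"
export ANTHROPIC_AUTH_TOKEN="$DASHSCOPE_API_KEY"   # your Model Studio API key
claude   # launch Claude Code as usual; requests now hit Qwen3-Coder
```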

Community Signals

Qwen3-Coder Launch Reception (HN, July 2025)

HN: 765 points, 366 comments. Strong positive reception for the 480B MoE model's coding and agentic capabilities, described as "comparable to Claude Sonnet 4." The model was noted for its tool-calling and browser-use capabilities.

Source: News – From

Qwen3 Launch Reception (HN, April 2025)

HN: 869 points, 388 comments. The original Qwen3 announcement drew significant attention for its open-source MoE architecture and competitive benchmark results.

Source: News – From

Qwen3-Coder Used by Major Coding Agents

  • Qwen Code CLI (official, forked from Gemini CLI)
  • Claude Code (via proxy integration)
  • Cline (via OpenAI-compatible API)
  • Cursor Composer 1 was reportedly based on a Qwen model
  • Cerebras Code uses Qwen3 Coder 480B as an inference provider
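
Agents like Cline talk to Qwen through an OpenAI-compatible chat-completions payload. A minimal sketch follows; the compatible-mode base URL and the model name are assumptions to confirm in the Model Studio console.

```python
def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat.completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("qwen3-coder-plus", "Write a binary search in Go.")

# With the openai SDK (uncomment to actually send; requires an API key):
# import os
# from openai import OpenAI
# client = OpenAI(
#     api_key=os.environ["DASHSCOPE_API_KEY"],
#     base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
# )
# resp = client.chat.completions.create(**payload)
```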

Low Visibility for Qwen3.6

No significant HN, Reddit, or social media discussion found specifically about Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash launches in April 2026. The Qwen blog has migrated to qwen.ai, which has low discoverability. The old blog (qwenlm.github.io) has not been updated since July 2025.

Enterprise Readiness

Feature               | Available? | Details
SSO (SAML)            | No         | Alibaba Cloud Model Studio is API-only with no SSO integration mentioned. Enterprise accounts managed through Alibaba Cloud IAM.
SSO (OIDC)            | No         | Not mentioned.
SCIM                  | No         | Not mentioned.
Audit logs            | No         | Not mentioned for Model Studio. Available in broader Alibaba Cloud infrastructure.
IP indemnity          | No         | Not mentioned.
Data residency        | Yes        | Multiple billing regions: China mainland (Beijing), Global (Virginia), International (Singapore), United States (Virginia). Pricing varies by region. Source: Aliyun – Regions
HIPAA                 | No         | Not mentioned for Model Studio.
Air-gapped / on-prem  | No         | Not available for Model Studio.
SLA                   | No         | No published SLA for Model Studio.
Admin controls (RBAC) | Partial    | Alibaba Cloud account-level access controls apply. No model-studio-specific RBAC documented.

Transparency Gaps

Gap | Details | Severity
Pricing in RMB only | International pricing is listed in RMB without USD conversion. Buyers must manually convert, and exchange rates fluctuate. | Medium
Tiered pricing complexity | Per-request cost varies with input token count, making cost estimation difficult for agentic use cases where context grows over time. For Qwen3-Coder-Plus, a 256K+ input request costs 5x more per input token and 12.5x more per output token than a sub-32K request. | High
Qwen3.6 benchmark claims absent | No official blog post or benchmark results found for Qwen3.6-Max-Preview, Qwen3.6-Plus, or Qwen3.6-Flash. The models' capabilities relative to competitors are unknown beyond the pricing page's "suitable for complex tasks" description. | High
Coder model at 256K-1M context | Qwen3-Coder-Plus output pricing jumps from 16 RMB/MTok (0-32K) to 200 RMB/MTok (256K-1M), a 12.5x increase. This is not prominently disclosed and could surprise users with large codebases. | High
No consumer product | Qwen Code is a CLI tool only. There is no IDE extension, web interface, or subscription plan. Developers must manage API keys, billing, and infrastructure themselves. | Medium
Snapshot version rate limits | Snapshot versions have drastically lower rate limits (60 RPM vs 30,000 RPM). Users who pin to a specific version for reproducibility are penalized. | Low
Blog migration | The blog moved from qwenlm.github.io to qwen.ai, but the new site has poor discoverability and no apparent April 2026 content. News about Qwen3.6 is effectively invisible. | Medium