Alibaba

Executive Summary

What it is: Alibaba offers LLMs for coding through its Model Studio (Bailian) platform, accessible via API, a CLI tool (Qwen Code), and a VS Code plugin. There are three billing options: pay-as-you-go token billing, a flat-rate Coding Plan at ¥200/month, and a team-oriented Token Plan starting at ¥198/seat/month. Models range from the flagship qwen3.7-max at ¥12/¥36 per MTok (input/output) to the budget qwen3.6-flash at ¥1.2/¥7.2 per MTok (all CNY, China mainland pricing).

What to watch out for: Coding Plan terms explicitly prohibit API-based automated usage and reserve the right to suspend accounts that violate this restriction. Coding Plan also grants Alibaba a license to use your inputs and outputs for model training, which may conflict with enterprise data policies. Token Plan does not have this data-use clause and explicitly promises no model training on conversations. Most documentation and pricing are in Chinese, making it harder for non-Chinese-speaking teams to evaluate. The Coding Plan Lite tier was discontinued in March 2026 with no replacement at the lower price point.

Bottom line: Alibaba's Qwen platform is among the lowest-priced ways to access frontier-class coding models at scale; qwen3.6-plus at ¥2/MTok input (for inputs up to 256K tokens) is especially competitive. However, the Coding Plan's data-use terms and API restrictions make it unsuitable for enterprise workloads; teams should opt for Token Plan or pay-as-you-go instead. The platform is best suited for China-based teams comfortable with Chinese-language documentation.

Key Terms

  • Bailian (Model Studio) — Alibaba Cloud's managed LLM platform (help.aliyun.com/zh/model-studio). Provides API access, token billing, and subscription plans for Qwen and third-party models.
  • Coding Plan — A flat-rate monthly subscription (¥200/month Pro tier) that provides request-based access to coding models through CLI tools. Billed per request, not per token. Not available for API/automated use. Source: Aliyun – Coding Plan
  • Token Plan (Team Edition) — A per-seat monthly subscription with Credits-based billing. Three tiers: Standard (¥198/seat/month, 25K Credits), Advanced (¥698, 100K Credits), Premium (¥1,398, 250K Credits). Credits are consumed based on model, tokens, and thinking mode. Source: Aliyun – Token Plan Overview
  • Credits — Token Plan's billing unit. Consumption depends on model type, token count, thinking mode, and tool calls. For example, a qwen3.6-plus request with ~8K input tokens, ~41K cached tokens, and ~573 output tokens consumes approximately 3.18 Credits.
  • Context caching — Billing previously cached input tokens at a discounted rate when they are reused. Supported on qwen3.7-max, qwen3.6-max-preview, and several other models. Source: Aliyun – Context Cache
  • Batch API — Asynchronous inference at 50% of real-time pricing. Supported on qwen3.7-max, qwen3.6-plus, and other models. Source: Aliyun – Batch Inference
  • Tiered pricing — Per-token rates step up once a request's input exceeds a size threshold. For example, qwen3.6-plus costs ¥2/MTok for inputs up to 256K tokens, rising to ¥8/MTok for inputs between 256K and 1M tokens; output rates rise in step (¥12 to ¥48/MTok).
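
To make the tiered-pricing definition concrete, here is a minimal Python sketch using the qwen3.6-plus input rates quoted above. It assumes the tier is chosen by the request's total input size and then applied to every input token, and that "256K" means 256,000 tokens; the source confirms neither detail. The function name `tiered_input_cost_cny` is illustrative, not part of any Alibaba SDK.

```python
# qwen3.6-plus input rates as quoted in this report (CNY per million tokens).
# Assumption: "256K" and "1M" are decimal (256,000 and 1,000,000 tokens).
QWEN36_PLUS_INPUT_TIERS = [
    (256_000, 2.0),    # inputs up to 256K tokens
    (1_000_000, 8.0),  # inputs between 256K and 1M tokens
]


def tiered_input_cost_cny(input_tokens, tiers=QWEN36_PLUS_INPUT_TIERS):
    """Input cost in CNY for one request.

    Assumption: the tier is selected by the request's total input size and
    its rate applies to all input tokens (not only those above the threshold).
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    for max_tokens, rate_per_mtok in tiers:
        if input_tokens <= max_tokens:
            return input_tokens * rate_per_mtok / 1_000_000
    raise ValueError("input exceeds the largest documented tier")


if __name__ == "__main__":
    for tokens in (100_000, 256_000, 500_000):
        print(tokens, "tokens ->", tiered_input_cost_cny(tokens), "CNY")
```

Under these assumptions a 100K-token input costs ¥0.20 while a 500K-token input costs ¥4.00, so crossing the 256K threshold quadruples the per-token input rate.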

Latest Changes

Changes since the 2026-04 report.

  • New model: qwen3.7-max launched 2026-05-21 as the new flagship. 1M context, 64K max output, 256K thinking budget. Pricing: ¥12/MTok input, ¥36/MTok output. Thinking mode enabled by default. Supports batch (50%) and context caching. Replaces qwen3.6-max-preview as the top-tier model. Source: Aliyun – Newly Released Models
  • New model: qwen3.7-max-preview launched 2026-05-25. Same specs as qwen3.7-max but thinking-only mode. Snapshot: qwen3.7-max-2026-05-17.
  • New model snapshot: qwen3.7-max-2026-05-20 — pinned snapshot of qwen3.7-max for reproducibility.
  • New third-party model: xiaomi/mimo-v2.5-pro added 2026-05-19, 1M context, available via API.
  • New third-party model: ZHIPU/GLM-5.1 and GLM-5 added 2026-05-19. GLM-5.1 has 198K context.
  • New third-party model: stepfun/step-3.7-flash added 2026-05-29.
  • New third-party model: vanchin/deepseek-v4-pro added 2026-05-29 (Kuaishou/Vanchin-hosted DeepSeek).
  • Token Plan promo: qwen3.7-max Credits consumption halved until 2026-06-22. The model also supports implicit caching on Token Plan. Source: Aliyun – Token Plan Overview
  • HLD update needed: HLD.md lists "Qwen3.6-Max-Preview" as the latest LLM but the current flagship is now qwen3.7-max. The Latest LLM column should be updated.

Plans

| Plan | Price | Billing | Usage Limits | Data Used for Training? | Key Models |
| --- | --- | --- | --- | --- | --- |
| Pay-as-you-go | Per-token | CNY per MTok | Undisclosed rate limits (see rate-limit page) | No | All Qwen + third-party models |
| Coding Plan Pro | ¥200/month | Per-request | 6,000 req/5hr, 45,000/week, 90,000/month | Yes (explicitly stated in terms) | qwen3.6-plus, qwen3.5-plus, qwen3-coder-next, qwen3-coder-plus, kimi-k2.5, glm-5, glm-4.7, MiniMax-M2.5, qwen3-max-2026-01-23 |
| Coding Plan Lite | Discontinued (March 2026) | Per-request | N/A | N/A | N/A |
| Token Plan Standard | ¥198/seat/month | Credits (25,000/seat/month) | Undisclosed | No (explicitly promised) | qwen3.7-max, qwen3.6-plus, qwen3.6-flash, deepseek-v4-pro/flash/v3.2, kimi-k2.6/k2.5, glm-5.1/5, MiniMax-M2.5, qwen-image-2.0/pro, wan2.7-image/pro |
| Token Plan Advanced | ¥698/seat/month | Credits (100,000/seat/month) | Undisclosed | No | Same as Standard |
| Token Plan Premium | ¥1,398/seat/month | Credits (250,000/seat/month) | Undisclosed | No | Same as Standard |
| Token Plan Shared Pack | ¥5,000/pack | Credits (625,000/pack) | 1-month expiry | No | Same as Standard |
| Free tier | Free | 1M input + 1M output tokens | One-time, 90-day expiry | No | Most Qwen models |

Terms explained:

  • Credits — Token Plan's billing unit. In the documented example, a single qwen3.6-plus request with ~8K input tokens, ~41K cached tokens, and ~573 output tokens consumes roughly 3.18 Credits. Actual consumption varies by model, thinking mode, and tool usage. Source: Aliyun – Token Plan Overview
  • Data used for training — Coding Plan explicitly states that model inputs and outputs will be used for service improvement and model optimization during the subscription period. Token Plan explicitly states it does not use conversation data for model training. This distinction is critical for enterprise buyers.
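
The Token Plan tiers can be compared on a per-Credit basis with a short calculation. The sketch below uses only the prices and Credit allowances from the Plans table. As a loudly labeled assumption, it treats the single documented example (~3.18 Credits per qwen3.6-plus request) as a typical request size in order to estimate how many such requests an allowance covers. Real consumption varies by model, thinking mode, and tool calls, so read the request counts as rough orders of magnitude. Helper names are illustrative.

```python
# (price in CNY, Credits included) per seat-month or per pack, from the Plans table.
TOKEN_PLAN_TIERS = {
    "Standard": (198, 25_000),
    "Advanced": (698, 100_000),
    "Premium": (1_398, 250_000),
    "Shared Pack": (5_000, 625_000),
}

# Assumption: the one documented example request is taken as "typical".
EXAMPLE_CREDITS_PER_REQUEST = 3.18


def cny_per_credit(price_cny, credits):
    """Effective price of one Credit if the allowance is fully used."""
    return price_cny / credits


def example_requests_covered(credits, credits_per_request=EXAMPLE_CREDITS_PER_REQUEST):
    """Whole number of example-sized requests an allowance can pay for."""
    return int(credits // credits_per_request)


if __name__ == "__main__":
    for name, (price, credits) in TOKEN_PLAN_TIERS.items():
        print(
            f"{name}: CNY {cny_per_credit(price, credits):.5f} per Credit, "
            f"about {example_requests_covered(credits):,} example-sized requests"
        )
```

On these figures the per-Credit price falls from about ¥0.0079 (Standard) to about ¥0.0056 (Premium), while the Shared Pack works out to ¥0.0080 per Credit.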

API Pricing

All prices are in CNY (Chinese Yuan) per million tokens for the China mainland region (Beijing). USD equivalents are approximate, converted at $1 ≈ ¥7.2.

Qwen3.7 Max (flagship, launched 2026-05-21)

| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
| --- | --- | --- | --- | --- |
| qwen3.7-max | Non-thinking + Thinking | 0-1M | 12 | 36 |
| qwen3.7-max-2026-05-20 | Non-thinking + Thinking | 0-1M | 12 | 36 |
| qwen3.7-max-preview | Thinking only | 0-1M | 12 | 36 |
| qwen3.7-max-2026-05-17 | Thinking only | 0-1M | 12 | 36 |

Features: Batch API (50% discount), context caching (discount on cached input tokens). Free tier: 1M input tokens and 1M output tokens, 90-day expiry.
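
As a rough illustration of how the Batch API discount changes a bill, the following sketch prices a qwen3.7-max request at the listed ¥12/¥36 per MTok rates. It assumes the 50% batch rate applies equally to input and output tokens. Context caching is left out because the source gives no cached-token rate. The function name is hypothetical.

```python
# qwen3.7-max list prices from the table above (CNY per million tokens).
INPUT_CNY_PER_MTOK = 12.0
OUTPUT_CNY_PER_MTOK = 36.0

# Batch API is billed at 50% of real-time pricing.
# Assumption: the multiplier applies to both input and output tokens.
BATCH_MULTIPLIER = 0.5


def qwen37_max_cost_cny(input_tokens, output_tokens, batch=False):
    """Cost of one request in CNY, ignoring caching and the free tier."""
    cost = (
        input_tokens * INPUT_CNY_PER_MTOK + output_tokens * OUTPUT_CNY_PER_MTOK
    ) / 1_000_000
    return cost * BATCH_MULTIPLIER if batch else cost


if __name__ == "__main__":
    realtime = qwen37_max_cost_cny(200_000, 4_000)
    batched = qwen37_max_cost_cny(200_000, 4_000, batch=True)
    print(f"real-time: CNY {realtime:.3f}, batch: CNY {batched:.3f}")
```

For a request with 200K input and 4K output tokens this gives about ¥2.54 in real time and about ¥1.27 through the Batch API.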

Qwen3.6 Max Preview

| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
| --- | --- | --- | --- | --- |
| qwen3.6-max-preview | Non-thinking + Thinking | 0-128K | 9 | 54 |
| qwen3.6-max-preview | Non-thinking + Thinking | 128K-256K | 15 | 90 |

Features: Context caching supported. 256K context window.

Qwen3.6 Plus

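| Model | Mode | Input Tier | Input (¥/MTok) | Output: Non-thinking (¥/MTok) | Output: Thinking (¥/MTok) |
| --- | --- | --- | --- | --- | --- |
| qwen3.6-plus | Both | 0-256K | 2 | 12 | 12 |
| qwen3.6-plus | Both | 256K-1M | 8 | 48 | 48 |
| qwen3.6-plus-2026-04-02 | Both | 0-256K | 2 | 12 | 12 |
| qwen3.6-plus-2026-04-02 | Both | 256K-1M | 8 | 48 | 48 |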

Features: Batch API (50%), 1M context window. Supports Token Plan and Coding Plan. Free tier: 1M tokens each.

Qwen3.6 Flash

| Model | Mode | Input Tier | Input (¥/MTok) | Output (¥/MTok) |
| --- | --- | --- | --- | --- |
| qwen3.6-flash | Non-thinking + Thinking | 0-256K | 1.2 | 7.2 |
| qwen3.6-flash | Non-thinking + Thinking | 256K-1M | 4.8 | 28.8 |
| qwen3.6-flash-2026-04-16 | Non-thinking + Thinking | 0-256K | 1.2 | 7.2 |
| qwen3.6-flash-2026-04-16 | Non-thinking + Thinking | 256K-1M | 4.8 | 28.8 |

Features: Batch API (50%), context caching. 1M context window. Supports Token Plan. Free tier: 1M tokens each.

Approximate USD Equivalent (at $1 = ¥7.2)

| Model | Input ($/MTok) | Output ($/MTok) |
| --- | --- | --- |
| qwen3.7-max | ~$1.67 | ~$5.00 |
| qwen3.6-max-preview (≤128K) | ~$1.25 | ~$7.50 |
| qwen3.6-plus (≤256K) | ~$0.28 | ~$1.67 |
| qwen3.6-flash (≤256K) | ~$0.17 | ~$1.00 |
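
The USD figures above follow directly from the CNY list prices and the ¥7.2 reference rate. A minimal sketch of that conversion is shown here so the table can be regenerated when the exchange rate moves; the rate is this report's approximation, not a live quote, and the helper name `to_usd` is illustrative.

```python
# Approximate reference rate used in this report; not a live exchange rate.
CNY_PER_USD = 7.2

# (input, output) list prices in CNY per million tokens, lowest input tier.
PRICES_CNY = {
    "qwen3.7-max": (12, 36),
    "qwen3.6-max-preview (<=128K)": (9, 54),
    "qwen3.6-plus (<=256K)": (2, 12),
    "qwen3.6-flash (<=256K)": (1.2, 7.2),
}


def to_usd(amount_cny, cny_per_usd=CNY_PER_USD):
    """Convert a CNY amount to USD, rounded to cents."""
    return round(amount_cny / cny_per_usd, 2)


if __name__ == "__main__":
    for model, (input_cny, output_cny) in PRICES_CNY.items():
        print(f"{model}: ~${to_usd(input_cny):.2f} in, ~${to_usd(output_cny):.2f} out")
```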

Model Performance / Benchmarks

Qwen3-Coder-480B-A35B-Instruct (open-source, released July 2025) benchmarks from the official blog post:

| Benchmark | Score | Comparison |
| --- | --- | --- |
| SWE-Bench Verified | State-of-the-art among open models | Comparable to Claude Sonnet 4 (from July 2025) |
| Agentic Coding (multi-benchmark) | State-of-the-art among open models | Comparable to Claude Sonnet 4 |
| Agentic Browser-Use | State-of-the-art among open models | Comparable to Claude Sonnet 4 |
| Agentic Tool-Use | State-of-the-art among open models | Comparable to Claude Sonnet 4 |

No published coding benchmarks for the closed-source qwen3.7-max, qwen3.6-plus, or qwen3.6-flash models as of May 2026. The model release notes describe qwen3.7-max as excelling in "programming, office productivity, and long-horizon autonomous execution" but do not provide specific benchmark scores. Source: Aliyun – Newly Released Models

Latest News

qwen3.7-max launched as new flagship (2026-05-21). Alibaba released qwen3.7-max, the newest model in the Qwen Max series, with a 1M token context window, 64K max output, and 256K thinking budget. Thinking mode is enabled by default. The model is positioned as superior in programming, office productivity, and long-horizon autonomous tasks. Priced at ¥12/MTok input and ¥36/MTok output, it is 33% more expensive on input than the previous qwen3.6-max-preview (¥9/MTok at ≤128K tokens) but 33% cheaper on output (¥36 vs. ¥54/MTok), and it offers a much larger 1M context window (vs. 256K). It supports the Batch API at a 50% discount and context caching. Source: Aliyun – Newly Released Models
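
The price comparison between the two flagship generations can be checked with a few lines of arithmetic. This sketch uses the list prices from the API Pricing tables and also derives the output-to-input token ratio above which qwen3.7-max becomes the cheaper model per request. It ignores caching, batch discounts, and any difference in how many tokens each model emits for the same task, so treat it as a first-order comparison only.

```python
# (input, output) list prices in CNY per million tokens, from this report.
QWEN36_MAX_PREVIEW = {"0-128K": (9.0, 54.0), "128K-256K": (15.0, 90.0)}
QWEN37_MAX = (12.0, 36.0)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(tier):
    """Percent change in (input, output) price moving to qwen3.7-max."""
    old_in, old_out = QWEN36_MAX_PREVIEW[tier]
    new_in, new_out = QWEN37_MAX
    return percent_change(old_in, new_in), percent_change(old_out, new_out)


def break_even_output_ratio(tier):
    """Output/input token ratio above which qwen3.7-max costs less.

    Solves new_in*i + new_out*o < old_in*i + old_out*o for o/i.
    Returns 0.0 when qwen3.7-max is cheaper at any ratio.
    """
    old_in, old_out = QWEN36_MAX_PREVIEW[tier]
    new_in, new_out = QWEN37_MAX
    return max(0.0, (new_in - old_in) / (old_out - new_out))


if __name__ == "__main__":
    for tier in QWEN36_MAX_PREVIEW:
        d_in, d_out = compare(tier)
        ratio = break_even_output_ratio(tier)
        print(f"{tier}: input {d_in:+.0f}%, output {d_out:+.0f}%, break-even o/i {ratio:.3f}")
```

In the 0-128K tier, qwen3.7-max is the cheaper option whenever output tokens exceed roughly one sixth of input tokens; in the 128K-256K tier it is cheaper on both input (20% lower) and output (60% lower).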

qwen3.7-max-preview released (2026-05-25). A preview variant of qwen3.7-max with thinking-only mode, same pricing. Snapshot: qwen3.7-max-2026-05-17. Source: Aliyun – Newly Released Models

Token Plan promo: qwen3.7-max at half Credits consumption (until 2026-06-22). During the promotional period, qwen3.7-max on Token Plan consumes Credits at 50% of the normal rate and also supports implicit caching. For the duration of the promotion, a seat's Credit allowance therefore covers roughly twice as much flagship-model usage. Source: Aliyun – Token Plan Overview

Multiple third-party models added (May 2026). xiaomi/mimo-v2.5-pro (May 19), ZHIPU/GLM-5.1 and GLM-5 (May 19), stepfun/step-3.7-flash (May 29), and vanchin/deepseek-v4-pro (May 29, Kuaishou-hosted). These expand the model marketplace on Bailian but are not Qwen models. Source: Aliyun – Newly Released Models

Qwen Code CLI now available via Alibaba Cloud Model Studio. Qwen Code is a CLI coding agent (forked from Gemini CLI) with a VS Code plugin ("Qwen Code Companion"). It supports three billing modes: Token Plan, Coding Plan, and pay-as-you-go. Compatible with OpenClaw, Hermes Agent, Claude Code, OpenCode, Cursor, Codex, Cline, Qoder, Kilo CLI, and other tools. Source: Aliyun – Qwen Code

Qwen blog migrated to qwen.ai. The old blog at qwenlm.github.io/blog/ now redirects to qwen.ai/research. The last substantive post on the old blog was "Qwen3Guard" (September 23, 2025). No May 2026 blog posts were found at the new location (qwen.ai appears to be heavily JS-rendered and content was not accessible via standard fetch or browser snapshot).

Community Signals

Qwen3.6-35B-A3B open-source release generated strong interest. A Hacker News post titled "Qwen3.6-35B-A3B: Agentic coding power, now open to all" received 1,274 upvotes on April 16, 2026, indicating strong developer interest in locally runnable Qwen coding models. The model is a 35B-parameter MoE with only 3B active parameters, making it practical for consumer hardware. Source: Hn – Search

Active local coding agent experiments with Qwen 3.6. On Reddit r/LocalLLaMA, a post "Running Qwen 3.6 35b MoE With Zoo Code On M1 Max is Amazing!" (8 upvotes, 14 comments) described successful local coding agent usage on Apple Silicon. Another thread "Qwen 3.6 coding choice: 27B vs 35B quants" (6 upvotes, 84 comments) showed active community debate about optimal quantization for coding tasks. Source: Reddit – Search

NVIDIA quantized release of Qwen3.6-35B drew significant attention. "nvidia/Qwen3.6-35B-A3B-NVFP4" on Hugging Face was shared on r/LocalLLaMA and received 198 upvotes and 36 comments, indicating strong demand for optimized inference of Qwen coding models. Source: Reddit – 1Ts6J6J

Local benchmark comparisons emerging. A Hacker News post "Local Harness Benchmark: Pi Coding Agent vs. OpenCode with Qwen3.6 35B A3B" (May 4, 2026) compared two coding agent frameworks using the open-source Qwen3.6 model, suggesting growing ecosystem maturity around local Qwen-based coding agents. Source: Hn – Search By Date

Enterprise Readiness

| Feature | Available? | Details |
| --- | --- | --- |
| SSO (SAML/OIDC) | Partial | Alibaba Cloud RAM (Resource Access Management) supports SAML SSO for console access. No native OIDC integration documented for API access. |
| SCIM | No | No SCIM-based user provisioning documented. Teams must manually add members via the Token Plan management console. |
| Audit logs | Yes | Billing and usage logs available through Alibaba Cloud Billing Center. Token Plan provides per-member usage analytics. |
| IP indemnity | No | No IP indemnity commitment documented for Qwen models or Bailian platform. |
| Data residency | Yes | China mainland (Beijing), Singapore, US (Virginia), and EU (Frankfurt) regions available for pay-as-you-go. Token Plan and Coding Plan currently limited to Beijing (China mainland). Source: Aliyun – Regions |
| HIPAA | No | No HIPAA compliance documented. |
| Air-gapped/on-prem | Partial | Model Studio supports importing and deploying custom models on dedicated instances, but no fully air-gapped deployment for Qwen hosted models is documented. |
| SLA | Undisclosed | No specific SLA for model availability documented in the public pricing or service pages. |
| Admin controls (RBAC) | Yes | Alibaba Cloud RAM supports role-based access control. Token Plan provides workspace-level permission management with admin and member roles. |

Terms explained:

  • Data residency — Alibaba Cloud offers multiple deployment regions. However, the subscription plans (Coding Plan, Token Plan) are currently limited to the Beijing region, which may be a constraint for teams requiring data processing outside China. Source: Aliyun – Regions
  • IP indemnity — No intellectual property indemnification is offered for Qwen model outputs, unlike Microsoft's Copilot Copyright Commitment or Anthropic's similar programs.

Transparency Gaps

  1. Rate limits undisclosed. Pay-as-you-go API rate limits (RPM/TPM) are not published on the pricing page. A separate rate-limit page exists but was not fully accessible. Without published limits, teams cannot plan capacity.
  2. Coding Plan per-request consumption vague. The Coding Plan documentation states "simple tasks consume about 5-10 requests, complex tasks about 10-30+ requests" without defining what constitutes a "simple" or "complex" task. No per-model breakdown is provided.
  3. Token Plan Credits formula not fully disclosed. While an example is given (qwen3.6-plus consuming ~3.18 Credits for a specific request), the actual per-model Credit rates and the formula for thinking mode, tool calls, and cached tokens are not published. Users must rely on the billing dashboard for actual consumption.
  4. qwen3.7-max benchmarks not published. Unlike the open-source Qwen3-Coder model, Alibaba has not published benchmark scores for the closed-source qwen3.7-max, qwen3.6-plus, or qwen3.6-flash models. The release notes describe capabilities qualitatively ("excels in programming, office productivity, and long-horizon autonomous execution") without specific scores.
  5. Coding Plan data-use terms. The Coding Plan explicitly grants Alibaba a license to use model inputs and outputs for training. While disclosed, the scope of this usage (which models it trains, whether it extends beyond Qwen, retention period) is not specified beyond a reference to the Alibaba Cloud Bailian Service Agreement Section 5.2.
  6. qwen.ai blog content inaccessible. The new Qwen blog at qwen.ai is heavily JavaScript-rendered, making it impossible to extract posts via standard web fetching tools. This reduces transparency for non-Chinese-speaking audiences tracking Qwen developments.
  7. SLA not documented. No service-level agreement for model availability, latency, or error rates was found in the public documentation.
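
Because the Coding Plan documents only rough requests-per-task ranges, capacity planning has to work from bounds. The sketch below turns the published Coding Plan Pro quotas and the "5-10" and "10-30+" ranges into task-count ranges per quota window. It assumes 30 as the upper figure for the open-ended "30+", so the low end for complex tasks is optimistic. All names are illustrative.

```python
# Figures quoted in this report for Coding Plan Pro.
MONTHLY_PRICE_CNY = 200
QUOTAS = {"5-hour window": 6_000, "week": 45_000, "month": 90_000}

# Documented requests-per-task ranges.
# Assumption: 30 stands in for the open-ended "30+" on complex tasks.
REQUESTS_PER_TASK = {"simple": (5, 10), "complex": (10, 30)}


def task_range(quota, low_per_task, high_per_task):
    """Return (fewest, most) tasks a quota covers for a requests-per-task range."""
    return quota // high_per_task, quota // low_per_task


def cny_per_request_at_full_use(price=MONTHLY_PRICE_CNY, quota=QUOTAS["month"]):
    """Effective price per request if the whole monthly quota is consumed."""
    return price / quota


if __name__ == "__main__":
    for window, quota in QUOTAS.items():
        for kind, (low, high) in REQUESTS_PER_TASK.items():
            fewest, most = task_range(quota, low, high)
            print(f"{window}: {fewest:,} to {most:,} {kind} tasks")
    print(f"CNY per request at full use: {cny_per_request_at_full_use():.4f}")
```

Under these assumptions the monthly quota covers roughly 9,000 to 18,000 simple tasks or 3,000 to 9,000 complex ones, and fewer if complex tasks run well past 30 requests.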