Key Terms
- GLM (General Language Model) - Zhipu AI's family of large language models, based on autoregressive blank-filling pretraining. The ChatGLM series supports complex natural language instructions and reasoning. Source: Bigmodel – Introduction
- OpenClaw - Zhipu AI's branding for agentic coding workflows. The term "lobster" (龙虾) is used colloquially in Chinese documentation for coding agent tasks. GLM-5-Turbo is specifically optimized for OpenClaw scenarios. Source: Bigmodel – Glm 5 Turbo
- GLM Coding Plan - A subscription service for AI-powered coding. Supports 20+ coding tools including Claude Code, OpenClaw, OpenCode, Kilo Code, Cline. Billed on 5-hour rolling windows and weekly quotas, not flat monthly tokens. Source: Bigmodel – Overview
- Token-based billing - API usage is charged per million tokens. GLM series models use approximately 1 token per 1.6 Chinese characters. Pricing tiers based on input context length (<32K tokens vs 32K+ tokens). Source: Bigmodel – Introduction
- Prompt caching - Context caching is available for GLM models. Cache storage is currently free (limited-time promotion). Cache hits are billed at reduced rates (e.g., ¥1.3/MTok for GLM-5.1 vs ¥6/MTok full input). Source: Zhipu AI – Pricing
- Batch API - Asynchronous processing at 50% discount. Supported on older GLM-4 series models. Source: Zhipu AI – Pricing
- Context window - Maximum tokens the model processes in one conversation. GLM-5.1 and GLM-5 support 200K context with up to 128K max output. GLM-4-Long supports up to 1M context. Source: Bigmodel – Model Overview
- Thinking mode - Chain-of-thought reasoning enabled via
thinking: { type: "enabled" }. Supported on GLM-5.1, GLM-5, GLM-5-Turbo. Temperature defaults to 1.0 when thinking is enabled. Source: Bigmodel – Thinking Mode - MCP servers - Model Context Protocol servers provided with the Coding Plan, including vision understanding, web search, web page reading, and open source repository reading. Source: Bigmodel – Overview
- Model multiplier - GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 3x during peak hours (14:00-18:00 UTC+8) and 2x off-peak. A promotion running through June 2026 reduces off-peak to 1x. Source: Bigmodel – Overview
Latest Changes
Changes since the 2026-04 report.
- New model: GLM-5.1 launched April 7, 2026 as the latest flagship. Coding capability aligned with Claude Opus 4.6. SWE-Bench Pro score of 58.4, surpassing GPT-5.4 and Claude Opus 4.6. Supports 8-hour long-horizon autonomous tasks. 200K context, 128K max output. Source: Bigmodel – New Releases
- New model: GLM-5V-Turbo launched April 2, 2026. Multimodal coding model combining vision understanding with coding capability. 200K context, 128K max output. Source: Bigmodel – New Releases
- New model: GLM-5-Turbo launched March 15, 2026. OpenClaw-optimized base model with enhanced tool calling, instruction following, and long-duration task execution. 200K context, 128K max output. Source: Bigmodel – New Releases
- Feature added: GLM Coding Plan now supports GLM-5.1 across all tiers (Lite, Pro, Max). Source: Bigmodel – Overview
- Promotion: GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 1x (instead of 2x) during off-peak hours, running through end of June 2026. Source: Bigmodel – Overview
- Feature added: GLM in Excel (Beta) included in Coding Plan subscriptions. Source: Bigmodel – Overview
- Milestone: Zhipu AI published its first financial results report as a publicly listed company (HKEX: 02513.HK) on March 31, 2026. Source: Zhipuai – News
- Deprecation: GLM-Z1 series scheduled for deprecation on November 15, 2025 (already deprecated). GLM-4-0520 scheduled for December 30, 2025 deprecation. Source: Bigmodel – Model Overview
Plans
GLM Coding Plan (Personal)
| Plan | Monthly Price | Quarterly Price (per month) | 5-Hour Quota (prompts) | Weekly Quota (prompts) | MCP Calls/Month | Recommended Projects |
|---|---|---|---|---|---|---|
| Lite | ¥49 | ¥44.1 (9% off) | ~80 | ~400 | 100 | 1 small repo |
| Pro | ¥149 | ¥134.1 (9% off) | ~400 | ~2,000 | 1,000 | 1-2 mid-size repos |
| Max | ¥469 | ¥422.1 (9% off) | ~1,600 | ~8,000 | 4,000 | 2+ large repos |
Annual subscriptions available at 20% discount.
What's included in all plans:
- Models: GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air
- MCP tools: Vision understanding, web search, web page reading, open source repo reading
- Compatible tools: Claude Code, OpenClaw, OpenCode, Kilo Code, Cline, TRAE, CodeBuddy, and 20+ others
- GLM in Excel (Beta)
Model multiplier for Coding Plan:
- GLM-5.1 / GLM-5-Turbo: 3x during peak (14:00-18:00 UTC+8), 2x off-peak (promotional: 1x off-peak through June 2026)
- GLM-4.7 / GLM-4.5-Air: 1x at all times
Monthly value estimate: Each plan provides API-equivalent value of 15-30x the monthly subscription price (accounting for weekly quota limits).
Purchase limits: The platform has implemented daily purchase limits on Coding Plan subscriptions due to demand exceeding capacity. Limits are released daily at 10:00 UTC+8. Existing subscribers renewing or upgrading are not affected. Source: Bigmodel – Overview
Free Tier (z.ai)
The z.ai consumer chatbot provides free access to GLM-5.1 and GLM-5 via web interface with features including AutoClaw (agent mode), AI Slides, Magic Design, Full-Stack coding, and Write Code. No API access included.
Free API Models
| Model | Context | Input Price | Output Price |
|---|---|---|---|
| GLM-4.7-Flash | 200K | Free | Free |
| GLM-4.6V-Flash | 128K | Free | Free |
| GLM-4.1V-Thinking-Flash | 64K | Free | Free |
| GLM-4V-Flash | 16K | Free | Free |
| CogView-3-Flash | - | Free | Free |
| CogVideoX-Flash | - | Free | Free |
Source: Zhipu AI – Pricing
API Pricing
Flagship Models (per 1M tokens, in CNY)
| Model | Context Tier | Input (¥/MTok) | Output (¥/MTok) | Cache Storage (¥/MTok/hr) | Cache Hit (¥/MTok) |
|---|---|---|---|---|---|
| GLM-5.1 | <32K | ¥6 | ¥24 | Free (limited-time) | ¥1.3 |
| GLM-5.1 | 32K+ | ¥8 | ¥28 | Free (limited-time) | ¥2 |
| GLM-5-Turbo | <32K | ¥5 | ¥22 | Free (limited-time) | ¥1.2 |
| GLM-5-Turbo | 32K+ | ¥7 | ¥26 | Free (limited-time) | ¥1.8 |
| GLM-5 | <32K | ¥4 | ¥18 | Free (limited-time) | ¥1 |
| GLM-5 | 32K+ | ¥6 | ¥22 | Free (limited-time) | ¥1.5 |
Mid-Tier and Economy Models (per 1M tokens, in CNY)
| Model | Context Tier | Input (¥/MTok) | Output (¥/MTok) | Cache Hit (¥/MTok) |
|---|---|---|---|---|
| GLM-4.7 | <32K, output <0.2K | ¥2 | ¥8 | ¥0.4 |
| GLM-4.7 | <32K, output 0.2K+ | ¥3 | ¥14 | ¥0.6 |
| GLM-4.7 | 32-200K | ¥4 | ¥16 | ¥0.8 |
| GLM-4.5-Air | <32K, output <0.2K | ¥0.8 | ¥2 | ¥0.16 |
| GLM-4.5-Air | <32K, output 0.2K+ | ¥0.8 | ¥6 | ¥0.16 |
| GLM-4.5-Air | 32-128K | ¥1.2 | ¥8 | ¥0.24 |
| GLM-4.7-FlashX | 200K | ¥0.5 | ¥3 | ¥0.1 |
| GLM-4.7-Flash | 200K | Free | Free | Free |
USD Approximate Pricing (at ~¥7.25/USD)
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| GLM-5.1 (<32K) | ~$0.83 | ~$3.31 |
| GLM-5.1 (32K+) | ~$1.10 | ~$3.86 |
| GLM-5-Turbo (<32K) | ~$0.69 | ~$3.03 |
| GLM-5 (<32K) | ~$0.55 | ~$2.48 |
| GLM-4.7 (<32K) | ~$0.28-$0.41 | ~$1.10-$1.93 |
| GLM-4.5-Air | ~$0.11 | ~$0.28-$0.83 |
| GLM-4.7-FlashX | ~$0.07 | ~$0.41 |
Batch API Pricing
50% discount on standard pricing for supported models (GLM-4 series). Source: Zhipu AI – Pricing
Private Instance Pricing
| Model | Deployment | Price |
|---|---|---|
| GLM-4.6 | 200K fp8 | ¥175/GPU unit/day |
| GLM-4.5 | 128K fp8 | ¥175/GPU unit/day |
| GLM-4.5-Air | 128K fp8 | ¥100/GPU unit/day |
Source: Zhipu AI – Pricing
Search Tools
| Tool | Price |
|---|---|
| Search-Std (Zhipu self-developed) | ¥0.01/request |
| Search-Pro (Zhipu enhanced) | ¥0.03/request |
| Search-Pro-Sogou | ¥0.05/request |
| Search-Pro-Quark | ¥0.05/request |
Source: Zhipu AI – Pricing
Model Performance / Benchmarks
| Model | Benchmark | Score | Notes |
|---|---|---|---|
| GLM-5.1 | SWE-Bench Pro | 58.4 | Surpassed GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro |
| GLM-5 | SWE-Bench Verified | 77.8 | Open-source SOTA |
| GLM-5 | Terminal Bench 2.0 | 56.2 | Open-source SOTA |
| GLM-5 | BrowseComp | Undisclosed | Open-source SOTA (web browsing/retrieval) |
| GLM-5 | MCP-Atlas | Undisclosed | Open-source SOTA (tool calling, multi-step tasks) |
| GLM-5 | tau2-Bench | Undisclosed | Open-source SOTA (complex multi-tool planning/execution) |
| GLM-5 | ZClawBench | Undisclosed | Significantly above GLM-5 (OpenClaw agent benchmark) |
| GLM-5-Turbo | ZClawBench | Undisclosed | Above GLM-5 and multiple mainstream models (OpenClaw benchmark) |
Key context: GLM-5 claims coding performance aligned with Claude Opus 4.5 (not the latest 4.6/4.8). GLM-5.1 claims alignment with Claude Opus 4.6. Zhipu publishes benchmark scores selectively (exact numbers for some, "open-source SOTA" claims without numbers for others). The GLM-5 series uses a 744B parameter MoE architecture with 40B active parameters, up from 355B/32B in the previous generation. GLM-5 integrates DeepSeek Sparse Attention for long-context efficiency.
Sources:
Latest News
GLM-5.1 Flagship Launch (April 7, 2026): Zhipu AI released GLM-5.1 as its latest flagship model. Key claims: coding capability aligned with Claude Opus 4.6, 8-hour long-horizon autonomous task execution (planning, execution, testing, delivery in a single session), SWE-Bench Pro score of 58.4 (claimed global best). The model achieved this through multi-turn SFT, RL, and process quality evaluation. Practical demonstrations include building a Linux desktop from scratch in 8 hours, 655-round iterative optimization achieving 6.9x throughput improvement on a vector database, and 3.6x geometric mean speedup on KernelBench Level 3 (vs torch.compile's 1.49x). Source: Bigmodel – New Releases
GLM-5V-Turbo Multimodal Coding Model (April 2, 2026): A new vision+coding model with 200K context and 128K output. Enhanced GUI Agent and Coding Agent performance for "see environment, plan actions, execute tasks" workflows. Adds visual tools: bounding box, screenshot, web page reading with image recognition. Source: Bigmodel – New Releases
GLM-5-Turbo OpenClaw Optimization (March 15, 2026): Purpose-built model for coding agent workflows. Enhanced tool calling, instruction following, time-aware task execution, and high-throughput long-chain processing. Released alongside ZClawBench, a new end-to-end agent benchmark for OpenClaw scenarios. Skills usage in OpenClaw grew from 26% to 45%. Source: Bigmodel – Glm 5 Turbo
Zhipu AI First Financial Report (March 31, 2026): As a publicly listed company (HKEX: 02513.HK), Zhipu published its first annual results. Specific financial figures not available in the fetched data. Source: Zhipuai – News
GLM-5 Launch (February 12, 2026): The GLM-5 base model launched with 744B parameters (40B active), 28.5T training tokens, and DeepSeek Sparse Attention integration. Positioned for "Agentic Engineering" with coding capability aligned to Claude Opus 4.5. Source: Bigmodel – New Releases
GLM Coding Plan Demand Surge (Ongoing): Zhipu implemented daily purchase limits on Coding Plan subscriptions starting January 23 due to demand exceeding capacity. New inventory released daily at 10:00 UTC+8. Existing subscribers are unaffected. Source: Bigmodel – Overview
Community Signals
Coding tool ecosystem endorsements: Multiple coding tool vendors provided public endorsements for the GLM Coding Plan. Kilo Code highlighted the combination of generous quotas and low cost as removing cost anxiety. Cline praised the pricing and quota structure as "hard to beat" for developers seeking high-value AI coding. Crush (which used GLM for key architecture) and Factory both endorsed GLM-5's performance-to-cost ratio. Source: Zhipu AI – Glm Coding
Coding Plan purchase limits signal capacity strain: The platform has been rationing Coding Plan subscriptions since January 2026, releasing limited stock daily. This indicates demand significantly outpacing GPU capacity, which is a risk for users who need predictable access. The rationing notice states the team is "working at highest priority to coordinate resources." Source: Bigmodel – Overview
Peak-hour multiplier controversy: The 3x multiplier for GLM-5.1 during peak hours (14:00-18:00 UTC+8) effectively reduces the value of Coding Plan subscriptions during the most common working hours for Chinese developers. The temporary 1x off-peak promotion through June 2026 is clearly an acquisition strategy. Users should expect the full multiplier to apply after the promotion ends, which would significantly change the value equation for GLM-5.1 usage. Source: Bigmodel – Overview
No significant English-language community presence: Reddit (r/LocalLLaMA) and Hacker News show minimal discussion of Zhipu AI's GLM models in May 2026. The primary community engagement is through Chinese-language channels (Feishu/Lark groups, Chinese social media). This limits the availability of independent quality assessments and user reports for non-Chinese-speaking evaluators.
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML/OIDC) | Undisclosed | Not mentioned in public documentation |
| SCIM | Undisclosed | Not mentioned in public documentation |
| Audit logs | Undisclosed | Not mentioned in public documentation |
| IP indemnity | Undisclosed | Not mentioned in public documentation. A commercial license agreement exists for model use. Source: Bigmodel – Model Commercial Use |
| Data residency | Partial | Cloud private instances and on-premise deployment available. On-prem pricing listed as "数千万" (tens of millions) for GLM-4-0520. Source: Zhipu AI – Pricing |
| HIPAA | No | Not mentioned |
| Air-gapped / On-prem | Yes | Local private deployment available for GLM-4 series with hardware appliance. Pricing starts from "数十万" (hundreds of thousands CNY) for smaller models. Source: Zhipu AI – Pricing |
| SLA | Undisclosed | Not mentioned in public documentation |
| Admin controls (RBAC) | Partial | Team plan available with central billing. Source: Bigmodel – Team |
| Content security / moderation | Yes | Built-in content safety audit for text, image, audio, video. Source: Bigmodel – Securityaudit |
| Model fine-tuning | Yes | LoRA and full fine-tuning supported on GLM-4.5, GLM-4.5-Air, GLM-4 series. Source: Zhipu AI – Pricing |
| OpenAI API compatibility | Yes | Supports OpenAI SDK, Claude API compatibility, LangChain, HTTP API, Python SDK, Java SDK. Source: Bigmodel – Introduction |
Team Plan: GLM Coding Plan offers team tiers. Specific pricing and per-seat structure available in the team documentation. Source: Bigmodel – Team
Cloud private instances: Available at ¥100-175/GPU unit/day for dedicated model deployments. Annual套餐 available (e.g., GLM-4.5 at ¥1.1M/year, GLM-4.5-Air at ¥500K/year). Source: Zhipu AI – Pricing
Transparency Gaps
- Exact rate limits undisclosed. The platform uses dynamic rate limiting based on user tier, subscription level, and current load. No specific RPM/TPM numbers are published. Users must check the console for their current limits. Source: Bigmodel – Rate Limit
- Coding Plan quota inexact. Quotas are described as "approximately X prompts" with the caveat that actual usage varies by "project complexity, codebase size, and whether auto-accept is enabled." There is no token-level accounting exposed to users. Source: Bigmodel – Overview
- Peak-hour multiplier end date. The 1x off-peak promotion for GLM-5.1/GLM-5-Turbo runs "through end of June" but the full multiplier (2x off-peak, 3x peak) is the stated permanent rate. Users cannot plan long-term costs without knowing whether the promotion will be extended. Source: Bigmodel – Overview
- Purchase limit duration. The daily rationing of Coding Plan subscriptions has no stated end date. The notice says "short-term" but has been in effect since January 2026 (5+ months). Source: Bigmodel – Overview
- Benchmark scores partially disclosed. GLM-5 claims "open-source SOTA" on BrowseComp, MCP-Atlas, and tau2-Bench without publishing exact numbers. GLM-5.1's SWE-Bench Pro 58.4 is the only fully quantified coding benchmark. Source: Bigmodel – Glm 5
- Model architecture details. GLM-5/5.1 is described as 744B total / 40B active parameters, but training data composition, architecture specifics beyond "DeepSeek Sparse Attention," and inference optimization details are not fully disclosed.
- Enterprise features. SSO, SCIM, audit logs, IP indemnity, and SLA details are not documented publicly. Enterprises must contact sales for this information. Source: Zhipu AI – Pricing
- Concurrent request limits. Coding Plan documentation recommends project counts (Lite: 1, Pro: 1-2, Max: 2+) but does not state actual concurrent request limits. The documentation acknowledges users sometimes "feel like only 1 concurrent request" during peak hours. Source: Bigmodel – Rate Limit
- Cache storage pricing. Currently listed as "free (limited-time)" for all flagship models. The permanent pricing is undisclosed. Source: Zhipu AI – Pricing
- GLM-4.7 output-tier pricing. GLM-4.7 has a unique pricing structure where output cost depends on output length (<0.2K or 0.2K+ tokens). This is unusual and not clearly explained in documentation. Source: Zhipu AI – Pricing