Zhipu AI

AI Coding Agents Report: May 2026 · Updated 31 May 2026 · Version history

Executive Summary

What it is: Zhipu AI (listed on HKEX as 02513.HK) is a Chinese LLM supplier founded by a Tsinghua University team. Its consumer product, z.ai, offers free chat powered by GLM-5.1 and GLM-5. The developer platform, bigmodel.cn, provides pay-per-token API access across a wide model lineup (text, vision, image generation, video, audio). The GLM Coding Plan is a subscription-based coding agent service compatible with Claude Code, OpenClaw, OpenCode, Kilo Code, Cline, TRAE, CodeBuddy, and 20+ other coding tools. Personal plans range from ¥49/month (Lite) to ¥469/month (Max). API pricing for the flagship GLM-5.1 is ¥6/¥24 per MTok (input/output at <32K context), making it one of the cheapest frontier-grade models available.

What to watch out for: The GLM Coding Plan operates on a quota system with 5-hour rolling windows and weekly caps, not flat monthly usage. GLM-5.1 and GLM-5-Turbo consume quota at 2x-3x multiplier during peak hours (14:00-18:00 UTC+8), though a temporary promotion reduces this to 1x off-peak through June 2026. The Coding Plan is restricted to approved coding tools only; using it outside these tools can result in account restrictions. Rate limits and concurrency are dynamic and not publicly disclosed in exact numbers. The platform has implemented purchase limits on Coding Plan subscriptions due to capacity constraints, which may still be in effect. Documentation is primarily in Chinese, which may be a barrier for non-Chinese-speaking teams.

Bottom line: Zhipu AI offers the most cost-effective frontier model API pricing among all tracked suppliers, with GLM-5.1 at roughly $0.83/$3.33 per MTok at current exchange rates, which is 6x cheaper than Claude Opus 4.8 on input and 7.5x cheaper on output. The GLM Coding Plan at ¥149/month (Pro) provides generous usage for agentic coding through industry-standard tools. The main risks are capacity constraints (purchase limits, dynamic rate limiting), China-centric documentation and support, and the model multiplier system that can inflate costs during peak hours.

Key Terms

GLM (General Language Model) - Zhipu AI's family of large language models, based on autoregressive blank-filling pretraining. The ChatGLM series supports complex natural language instructions and reasoning. Source: Bigmodel – Introduction
OpenClaw - Zhipu AI's branding for agentic coding workflows. The term "lobster" (龙虾) is used colloquially in Chinese documentation for coding agent tasks. GLM-5-Turbo is specifically optimized for OpenClaw scenarios. Source: Bigmodel – Glm 5 Turbo
GLM Coding Plan - A subscription service for AI-powered coding. Supports 20+ coding tools including Claude Code, OpenClaw, OpenCode, Kilo Code, Cline. Billed on 5-hour rolling windows and weekly quotas, not flat monthly tokens. Source: Bigmodel – Overview
Token-based billing - API usage is charged per million tokens. GLM series models use approximately 1 token per 1.6 Chinese characters. Pricing tiers based on input context length (<32K tokens vs 32K+ tokens). Source: Bigmodel – Introduction
Prompt caching - Context caching is available for GLM models. Cache storage is currently free (limited-time promotion). Cache hits are billed at reduced rates (e.g., ¥1.3/MTok for GLM-5.1 vs ¥6/MTok full input). Source: Zhipu AI – Pricing
Batch API - Asynchronous processing at 50% discount. Supported on older GLM-4 series models. Source: Zhipu AI – Pricing
Context window - Maximum tokens the model processes in one conversation. GLM-5.1 and GLM-5 support 200K context with up to 128K max output. GLM-4-Long supports up to 1M context. Source: Bigmodel – Model Overview
Thinking mode - Chain-of-thought reasoning enabled via thinking: { type: "enabled" }. Supported on GLM-5.1, GLM-5, GLM-5-Turbo. Temperature defaults to 1.0 when thinking is enabled. Source: Bigmodel – Thinking Mode
MCP servers - Model Context Protocol servers provided with the Coding Plan, including vision understanding, web search, web page reading, and open source repository reading. Source: Bigmodel – Overview
Model multiplier - GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 3x during peak hours (14:00-18:00 UTC+8) and 2x off-peak. A promotion running through June 2026 reduces off-peak to 1x. Source: Bigmodel – Overview

Latest Changes

Changes since the 2026-04 report.

New model: GLM-5.1 launched April 7, 2026 as the latest flagship. Coding capability aligned with Claude Opus 4.6. SWE-Bench Pro score of 58.4, surpassing GPT-5.4 and Claude Opus 4.6. Supports 8-hour long-horizon autonomous tasks. 200K context, 128K max output. Source: Bigmodel – New Releases
New model: GLM-5V-Turbo launched April 2, 2026. Multimodal coding model combining vision understanding with coding capability. 200K context, 128K max output. Source: Bigmodel – New Releases
New model: GLM-5-Turbo launched March 15, 2026. OpenClaw-optimized base model with enhanced tool calling, instruction following, and long-duration task execution. 200K context, 128K max output. Source: Bigmodel – New Releases
Feature added: GLM Coding Plan now supports GLM-5.1 across all tiers (Lite, Pro, Max). Source: Bigmodel – Overview
Promotion: GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 1x (instead of 2x) during off-peak hours, running through end of June 2026. Source: Bigmodel – Overview
Feature added: GLM in Excel (Beta) included in Coding Plan subscriptions. Source: Bigmodel – Overview
Milestone: Zhipu AI published its first financial results report as a publicly listed company (HKEX: 02513.HK) on March 31, 2026. Source: Zhipuai – News
Deprecation: GLM-Z1 series scheduled for deprecation on November 15, 2025 (already deprecated). GLM-4-0520 scheduled for December 30, 2025 deprecation. Source: Bigmodel – Model Overview

Plans

GLM Coding Plan (Personal)

Plan	Monthly Price	Quarterly Price (per month)	5-Hour Quota (prompts)	Weekly Quota (prompts)	MCP Calls/Month	Recommended Projects
Lite	¥49	¥44.1 (9% off)	~80	~400	100	1 small repo
Pro	¥149	¥134.1 (9% off)	~400	~2,000	1,000	1-2 mid-size repos
Max	¥469	¥422.1 (9% off)	~1,600	~8,000	4,000	2+ large repos

Annual subscriptions available at 20% discount.

What's included in all plans:

Models: GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air
MCP tools: Vision understanding, web search, web page reading, open source repo reading
Compatible tools: Claude Code, OpenClaw, OpenCode, Kilo Code, Cline, TRAE, CodeBuddy, and 20+ others
GLM in Excel (Beta)

Model multiplier for Coding Plan:

GLM-5.1 / GLM-5-Turbo: 3x during peak (14:00-18:00 UTC+8), 2x off-peak (promotional: 1x off-peak through June 2026)
GLM-4.7 / GLM-4.5-Air: 1x at all times

Monthly value estimate: Each plan provides API-equivalent value of 15-30x the monthly subscription price (accounting for weekly quota limits).

Purchase limits: The platform has implemented daily purchase limits on Coding Plan subscriptions due to demand exceeding capacity. Limits are released daily at 10:00 UTC+8. Existing subscribers renewing or upgrading are not affected. Source: Bigmodel – Overview

Free Tier (z.ai)

The z.ai consumer chatbot provides free access to GLM-5.1 and GLM-5 via web interface with features including AutoClaw (agent mode), AI Slides, Magic Design, Full-Stack coding, and Write Code. No API access included.

Free API Models

Model	Context	Input Price	Output Price
GLM-4.7-Flash	200K	Free	Free
GLM-4.6V-Flash	128K	Free	Free
GLM-4.1V-Thinking-Flash	64K	Free	Free
GLM-4V-Flash	16K	Free	Free
CogView-3-Flash	-	Free	Free
CogVideoX-Flash	-	Free	Free

Source: Zhipu AI – Pricing

API Pricing

Flagship Models (per 1M tokens, in CNY)

Model	Context Tier	Input (¥/MTok)	Output (¥/MTok)	Cache Storage (¥/MTok/hr)	Cache Hit (¥/MTok)
GLM-5.1	<32K	¥6	¥24	Free (limited-time)	¥1.3
GLM-5.1	32K+	¥8	¥28	Free (limited-time)	¥2
GLM-5-Turbo	<32K	¥5	¥22	Free (limited-time)	¥1.2
GLM-5-Turbo	32K+	¥7	¥26	Free (limited-time)	¥1.8
GLM-5	<32K	¥4	¥18	Free (limited-time)	¥1
GLM-5	32K+	¥6	¥22	Free (limited-time)	¥1.5

Mid-Tier and Economy Models (per 1M tokens, in CNY)

Model	Context Tier	Input (¥/MTok)	Output (¥/MTok)	Cache Hit (¥/MTok)
GLM-4.7	<32K, output <0.2K	¥2	¥8	¥0.4
GLM-4.7	<32K, output 0.2K+	¥3	¥14	¥0.6
GLM-4.7	32-200K	¥4	¥16	¥0.8
GLM-4.5-Air	<32K, output <0.2K	¥0.8	¥2	¥0.16
GLM-4.5-Air	<32K, output 0.2K+	¥0.8	¥6	¥0.16
GLM-4.5-Air	32-128K	¥1.2	¥8	¥0.24
GLM-4.7-FlashX	200K	¥0.5	¥3	¥0.1
GLM-4.7-Flash	200K	Free	Free	Free

USD Approximate Pricing (at ~¥7.25/USD)

Model	Input ($/MTok)	Output ($/MTok)
GLM-5.1 (<32K)	~$0.83	~$3.31
GLM-5.1 (32K+)	~$1.10	~$3.86
GLM-5-Turbo (<32K)	~$0.69	~$3.03
GLM-5 (<32K)	~$0.55	~$2.48
GLM-4.7 (<32K)	~$0.28-$0.41	~$1.10-$1.93
GLM-4.5-Air	~$0.11	~$0.28-$0.83
GLM-4.7-FlashX	~$0.07	~$0.41

Batch API Pricing

50% discount on standard pricing for supported models (GLM-4 series). Source: Zhipu AI – Pricing

Private Instance Pricing

Model	Deployment	Price
GLM-4.6	200K fp8	¥175/GPU unit/day
GLM-4.5	128K fp8	¥175/GPU unit/day
GLM-4.5-Air	128K fp8	¥100/GPU unit/day

Source: Zhipu AI – Pricing

Search Tools

Tool	Price
Search-Std (Zhipu self-developed)	¥0.01/request
Search-Pro (Zhipu enhanced)	¥0.03/request
Search-Pro-Sogou	¥0.05/request
Search-Pro-Quark	¥0.05/request

Source: Zhipu AI – Pricing

Model Performance / Benchmarks

Model	Benchmark	Score	Notes
GLM-5.1	SWE-Bench Pro	58.4	Surpassed GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro
GLM-5	SWE-Bench Verified	77.8	Open-source SOTA
GLM-5	Terminal Bench 2.0	56.2	Open-source SOTA
GLM-5	BrowseComp	Undisclosed	Open-source SOTA (web browsing/retrieval)
GLM-5	MCP-Atlas	Undisclosed	Open-source SOTA (tool calling, multi-step tasks)
GLM-5	tau2-Bench	Undisclosed	Open-source SOTA (complex multi-tool planning/execution)
GLM-5	ZClawBench	Undisclosed	Significantly above GLM-5 (OpenClaw agent benchmark)
GLM-5-Turbo	ZClawBench	Undisclosed	Above GLM-5 and multiple mainstream models (OpenClaw benchmark)

Key context: GLM-5 claims coding performance aligned with Claude Opus 4.5 (not the latest 4.6/4.8). GLM-5.1 claims alignment with Claude Opus 4.6. Zhipu publishes benchmark scores selectively (exact numbers for some, "open-source SOTA" claims without numbers for others). The GLM-5 series uses a 744B parameter MoE architecture with 40B active parameters, up from 355B/32B in the previous generation. GLM-5 integrates DeepSeek Sparse Attention for long-context efficiency.

Sources:

Latest News

GLM-5.1 Flagship Launch (April 7, 2026): Zhipu AI released GLM-5.1 as its latest flagship model. Key claims: coding capability aligned with Claude Opus 4.6, 8-hour long-horizon autonomous task execution (planning, execution, testing, delivery in a single session), SWE-Bench Pro score of 58.4 (claimed global best). The model achieved this through multi-turn SFT, RL, and process quality evaluation. Practical demonstrations include building a Linux desktop from scratch in 8 hours, 655-round iterative optimization achieving 6.9x throughput improvement on a vector database, and 3.6x geometric mean speedup on KernelBench Level 3 (vs torch.compile's 1.49x). Source: Bigmodel – New Releases

GLM-5V-Turbo Multimodal Coding Model (April 2, 2026): A new vision+coding model with 200K context and 128K output. Enhanced GUI Agent and Coding Agent performance for "see environment, plan actions, execute tasks" workflows. Adds visual tools: bounding box, screenshot, web page reading with image recognition. Source: Bigmodel – New Releases

GLM-5-Turbo OpenClaw Optimization (March 15, 2026): Purpose-built model for coding agent workflows. Enhanced tool calling, instruction following, time-aware task execution, and high-throughput long-chain processing. Released alongside ZClawBench, a new end-to-end agent benchmark for OpenClaw scenarios. Skills usage in OpenClaw grew from 26% to 45%. Source: Bigmodel – Glm 5 Turbo

Zhipu AI First Financial Report (March 31, 2026): As a publicly listed company (HKEX: 02513.HK), Zhipu published its first annual results. Specific financial figures not available in the fetched data. Source: Zhipuai – News

GLM-5 Launch (February 12, 2026): The GLM-5 base model launched with 744B parameters (40B active), 28.5T training tokens, and DeepSeek Sparse Attention integration. Positioned for "Agentic Engineering" with coding capability aligned to Claude Opus 4.5. Source: Bigmodel – New Releases

GLM Coding Plan Demand Surge (Ongoing): Zhipu implemented daily purchase limits on Coding Plan subscriptions starting January 23 due to demand exceeding capacity. New inventory released daily at 10:00 UTC+8. Existing subscribers are unaffected. Source: Bigmodel – Overview

Community Signals

Coding tool ecosystem endorsements: Multiple coding tool vendors provided public endorsements for the GLM Coding Plan. Kilo Code highlighted the combination of generous quotas and low cost as removing cost anxiety. Cline praised the pricing and quota structure as "hard to beat" for developers seeking high-value AI coding. Crush (which used GLM for key architecture) and Factory both endorsed GLM-5's performance-to-cost ratio. Source: Zhipu AI – Glm Coding

Coding Plan purchase limits signal capacity strain: The platform has been rationing Coding Plan subscriptions since January 2026, releasing limited stock daily. This indicates demand significantly outpacing GPU capacity, which is a risk for users who need predictable access. The rationing notice states the team is "working at highest priority to coordinate resources." Source: Bigmodel – Overview

Peak-hour multiplier controversy: The 3x multiplier for GLM-5.1 during peak hours (14:00-18:00 UTC+8) effectively reduces the value of Coding Plan subscriptions during the most common working hours for Chinese developers. The temporary 1x off-peak promotion through June 2026 is clearly an acquisition strategy. Users should expect the full multiplier to apply after the promotion ends, which would significantly change the value equation for GLM-5.1 usage. Source: Bigmodel – Overview

No significant English-language community presence: Reddit (r/LocalLLaMA) and Hacker News show minimal discussion of Zhipu AI's GLM models in May 2026. The primary community engagement is through Chinese-language channels (Feishu/Lark groups, Chinese social media). This limits the availability of independent quality assessments and user reports for non-Chinese-speaking evaluators.

Enterprise Readiness

Feature	Available?	Details
SSO (SAML/OIDC)	Undisclosed	Not mentioned in public documentation
SCIM	Undisclosed	Not mentioned in public documentation
Audit logs	Undisclosed	Not mentioned in public documentation
IP indemnity	Undisclosed	Not mentioned in public documentation. A commercial license agreement exists for model use. Source: Bigmodel – Model Commercial Use
Data residency	Partial	Cloud private instances and on-premise deployment available. On-prem pricing listed as "数千万" (tens of millions) for GLM-4-0520. Source: Zhipu AI – Pricing
HIPAA	No	Not mentioned
Air-gapped / On-prem	Yes	Local private deployment available for GLM-4 series with hardware appliance. Pricing starts from "数十万" (hundreds of thousands CNY) for smaller models. Source: Zhipu AI – Pricing
SLA	Undisclosed	Not mentioned in public documentation
Admin controls (RBAC)	Partial	Team plan available with central billing. Source: Bigmodel – Team
Content security / moderation	Yes	Built-in content safety audit for text, image, audio, video. Source: Bigmodel – Securityaudit
Model fine-tuning	Yes	LoRA and full fine-tuning supported on GLM-4.5, GLM-4.5-Air, GLM-4 series. Source: Zhipu AI – Pricing
OpenAI API compatibility	Yes	Supports OpenAI SDK, Claude API compatibility, LangChain, HTTP API, Python SDK, Java SDK. Source: Bigmodel – Introduction

Team Plan: GLM Coding Plan offers team tiers. Specific pricing and per-seat structure available in the team documentation. Source: Bigmodel – Team

Cloud private instances: Available at ¥100-175/GPU unit/day for dedicated model deployments. Annual套餐 available (e.g., GLM-4.5 at ¥1.1M/year, GLM-4.5-Air at ¥500K/year). Source: Zhipu AI – Pricing

Transparency Gaps

Exact rate limits undisclosed. The platform uses dynamic rate limiting based on user tier, subscription level, and current load. No specific RPM/TPM numbers are published. Users must check the console for their current limits. Source: Bigmodel – Rate Limit

Coding Plan quota inexact. Quotas are described as "approximately X prompts" with the caveat that actual usage varies by "project complexity, codebase size, and whether auto-accept is enabled." There is no token-level accounting exposed to users. Source: Bigmodel – Overview

Peak-hour multiplier end date. The 1x off-peak promotion for GLM-5.1/GLM-5-Turbo runs "through end of June" but the full multiplier (2x off-peak, 3x peak) is the stated permanent rate. Users cannot plan long-term costs without knowing whether the promotion will be extended. Source: Bigmodel – Overview

Purchase limit duration. The daily rationing of Coding Plan subscriptions has no stated end date. The notice says "short-term" but has been in effect since January 2026 (5+ months). Source: Bigmodel – Overview

Benchmark scores partially disclosed. GLM-5 claims "open-source SOTA" on BrowseComp, MCP-Atlas, and tau2-Bench without publishing exact numbers. GLM-5.1's SWE-Bench Pro 58.4 is the only fully quantified coding benchmark. Source: Bigmodel – Glm 5

Model architecture details. GLM-5/5.1 is described as 744B total / 40B active parameters, but training data composition, architecture specifics beyond "DeepSeek Sparse Attention," and inference optimization details are not fully disclosed.

Enterprise features. SSO, SCIM, audit logs, IP indemnity, and SLA details are not documented publicly. Enterprises must contact sales for this information. Source: Zhipu AI – Pricing

Concurrent request limits. Coding Plan documentation recommends project counts (Lite: 1, Pro: 1-2, Max: 2+) but does not state actual concurrent request limits. The documentation acknowledges users sometimes "feel like only 1 concurrent request" during peak hours. Source: Bigmodel – Rate Limit

Cache storage pricing. Currently listed as "free (limited-time)" for all flagship models. The permanent pricing is undisclosed. Source: Zhipu AI – Pricing

GLM-4.7 output-tier pricing. GLM-4.7 has a unique pricing structure where output cost depends on output length (<0.2K or 0.2K+ tokens). This is unusual and not clearly explained in documentation. Source: Zhipu AI – Pricing

Type: CLI, API
API Input: $0.07/MTok
API Output: $0.41/MTok
Context: 200K
Free Tier: Yes

Compare all suppliers →