Zhipu AI

Executive Summary

What it is: Zhipu AI's GLM is a Chinese AI platform offering coding agent support through a GLM Coding Plan (compatible with Claude Code, Cursor, Cline, and other tools) and API access. The flagship model, GLM-5.1 (744B parameters, 40B active), claims performance aligned with Claude Opus 4.6 and supports up to 8 hours of continuous autonomous work. A free web chat is available at z.ai.

What to watch out for: GLM Coding Plan prices are $18/mo (Lite), $72/mo (Pro), and $160/mo (Max), with a 10% discount for quarterly billing. Prices were doubled in April 2026. GLM-5.1 consumes 3x quota during peak hours (14:00 to 18:00 UTC+8), and the promotional 1x off-peak multiplier expires at the end of June 2026. Plans are in "short-term sales restriction" with daily inventory limits released at 10:00 UTC+8.

Bottom line: GLM-5.1 at $1.40/$4.40 per MTok (input/output) is cheaper than Claude Opus 4.7 ($5/$25) and GPT-5.4 ($2.50/$10.00) while claiming competitive SWE-Bench Pro scores (58.4 vs 53.4 for Opus 4.6). The Coding Plan at $18-$160/mo offers strong value if the 15-30x API value claim holds, but the April price doubling and expiring promotional 1x off-peak multiplier mean costs will rise. The 8-hour continuous task capability is unmatched. Best suited for teams wanting multi-provider resilience or cost savings on high-volume agentic coding.

Key Terms

  • GLM Coding Plan - Zhipu AI's subscription service for using GLM models in coding agents. Supports Claude Code, Cline, OpenCode, Roo Code, Kilo Code, Cursor, Crush, and Goose. Uses a dedicated API endpoint separate from the general API. Source: Bigmodel – Overview
  • GLM-5.1 - Zhipu AI's latest flagship model (April 2026). Claims to align with Claude Opus 4.6 in coding ability. 200K context, 128K max output. Supports thinking mode, function calling, context caching, structured output, and MCP. Source: Bigmodel – Glm 5.1
  • Long-horizon tasks - GLM-5.1's headline capability: the model can work autonomously for up to 8 hours in a single task, performing planning, execution, testing, and iteration. Source: Bigmodel – Glm 5.1
  • Peak multiplier - GLM-5.1 and GLM-5-Turbo consume 3x quota during peak hours (14:00-18:00 UTC+8) and 2x during off-peak. As a promotional offer valid through end of June 2026, off-peak usage counts as 1x. Source: Bigmodel – Overview
  • MCP servers - GLM Coding Plan includes exclusive MCP servers for vision understanding, web search, web page reading, and open-source repository access. Source: Bigmodel – Overview
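The peak-multiplier rules above can be made concrete with a small quota calculator. This is a hypothetical sketch: the vendor publishes only the multipliers and windows, not the accounting logic, so the promo cutoff timestamp and function shape here are assumptions.

```python
# Quota multipliers for GLM-5.1 / GLM-5-Turbo as described above:
# 3x during peak hours (14:00-18:00 UTC+8), 2x off-peak,
# and a promotional 1x off-peak rate through end of June 2026.

from datetime import datetime, timezone, timedelta

UTC8 = timezone(timedelta(hours=8))
PROMO_END = datetime(2026, 6, 30, 23, 59, tzinfo=UTC8)  # assumed exact cutoff

def quota_multiplier(when: datetime, promo: bool = True) -> int:
    """Return the quota multiplier for one prompt at the given time."""
    local = when.astimezone(UTC8)
    if 14 <= local.hour < 18:          # peak window, 14:00-18:00 UTC+8
        return 3
    if promo and local <= PROMO_END:   # promotional off-peak rate
        return 1
    return 2                           # standard off-peak rate

# A prompt at 15:00 UTC+8 costs 3x quota; off-peak during the promo, 1x;
# off-peak after the promo expires, 2x.
print(quota_multiplier(datetime(2026, 5, 1, 15, 0, tzinfo=UTC8)))  # 3
print(quota_multiplier(datetime(2026, 5, 1, 9, 0, tzinfo=UTC8)))   # 1
print(quota_multiplier(datetime(2026, 7, 15, 9, 0, tzinfo=UTC8)))  # 2
```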

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • New model: GLM-5.1 launched April 8-9. 744B parameters (40B active), MoE. Claims alignment with Claude Opus 4.6.
  • Plan change: GLM Coding Plan prices doubled April 14. Lite: $18/mo, Pro: $72/mo, Max: $160/mo.
  • Plan change: Legacy subscription plans being phased out in favor of new Lite/Pro/Max tier structure (April 23).
  • Feature added: GLM-5.1 supports up to 8 hours of continuous autonomous work in a single task.
  • Feature added: Promotional 1x off-peak multiplier for GLM-5.1 through end of June 2026.
  • Plan change: Plans in "short-term sales restriction" with daily inventory limits released at 10:00 UTC+8.

Plans

GLM Coding Plan (Subscription)

| Plan | 5-Hour Limit | Weekly Limit | MCP Calls/mo | Recommended Projects | Price (monthly) | Price (quarterly, 10% off) |
| --- | --- | --- | --- | --- | --- | --- |
| Lite | ~80 prompts | ~400 prompts | 100 | 1 project | $18/mo | $16.20/mo ($48.60/quarter) |
| Pro | ~400 prompts | ~2,000 prompts | 1,000 | 1-2 projects | $72/mo | $64.80/mo ($194.40/quarter) |
| Max | ~1,600 prompts | ~8,000 prompts | 4,000 | 2+ projects | $160/mo | $144/mo ($432/quarter) |

Each prompt triggers approximately 15-20 model calls. Monthly value is claimed to be 15-30x the subscription cost at API rates. Plans are currently in short-term sales restriction mode: limited inventory released daily at 10:00 UTC+8. Renewals and upgrades are not affected.
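A back-of-envelope sketch makes the prompt budget concrete, using the vendor's "~15-20 model calls per prompt" figure; actual consumption varies with task complexity and the peak multipliers, so treat these as rough bounds, not guarantees.

```python
# Rough estimate of model calls implied by each plan's weekly prompt limit,
# using the vendor's "~15-20 model calls per prompt" figure.

plans = {          # plan name: (weekly prompt limit, monthly price in USD)
    "Lite": (400, 18),
    "Pro":  (2_000, 72),
    "Max":  (8_000, 160),
}

CALLS_PER_PROMPT = (15, 20)  # approximate range stated by the vendor

for name, (weekly_prompts, price) in plans.items():
    lo = weekly_prompts * CALLS_PER_PROMPT[0]
    hi = weekly_prompts * CALLS_PER_PROMPT[1]
    print(f"{name}: ~{lo:,}-{hi:,} model calls/week at ${price}/mo")
```

On these numbers, even the Lite plan implies on the order of 6,000-8,000 model calls per week, which is where the 15-30x value-over-API claim comes from.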

Available models: GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air. GLM-5.1 is recommended for complex tasks; GLM-4.7 for routine work to conserve quota.

Source: Bigmodel – Overview, Z – Subscribe

API (Pay-as-you-go)

The general API at open.bigmodel.cn is separate from the Coding Plan. Free models available: GLM-4.7-Flash, GLM-4.5-Flash (no token cost). All prices below are per 1M tokens.

Source: Z – Overview
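For orientation, a minimal request body for one of the free models might look like the sketch below. The endpoint path and field names follow the common OpenAI-compatible chat-completions convention and are assumptions, not confirmed by this report; check Zhipu's official API docs before use.

```python
import json

# Assumed OpenAI-compatible endpoint on open.bigmodel.cn; verify against
# the official documentation before sending real traffic.
BASE_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4.7-flash",   # one of the free models listed above
    "messages": [
        {"role": "user", "content": "Write a haiku about rate limits."}
    ],
}

body = json.dumps(payload)
# POST `body` to BASE_URL with an "Authorization: Bearer <api-key>" header
# using any HTTP client.
```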

z.ai Chat (Consumer)

Free web-based chatbot at z.ai powered by GLM-5.1 and GLM-5. No subscription required.

Source: Z

API Pricing

Text Models ($/MTok)

| Model | Input | Cached Input | Cached Input Storage | Output |
| --- | --- | --- | --- | --- |
| GLM-5.1 | $1.40 | $0.26 | Limited-time Free | $4.40 |
| GLM-5 | $1.00 | $0.20 | Limited-time Free | $3.20 |
| GLM-5-Turbo | $1.20 | $0.24 | Limited-time Free | $4.00 |
| GLM-4.7 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.7-FlashX | $0.07 | $0.01 | Limited-time Free | $0.40 |
| GLM-4.6 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.5 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.5-X | $2.20 | $0.45 | Limited-time Free | $8.90 |
| GLM-4.5-Air | $0.20 | $0.03 | Limited-time Free | $1.10 |
| GLM-4.5-AirX | $1.10 | $0.22 | Limited-time Free | $4.50 |
| GLM-4-32B-0414-128K | $0.10 | - | - | $0.10 |
| GLM-4.7-Flash | Free | Free | Free | Free |
| GLM-4.5-Flash | Free | Free | Free | Free |
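Using the per-MTok rates above together with the Claude Opus 4.7 rates quoted in the executive summary, a simple job-cost comparison (the token counts are hypothetical, and caching discounts are ignored):

```python
# Dollar cost of a hypothetical agentic job at per-1M-token rates.
# GLM-5.1 rates from the table above; Opus 4.7 rates as quoted in the summary.

rates = {                        # model: (input $/MTok, output $/MTok)
    "GLM-5.1":         (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one job, ignoring cached-input discounts."""
    inp, out = rates[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: 2M input tokens, 500K output tokens.
glm = job_cost("GLM-5.1", 2_000_000, 500_000)           # $2.80 + $2.20 = $5.00
opus = job_cost("Claude Opus 4.7", 2_000_000, 500_000)  # $10.00 + $12.50 = $22.50
print(f"GLM-5.1: ${glm:.2f}  Opus 4.7: ${opus:.2f}  ratio: {opus/glm:.1f}x")
```

At these list prices the example job is roughly 4.5x cheaper on GLM-5.1, before any cached-input savings on either side.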

Vision Models ($/MTok)

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| GLM-5V-Turbo | $1.20 | $0.24 | $4.00 |
| GLM-4.6V | $0.30 | $0.05 | $0.90 |
| GLM-OCR | $0.03 | - | $0.03 |
| GLM-4.6V-FlashX | $0.04 | $0.004 | $0.40 |
| GLM-4.5V | $0.60 | $0.11 | $1.80 |
| GLM-4.6V-Flash | Free | Free | Free |

Built-in Tools

| Tool | Cost |
| --- | --- |
| Web Search | $0.01/use |

Image Generation (per image)

| Model | Price |
| --- | --- |
| GLM-Image | $0.015 |
| CogView-4 | $0.01 |

Video Generation (per video)

| Model | Price |
| --- | --- |
| CogVideoX-3 | $0.20 |
| ViduQ1-Text | $0.40 |
| ViduQ1-Image | $0.40 |

Source: Z – Overview

Model Performance / Benchmarks

| Benchmark | GLM-5.1 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Pro | 58.4 | 57.7 | 53.4 | 54.2 |

Additional GLM-5.1 capabilities:

  • 8-hour continuous autonomous work; demonstrations include building a complete Linux desktop and a 655-round vector-database optimization reaching 6.9x throughput
  • KernelBench Level 3: 3.6x geometric mean speedup over torch.compile max-autotune
  • 200K context window, 128K max output

Source: Bigmodel – Glm 5.1

Latest News

GLM-5.1 Launch (April 8-9, 2026)

Zhipu AI released GLM-5.1, the latest flagship model with significant coding and long-horizon task improvements:

  • SWE-Bench Pro: 58.4 (claimed above GPT-5.4 at 57.7, Opus 4.6 at 53.4, Gemini 3.1 Pro at 54.2)
  • Claims alignment with Claude Opus 4.6 in comprehensive and coding capabilities
  • Long-horizon task capability: up to 8 hours of continuous autonomous work in a single task
  • 200K context window, 128K max output
  • Demonstrated building a complete Linux desktop system in 8 hours
  • Demonstrated 655-round iteration optimizing a vector database to 6.9x throughput
  • KernelBench Level 3: 3.6x geometric mean speedup over torch.compile max-autotune
  • 744B parameters (40B active), MoE architecture (carried from GLM-5)
  • Supports thinking mode, function calling, context caching, structured output, MCP
  • HN: 618 points, 263 comments

Source: Bigmodel – Glm 5.1, News – From

GLM Coding Plan Price Increase (April 14, 2026)

Zhipu AI doubled the GLM Coding Plan prices. HN: 18 points, 6 comments. The increase followed the GLM-5.1 launch and coincided with the promotional 1x off-peak multiplier for GLM-5.1 usage, valid through end of June 2026.

Source: News – From

Legacy Plan Migration (April 23, 2026)

Zhipu AI began phasing out original subscription plans in favor of the new Lite/Pro/Max tier structure. Users on legacy plans are being migrated. HN: 4 points.

Source: News – From

GLM-5 Turbo for OpenClaw (March 2026)

GLM-5-Turbo released, optimized for OpenClaw persistent-agent workloads, with improved continuity on complex long-running tasks.

Source: Bigmodel – Model Overview

Community Signals

GLM-5 Launch (January 2026, ongoing relevance)

HN: 484 points, 520 comments (largest Zhipu thread). GLM-5 was positioned as an open-source SOTA model with coding capabilities aligned to Claude Opus 4.5. Community noted the 744B MoE architecture and open-source availability.

Source: News – From

GLM Coding Plan Adoption

  • GLM Coding Plan supports Claude Code, Cursor, Cline, OpenCode, Roo Code, Kilo Code, and other tools
  • Community documentation shows step-by-step guides for using GLM models with Claude Code via Anthropic-compatible proxy API
  • Zhipu provides an automatic configuration tool, run via npx @z_ai/coding-helper
  • HN: "GLM 4.5 with Claude Code" thread (213 points, 84 comments) showed early adoption interest

Source: News – From

Price Sensitivity

  • "Z.ai doubles it's coding plan prices" (HN, 18 points): community reacted negatively to the price increase following the GLM-5.1 launch
  • The promotional 1x multiplier for GLM-5.1 (off-peak, through June 2026) suggests Zhipu is trying to drive adoption of the new model while managing compute costs

Sales Restriction

The platform implemented daily inventory limits for new subscriptions due to a "user volume surge exceeding expectations," which points to genuine demand, constrained capacity, or both.

Source: Bigmodel – Overview

Enterprise Readiness

| Feature | Available? | Details |
| --- | --- | --- |
| SSO (SAML) | No | Not mentioned. GLM Coding Plan uses API keys. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned. |
| IP indemnity | No | Not mentioned. |
| Data residency | No | Not mentioned. API endpoints are China-focused. |
| HIPAA | No | Not mentioned. |
| Air-gapped / on-prem | No | Not available. |
| SLA | No | No published SLA. |
| Admin controls (RBAC) | No | No admin controls documented. Plans are single-user. |

Transparency Gaps

| Gap | Details | Severity |
| --- | --- | --- |
| Prompt count is approximate | Plan limits use "approximately X prompts" where each prompt triggers 15-20 model calls. Actual consumption depends on task complexity, making cost estimation unreliable. | Medium |
| Peak multiplier creates uncertainty | GLM-5.1 consumes 3x quota during peak hours (14:00-18:00 UTC+8) and 2x off-peak. The promotional 1x rate (through June) will increase effective costs when it expires. | Medium |
| Dynamic concurrency limits | Concurrency is "dynamically adjusted" based on resource availability. Max users get priority, but specific concurrency numbers are not published. | Medium |
| Usage restrictions enforced by risk control | The platform monitors for "improper use" including account sharing and use in non-approved tools. Violations trigger throttling, freezing, or banning. The detection methodology is not disclosed. | Low |
| No batch pricing published | Batch API is mentioned as available but pricing is not documented. | Low |
| Model parameter count not disclosed for GLM-5.1 | GLM-5 is documented as 744B (40B active), but GLM-5.1's architecture is not specified. | Low |