Zhipu AI

Executive Summary

What it is: Zhipu AI's GLM is a Chinese AI platform offering coding agent support through a GLM Coding Plan (compatible with Claude Code, Cursor, Cline, and other tools) and API access. The flagship model, GLM-5.1 (744B parameters, 40B active), claims performance aligned with Claude Opus 4.6 and supports up to 8 hours of continuous autonomous work. A free web chat is available at z.ai.

What to watch out for: GLM Coding Plan prices are $18/mo (Lite), $72/mo (Pro), and $160/mo (Max), with a 10% discount for quarterly billing. Prices were doubled in April 2026. GLM-5.1 consumes 3x quota during peak hours (14:00 to 18:00 UTC+8), and the promotional 1x off-peak multiplier expires at the end of June 2026. Plans are in "short-term sales restriction" with daily inventory limits released at 10:00 UTC+8.

Bottom line: GLM-5.1 at $1.40/$4.40 per MTok (input/output) is cheaper than Claude Opus 4.7 ($5/$25) and GPT-5.4 ($2.50/$10.00) while claiming competitive SWE-Bench Pro scores (58.4 vs 53.4 for Opus 4.6). The Coding Plan at $18-$160/mo offers strong value if the 15-30x API value claim holds, but the April price doubling and expiring promotional 1x off-peak multiplier mean costs will rise. The 8-hour continuous task capability is unmatched. Best suited for teams wanting multi-provider resilience or cost savings on high-volume agentic coding.

Key Terms

  • GLM Coding Plan - Zhipu AI's subscription service for using GLM models in coding agents. Supports Claude Code, Cline, OpenCode, Roo Code, Kilo Code, Cursor, Crush, and Goose. Uses a dedicated API endpoint separate from the general API. Source: Bigmodel – Overview
  • GLM-5.1 - Zhipu AI's latest flagship model (April 2026). Claims to align with Claude Opus 4.6 in coding ability. 200K context, 128K max output. Supports thinking mode, function calling, context caching, structured output, and MCP. Source: Bigmodel – Glm 5.1
  • Long-horizon tasks - GLM-5.1's headline capability: the model can work autonomously for up to 8 hours in a single task, performing planning, execution, testing, and iteration. Source: Bigmodel – Glm 5.1
  • Peak multiplier - GLM-5.1 and GLM-5-Turbo consume 3x quota during peak hours (14:00-18:00 UTC+8) and 2x during off-peak. As a promotional offer valid through end of June 2026, off-peak usage counts as 1x. Source: Bigmodel – Overview
  • MCP servers - GLM Coding Plan includes exclusive MCP servers for vision understanding, web search, web page reading, and open-source repository access. Source: Bigmodel – Overview
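The peak-multiplier rules above can be made concrete with a small quota calculator. This is a hypothetical sketch: the vendor publishes only the multipliers and windows, not the accounting logic, so the promo cutoff timestamp and function shape here are assumptions.

```python
# Quota multipliers for GLM-5.1 / GLM-5-Turbo as described above:
# 3x during peak hours (14:00-18:00 UTC+8), 2x off-peak,
# and a promotional 1x off-peak rate through end of June 2026.

from datetime import datetime, timezone, timedelta

UTC8 = timezone(timedelta(hours=8))
PROMO_END = datetime(2026, 6, 30, 23, 59, tzinfo=UTC8)  # assumed exact cutoff

def quota_multiplier(when: datetime, promo: bool = True) -> int:
    """Return the quota multiplier for one prompt at the given time."""
    local = when.astimezone(UTC8)
    if 14 <= local.hour < 18:          # peak window, 14:00-18:00 UTC+8
        return 3
    if promo and local <= PROMO_END:   # promotional off-peak rate
        return 1
    return 2                           # standard off-peak rate

# A prompt at 15:00 UTC+8 costs 3x quota; off-peak during the promo, 1x;
# off-peak after the promo expires, 2x.
print(quota_multiplier(datetime(2026, 5, 1, 15, 0, tzinfo=UTC8)))  # 3
print(quota_multiplier(datetime(2026, 5, 1, 9, 0, tzinfo=UTC8)))   # 1
print(quota_multiplier(datetime(2026, 7, 15, 9, 0, tzinfo=UTC8)))  # 2
```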

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • New model: GLM-5.1 launched April 8-9. 744B parameters (40B active), MoE. Claims alignment with Claude Opus 4.6.
  • Plan change: GLM Coding Plan prices doubled April 14. Lite: $18/mo, Pro: $72/mo, Max: $160/mo.
  • Plan change: Legacy subscription plans being phased out in favor of new Lite/Pro/Max tier structure (April 23).
  • Feature added: GLM-5.1 supports up to 8 hours of continuous autonomous work in a single task.
  • Feature added: Promotional 1x off-peak multiplier for GLM-5.1 through end of June 2026.
  • Plan change: Plans in "short-term sales restriction" with daily inventory limits released at 10:00 UTC+8.

Plans

GLM Coding Plan (Subscription)

| Plan | 5-Hour Limit | Weekly Limit | MCP Calls/mo | Recommended Projects | Price (monthly) | Price (quarterly, 10% off) |
| --- | --- | --- | --- | --- | --- | --- |
| Lite | ~80 prompts | ~400 prompts | 100 | 1 project | $18/mo | $16.20/mo ($48.60/quarter) |
| Pro | ~400 prompts | ~2,000 prompts | 1,000 | 1-2 projects | $72/mo | $64.80/mo ($194.40/quarter) |
| Max | ~1,600 prompts | ~8,000 prompts | 4,000 | 2+ projects | $160/mo | $144/mo ($432/quarter) |

Each prompt triggers approximately 15-20 model calls. Monthly value is claimed to be 15-30x the subscription cost at API rates. Plans are currently in short-term sales restriction mode: limited inventory released daily at 10:00 UTC+8. Renewals and upgrades are not affected.
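A back-of-envelope sketch makes the prompt budget concrete, using the vendor's "~15-20 model calls per prompt" figure; actual consumption varies with task complexity and the peak multipliers, so treat these as rough bounds, not guarantees.

```python
# Rough estimate of model calls implied by each plan's weekly prompt limit,
# using the vendor's "~15-20 model calls per prompt" figure.

plans = {          # plan name: (weekly prompt limit, monthly price in USD)
    "Lite": (400, 18),
    "Pro":  (2_000, 72),
    "Max":  (8_000, 160),
}

CALLS_PER_PROMPT = (15, 20)  # approximate range stated by the vendor

for name, (weekly_prompts, price) in plans.items():
    lo = weekly_prompts * CALLS_PER_PROMPT[0]
    hi = weekly_prompts * CALLS_PER_PROMPT[1]
    print(f"{name}: ~{lo:,}-{hi:,} model calls/week at ${price}/mo")
```

On these numbers, even the Lite plan implies on the order of 6,000-8,000 model calls per week, which is where the 15-30x value-over-API claim comes from.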

Available models: GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air. GLM-5.1 is recommended for complex tasks; GLM-4.7 for routine work to conserve quota.

Source: Bigmodel – Overview, Z – Subscribe

API (Pay-as-you-go)

The general API at open.bigmodel.cn is separate from the Coding Plan. Free models available: GLM-4.7-Flash, GLM-4.5-Flash (no token cost). All prices below are per 1M tokens.

Source: Z – Overview
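For orientation, a minimal request body for one of the free models might look like the sketch below. The endpoint path and field names follow the common OpenAI-compatible chat-completions convention and are assumptions, not confirmed by this report; check Zhipu's official API docs before use.

```python
import json

# Assumed OpenAI-compatible endpoint on open.bigmodel.cn; verify against
# the official documentation before sending real traffic.
BASE_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4.7-flash",   # one of the free models listed above
    "messages": [
        {"role": "user", "content": "Write a haiku about rate limits."}
    ],
}

body = json.dumps(payload)
# POST `body` to BASE_URL with an "Authorization: Bearer <api-key>" header
# using any HTTP client.
```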

z.ai Chat (Consumer)

Free web-based chatbot at z.ai powered by GLM-5.1 and GLM-5. No subscription required.

Source: Z

API Pricing

Text Models ($/MTok)

| Model | Input | Cached Input | Cached Input Storage | Output |
| --- | --- | --- | --- | --- |
| GLM-5.1 | $1.40 | $0.26 | Limited-time Free | $4.40 |
| GLM-5 | $1.00 | $0.20 | Limited-time Free | $3.20 |
| GLM-5-Turbo | $1.20 | $0.24 | Limited-time Free | $4.00 |
| GLM-4.7 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.7-FlashX | $0.07 | $0.01 | Limited-time Free | $0.40 |
| GLM-4.6 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.5 | $0.60 | $0.11 | Limited-time Free | $2.20 |
| GLM-4.5-X | $2.20 | $0.45 | Limited-time Free | $8.90 |
| GLM-4.5-Air | $0.20 | $0.03 | Limited-time Free | $1.10 |
| GLM-4.5-AirX | $1.10 | $0.22 | Limited-time Free | $4.50 |
| GLM-4-32B-0414-128K | $0.10 | - | - | $0.10 |
| GLM-4.7-Flash | Free | Free | Free | Free |
| GLM-4.5-Flash | Free | Free | Free | Free |
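Using the per-MTok rates above together with the Claude Opus 4.7 rates quoted in the executive summary, a simple job-cost comparison (the token counts are hypothetical, and caching discounts are ignored):

```python
# Dollar cost of a hypothetical agentic job at per-1M-token rates.
# GLM-5.1 rates from the table above; Opus 4.7 rates as quoted in the summary.

rates = {                        # model: (input $/MTok, output $/MTok)
    "GLM-5.1":         (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one job, ignoring cached-input discounts."""
    inp, out = rates[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: 2M input tokens, 500K output tokens.
glm = job_cost("GLM-5.1", 2_000_000, 500_000)           # $2.80 + $2.20 = $5.00
opus = job_cost("Claude Opus 4.7", 2_000_000, 500_000)  # $10.00 + $12.50 = $22.50
print(f"GLM-5.1: ${glm:.2f}  Opus 4.7: ${opus:.2f}  ratio: {opus/glm:.1f}x")
```

At these list prices the example job is roughly 4.5x cheaper on GLM-5.1, before any cached-input savings on either side.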

Vision Models ($/MTok)

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| GLM-5V-Turbo | $1.20 | $0.24 | $4.00 |
| GLM-4.6V | $0.30 | $0.05 | $0.90 |
| GLM-OCR | $0.03 | - | $0.03 |
| GLM-4.6V-FlashX | $0.04 | $0.004 | $0.40 |
| GLM-4.5V | $0.60 | $0.11 | $1.80 |
| GLM-4.6V-Flash | Free | Free | Free |

Built-in Tools

| Tool | Cost |
| --- | --- |
| Web Search | $0.01/use |

Image Generation (per image)

| Model | Price |
| --- | --- |
| GLM-Image | $0.015 |
| CogView-4 | $0.01 |

Video Generation (per video)

| Model | Price |
| --- | --- |
| CogVideoX-3 | $0.20 |
| ViduQ1-Text | $0.40 |
| ViduQ1-Image | $0.40 |

Source: Z – Overview

Model Performance / Benchmarks

| Benchmark | GLM-5.1 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Pro | 58.4 | 57.7 | 53.4 | 54.2 |

Additional GLM-5.1 capabilities:

  • 8-hour continuous autonomous work; demonstrations include building a complete Linux desktop and a 655-round vector-database optimization reaching 6.9x throughput
  • KernelBench Level 3: 3.6x geometric mean speedup over torch.compile max-autotune
  • 200K context window, 128K max output

Source: Bigmodel – Glm 5.1

Latest News

GLM-5.1 Launch (April 8-9, 2026)

Zhipu AI released GLM-5.1, the latest flagship model with significant coding and long-horizon task improvements:

  • SWE-Bench Pro: 58.4 (claimed above GPT-5.4 at 57.7, Opus 4.6 at 53.4, Gemini 3.1 Pro at 54.2)
  • Claims alignment with Claude Opus 4.6 in comprehensive and coding capabilities
  • Long-horizon task capability: up to 8 hours of continuous autonomous work in a single task
  • 200K context window, 128K max output
  • Demonstrated building a complete Linux desktop system in 8 hours
  • Demonstrated 655-round iteration optimizing a vector database to 6.9x throughput
  • KernelBench Level 3: 3.6x geometric mean speedup over torch.compile max-autotune
  • 744B parameters (40B active), MoE architecture (carried from GLM-5)
  • Supports thinking mode, function calling, context caching, structured output, MCP
  • HN: 618 points, 263 comments

Source: Bigmodel – Glm 5.1, News – From

GLM Coding Plan Price Increase (April 14, 2026)

Zhipu AI doubled the GLM Coding Plan prices. HN: 18 points, 6 comments. The increase followed the GLM-5.1 launch and coincided with the promotional 1x off-peak multiplier for GLM-5.1 usage, valid through end of June 2026.

Source: News – From

Legacy Plan Migration (April 23, 2026)

Zhipu AI began phasing out original subscription plans in favor of the new Lite/Pro/Max tier structure. Users on legacy plans are being migrated. HN: 4 points.

Source: News – From

GLM-5 Turbo for OpenClaw (March 2026)

GLM-5-Turbo released, optimized for OpenClaw persistent-agent workloads, with improved continuity on complex long-running tasks.

Source: Bigmodel – Model Overview

Community Signals

GLM-5 Launch (January 2026, ongoing relevance)

HN: 484 points, 520 comments (largest Zhipu thread). GLM-5 was positioned as an open-source SOTA model with coding capabilities aligned to Claude Opus 4.5. Community noted the 744B MoE architecture and open-source availability.

Source: News – From

GLM Coding Plan Adoption

  • GLM Coding Plan supports Claude Code, Cursor, Cline, OpenCode, Roo Code, Kilo Code, and other tools
  • Community documentation shows step-by-step guides for using GLM models with Claude Code via Anthropic-compatible proxy API
  • Zhipu provides an automatic configuration tool, run via npx @z_ai/coding-helper
  • HN: "GLM 4.5 with Claude Code" thread (213 points, 84 comments) showed early adoption interest

Source: News – From

Price Sensitivity

  • "Z.ai doubles it's coding plan prices" (HN, 18 points): community reacted negatively to the price increase following the GLM-5.1 launch
  • The promotional 1x multiplier for GLM-5.1 (off-peak, through June 2026) suggests Zhipu is trying to drive adoption of the new model while managing compute costs

Sales Restriction

The platform implemented daily inventory limits for new subscriptions due to a "user volume surge exceeding expectations," which points to genuine demand, constrained capacity, or both.

Source: Bigmodel – Overview

Enterprise Readiness

| Feature | Available? | Details |
| --- | --- | --- |
| SSO (SAML) | No | Not mentioned. GLM Coding Plan uses API keys. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned. |
| IP indemnity | No | Not mentioned. |
| Data residency | No | Not mentioned. API endpoints are China-focused. |
| HIPAA | No | Not mentioned. |
| Air-gapped / on-prem | No | Not available. |
| SLA | No | No published SLA. |
| Admin controls (RBAC) | No | No admin controls documented. Plans are single-user. |

Transparency Gaps

| Gap | Details | Severity |
| --- | --- | --- |
| Prompt count is approximate | Plan limits use "approximately X prompts" where each prompt triggers 15-20 model calls. Actual consumption depends on task complexity, making cost estimation unreliable. | Medium |
| Peak multiplier creates uncertainty | GLM-5.1 consumes 3x quota during peak hours (14:00-18:00 UTC+8) and 2x off-peak. The promotional 1x rate (through June) will increase effective costs when it expires. | Medium |
| Dynamic concurrency limits | Concurrency is "dynamically adjusted" based on resource availability. Max users get priority, but specific concurrency numbers are not published. | Medium |
| Usage restrictions enforced by risk control | The platform monitors for "improper use" including account sharing and use in non-approved tools. Violations trigger throttling, freezing, or banning. The detection methodology is not disclosed. | Low |
| No batch pricing published | Batch API is mentioned as available but pricing is not documented. | Low |
| Model parameter count not disclosed for GLM-5.1 | GLM-5 is documented as 744B (40B active), but GLM-5.1's architecture is not specified. | Low |