Zhipu AI

Executive Summary

What it is: Zhipu AI (listed on HKEX as 02513.HK) is a Chinese LLM supplier founded by a Tsinghua University team. Its consumer product, z.ai, offers free chat powered by GLM-5.1 and GLM-5. The developer platform, bigmodel.cn, provides pay-per-token API access across a wide model lineup (text, vision, image generation, video, audio). The GLM Coding Plan is a subscription-based coding agent service compatible with Claude Code, OpenClaw, OpenCode, Kilo Code, Cline, TRAE, CodeBuddy, and 20+ other coding tools. Personal plans range from ¥49/month (Lite) to ¥469/month (Max). API pricing for the flagship GLM-5.1 is ¥6/¥24 per MTok (input/output at <32K context), making it one of the cheapest frontier-grade models available.

What to watch out for: The GLM Coding Plan operates on a quota system with 5-hour rolling windows and weekly caps, not flat monthly usage. GLM-5.1 and GLM-5-Turbo consume quota at 2x-3x multiplier during peak hours (14:00-18:00 UTC+8), though a temporary promotion reduces this to 1x off-peak through June 2026. The Coding Plan is restricted to approved coding tools only; using it outside these tools can result in account restrictions. Rate limits and concurrency are dynamic and not publicly disclosed in exact numbers. The platform has implemented purchase limits on Coding Plan subscriptions due to capacity constraints, which may still be in effect. Documentation is primarily in Chinese, which may be a barrier for non-Chinese-speaking teams.

Bottom line: Zhipu AI offers the most cost-effective frontier model API pricing among all tracked suppliers, with GLM-5.1 at roughly $0.83/$3.33 per MTok at current exchange rates, which is 6x cheaper than Claude Opus 4.8 on input and 7.5x cheaper on output. The GLM Coding Plan at ¥149/month (Pro) provides generous usage for agentic coding through industry-standard tools. The main risks are capacity constraints (purchase limits, dynamic rate limiting), China-centric documentation and support, and the model multiplier system that can inflate costs during peak hours.

Key Terms

  • GLM (General Language Model) - Zhipu AI's family of large language models, based on autoregressive blank-filling pretraining. The ChatGLM series supports complex natural language instructions and reasoning. Source: Bigmodel – Introduction
  • OpenClaw - Zhipu AI's branding for agentic coding workflows. The term "lobster" (龙虾) is used colloquially in Chinese documentation for coding agent tasks. GLM-5-Turbo is specifically optimized for OpenClaw scenarios. Source: Bigmodel – Glm 5 Turbo
  • GLM Coding Plan - A subscription service for AI-powered coding. Supports 20+ coding tools including Claude Code, OpenClaw, OpenCode, Kilo Code, Cline. Billed on 5-hour rolling windows and weekly quotas, not flat monthly tokens. Source: Bigmodel – Overview
  • Token-based billing - API usage is charged per million tokens. GLM series models use approximately 1 token per 1.6 Chinese characters. Pricing tiers based on input context length (<32K tokens vs 32K+ tokens). Source: Bigmodel – Introduction
  • Prompt caching - Context caching is available for GLM models. Cache storage is currently free (limited-time promotion). Cache hits are billed at reduced rates (e.g., ¥1.3/MTok for GLM-5.1 vs ¥6/MTok full input). Source: Zhipu AI – Pricing
  • Batch API - Asynchronous processing at 50% discount. Supported on older GLM-4 series models. Source: Zhipu AI – Pricing
  • Context window - Maximum tokens the model processes in one conversation. GLM-5.1 and GLM-5 support 200K context with up to 128K max output. GLM-4-Long supports up to 1M context. Source: Bigmodel – Model Overview
  • Thinking mode - Chain-of-thought reasoning enabled via thinking: { type: "enabled" }. Supported on GLM-5.1, GLM-5, GLM-5-Turbo. Temperature defaults to 1.0 when thinking is enabled. Source: Bigmodel – Thinking Mode
  • MCP servers - Model Context Protocol servers provided with the Coding Plan, including vision understanding, web search, web page reading, and open source repository reading. Source: Bigmodel – Overview
  • Model multiplier - GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 3x during peak hours (14:00-18:00 UTC+8) and 2x off-peak. A promotion running through June 2026 reduces off-peak to 1x. Source: Bigmodel – Overview

Latest Changes

Changes since the 2026-04 report.

  • New model: GLM-5.1 launched April 7, 2026 as the latest flagship. Coding capability aligned with Claude Opus 4.6. SWE-Bench Pro score of 58.4, surpassing GPT-5.4 and Claude Opus 4.6. Supports 8-hour long-horizon autonomous tasks. 200K context, 128K max output. Source: Bigmodel – New Releases
  • New model: GLM-5V-Turbo launched April 2, 2026. Multimodal coding model combining vision understanding with coding capability. 200K context, 128K max output. Source: Bigmodel – New Releases
  • New model: GLM-5-Turbo launched March 15, 2026. OpenClaw-optimized base model with enhanced tool calling, instruction following, and long-duration task execution. 200K context, 128K max output. Source: Bigmodel – New Releases
  • Feature added: GLM Coding Plan now supports GLM-5.1 across all tiers (Lite, Pro, Max). Source: Bigmodel – Overview
  • Promotion: GLM-5.1 and GLM-5-Turbo consume Coding Plan quota at 1x (instead of 2x) during off-peak hours, running through end of June 2026. Source: Bigmodel – Overview
  • Feature added: GLM in Excel (Beta) included in Coding Plan subscriptions. Source: Bigmodel – Overview
  • Milestone: Zhipu AI published its first financial results report as a publicly listed company (HKEX: 02513.HK) on March 31, 2026. Source: Zhipuai – News
  • Deprecation: GLM-Z1 series scheduled for deprecation on November 15, 2025 (already deprecated). GLM-4-0520 scheduled for December 30, 2025 deprecation. Source: Bigmodel – Model Overview

Plans

GLM Coding Plan (Personal)

PlanMonthly PriceQuarterly Price (per month)5-Hour Quota (prompts)Weekly Quota (prompts)MCP Calls/MonthRecommended Projects
Lite¥49¥44.1 (9% off)~80~4001001 small repo
Pro¥149¥134.1 (9% off)~400~2,0001,0001-2 mid-size repos
Max¥469¥422.1 (9% off)~1,600~8,0004,0002+ large repos

Annual subscriptions available at 20% discount.

What's included in all plans:

  • Models: GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air
  • MCP tools: Vision understanding, web search, web page reading, open source repo reading
  • Compatible tools: Claude Code, OpenClaw, OpenCode, Kilo Code, Cline, TRAE, CodeBuddy, and 20+ others
  • GLM in Excel (Beta)

Model multiplier for Coding Plan:

  • GLM-5.1 / GLM-5-Turbo: 3x during peak (14:00-18:00 UTC+8), 2x off-peak (promotional: 1x off-peak through June 2026)
  • GLM-4.7 / GLM-4.5-Air: 1x at all times

Monthly value estimate: Each plan provides API-equivalent value of 15-30x the monthly subscription price (accounting for weekly quota limits).

Purchase limits: The platform has implemented daily purchase limits on Coding Plan subscriptions due to demand exceeding capacity. Limits are released daily at 10:00 UTC+8. Existing subscribers renewing or upgrading are not affected. Source: Bigmodel – Overview

Free Tier (z.ai)

The z.ai consumer chatbot provides free access to GLM-5.1 and GLM-5 via web interface with features including AutoClaw (agent mode), AI Slides, Magic Design, Full-Stack coding, and Write Code. No API access included.

Free API Models

ModelContextInput PriceOutput Price
GLM-4.7-Flash200KFreeFree
GLM-4.6V-Flash128KFreeFree
GLM-4.1V-Thinking-Flash64KFreeFree
GLM-4V-Flash16KFreeFree
CogView-3-Flash-FreeFree
CogVideoX-Flash-FreeFree

Source: Zhipu AI – Pricing

API Pricing

Flagship Models (per 1M tokens, in CNY)

ModelContext TierInput (¥/MTok)Output (¥/MTok)Cache Storage (¥/MTok/hr)Cache Hit (¥/MTok)
GLM-5.1<32K¥6¥24Free (limited-time)¥1.3
GLM-5.132K+¥8¥28Free (limited-time)¥2
GLM-5-Turbo<32K¥5¥22Free (limited-time)¥1.2
GLM-5-Turbo32K+¥7¥26Free (limited-time)¥1.8
GLM-5<32K¥4¥18Free (limited-time)¥1
GLM-532K+¥6¥22Free (limited-time)¥1.5

Mid-Tier and Economy Models (per 1M tokens, in CNY)

ModelContext TierInput (¥/MTok)Output (¥/MTok)Cache Hit (¥/MTok)
GLM-4.7<32K, output <0.2K¥2¥8¥0.4
GLM-4.7<32K, output 0.2K+¥3¥14¥0.6
GLM-4.732-200K¥4¥16¥0.8
GLM-4.5-Air<32K, output <0.2K¥0.8¥2¥0.16
GLM-4.5-Air<32K, output 0.2K+¥0.8¥6¥0.16
GLM-4.5-Air32-128K¥1.2¥8¥0.24
GLM-4.7-FlashX200K¥0.5¥3¥0.1
GLM-4.7-Flash200KFreeFreeFree

USD Approximate Pricing (at ~¥7.25/USD)

ModelInput ($/MTok)Output ($/MTok)
GLM-5.1 (<32K)~$0.83~$3.31
GLM-5.1 (32K+)~$1.10~$3.86
GLM-5-Turbo (<32K)~$0.69~$3.03
GLM-5 (<32K)~$0.55~$2.48
GLM-4.7 (<32K)~$0.28-$0.41~$1.10-$1.93
GLM-4.5-Air~$0.11~$0.28-$0.83
GLM-4.7-FlashX~$0.07~$0.41

Batch API Pricing

50% discount on standard pricing for supported models (GLM-4 series). Source: Zhipu AI – Pricing

Private Instance Pricing

ModelDeploymentPrice
GLM-4.6200K fp8¥175/GPU unit/day
GLM-4.5128K fp8¥175/GPU unit/day
GLM-4.5-Air128K fp8¥100/GPU unit/day

Source: Zhipu AI – Pricing

Search Tools

ToolPrice
Search-Std (Zhipu self-developed)¥0.01/request
Search-Pro (Zhipu enhanced)¥0.03/request
Search-Pro-Sogou¥0.05/request
Search-Pro-Quark¥0.05/request

Source: Zhipu AI – Pricing

Model Performance / Benchmarks

ModelBenchmarkScoreNotes
GLM-5.1SWE-Bench Pro58.4Surpassed GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro
GLM-5SWE-Bench Verified77.8Open-source SOTA
GLM-5Terminal Bench 2.056.2Open-source SOTA
GLM-5BrowseCompUndisclosedOpen-source SOTA (web browsing/retrieval)
GLM-5MCP-AtlasUndisclosedOpen-source SOTA (tool calling, multi-step tasks)
GLM-5tau2-BenchUndisclosedOpen-source SOTA (complex multi-tool planning/execution)
GLM-5ZClawBenchUndisclosedSignificantly above GLM-5 (OpenClaw agent benchmark)
GLM-5-TurboZClawBenchUndisclosedAbove GLM-5 and multiple mainstream models (OpenClaw benchmark)

Key context: GLM-5 claims coding performance aligned with Claude Opus 4.5 (not the latest 4.6/4.8). GLM-5.1 claims alignment with Claude Opus 4.6. Zhipu publishes benchmark scores selectively (exact numbers for some, "open-source SOTA" claims without numbers for others). The GLM-5 series uses a 744B parameter MoE architecture with 40B active parameters, up from 355B/32B in the previous generation. GLM-5 integrates DeepSeek Sparse Attention for long-context efficiency.

Sources:

Latest News

GLM-5.1 Flagship Launch (April 7, 2026): Zhipu AI released GLM-5.1 as its latest flagship model. Key claims: coding capability aligned with Claude Opus 4.6, 8-hour long-horizon autonomous task execution (planning, execution, testing, delivery in a single session), SWE-Bench Pro score of 58.4 (claimed global best). The model achieved this through multi-turn SFT, RL, and process quality evaluation. Practical demonstrations include building a Linux desktop from scratch in 8 hours, 655-round iterative optimization achieving 6.9x throughput improvement on a vector database, and 3.6x geometric mean speedup on KernelBench Level 3 (vs torch.compile's 1.49x). Source: Bigmodel – New Releases

GLM-5V-Turbo Multimodal Coding Model (April 2, 2026): A new vision+coding model with 200K context and 128K output. Enhanced GUI Agent and Coding Agent performance for "see environment, plan actions, execute tasks" workflows. Adds visual tools: bounding box, screenshot, web page reading with image recognition. Source: Bigmodel – New Releases

GLM-5-Turbo OpenClaw Optimization (March 15, 2026): Purpose-built model for coding agent workflows. Enhanced tool calling, instruction following, time-aware task execution, and high-throughput long-chain processing. Released alongside ZClawBench, a new end-to-end agent benchmark for OpenClaw scenarios. Skills usage in OpenClaw grew from 26% to 45%. Source: Bigmodel – Glm 5 Turbo

Zhipu AI First Financial Report (March 31, 2026): As a publicly listed company (HKEX: 02513.HK), Zhipu published its first annual results. Specific financial figures not available in the fetched data. Source: Zhipuai – News

GLM-5 Launch (February 12, 2026): The GLM-5 base model launched with 744B parameters (40B active), 28.5T training tokens, and DeepSeek Sparse Attention integration. Positioned for "Agentic Engineering" with coding capability aligned to Claude Opus 4.5. Source: Bigmodel – New Releases

GLM Coding Plan Demand Surge (Ongoing): Zhipu implemented daily purchase limits on Coding Plan subscriptions starting January 23 due to demand exceeding capacity. New inventory released daily at 10:00 UTC+8. Existing subscribers are unaffected. Source: Bigmodel – Overview

Community Signals

Coding tool ecosystem endorsements: Multiple coding tool vendors provided public endorsements for the GLM Coding Plan. Kilo Code highlighted the combination of generous quotas and low cost as removing cost anxiety. Cline praised the pricing and quota structure as "hard to beat" for developers seeking high-value AI coding. Crush (which used GLM for key architecture) and Factory both endorsed GLM-5's performance-to-cost ratio. Source: Zhipu AI – Glm Coding

Coding Plan purchase limits signal capacity strain: The platform has been rationing Coding Plan subscriptions since January 2026, releasing limited stock daily. This indicates demand significantly outpacing GPU capacity, which is a risk for users who need predictable access. The rationing notice states the team is "working at highest priority to coordinate resources." Source: Bigmodel – Overview

Peak-hour multiplier controversy: The 3x multiplier for GLM-5.1 during peak hours (14:00-18:00 UTC+8) effectively reduces the value of Coding Plan subscriptions during the most common working hours for Chinese developers. The temporary 1x off-peak promotion through June 2026 is clearly an acquisition strategy. Users should expect the full multiplier to apply after the promotion ends, which would significantly change the value equation for GLM-5.1 usage. Source: Bigmodel – Overview

No significant English-language community presence: Reddit (r/LocalLLaMA) and Hacker News show minimal discussion of Zhipu AI's GLM models in May 2026. The primary community engagement is through Chinese-language channels (Feishu/Lark groups, Chinese social media). This limits the availability of independent quality assessments and user reports for non-Chinese-speaking evaluators.

Enterprise Readiness

FeatureAvailable?Details
SSO (SAML/OIDC)UndisclosedNot mentioned in public documentation
SCIMUndisclosedNot mentioned in public documentation
Audit logsUndisclosedNot mentioned in public documentation
IP indemnityUndisclosedNot mentioned in public documentation. A commercial license agreement exists for model use. Source: Bigmodel – Model Commercial Use
Data residencyPartialCloud private instances and on-premise deployment available. On-prem pricing listed as "数千万" (tens of millions) for GLM-4-0520. Source: Zhipu AI – Pricing
HIPAANoNot mentioned
Air-gapped / On-premYesLocal private deployment available for GLM-4 series with hardware appliance. Pricing starts from "数十万" (hundreds of thousands CNY) for smaller models. Source: Zhipu AI – Pricing
SLAUndisclosedNot mentioned in public documentation
Admin controls (RBAC)PartialTeam plan available with central billing. Source: Bigmodel – Team
Content security / moderationYesBuilt-in content safety audit for text, image, audio, video. Source: Bigmodel – Securityaudit
Model fine-tuningYesLoRA and full fine-tuning supported on GLM-4.5, GLM-4.5-Air, GLM-4 series. Source: Zhipu AI – Pricing
OpenAI API compatibilityYesSupports OpenAI SDK, Claude API compatibility, LangChain, HTTP API, Python SDK, Java SDK. Source: Bigmodel – Introduction

Team Plan: GLM Coding Plan offers team tiers. Specific pricing and per-seat structure available in the team documentation. Source: Bigmodel – Team

Cloud private instances: Available at ¥100-175/GPU unit/day for dedicated model deployments. Annual套餐 available (e.g., GLM-4.5 at ¥1.1M/year, GLM-4.5-Air at ¥500K/year). Source: Zhipu AI – Pricing

Transparency Gaps

  1. Exact rate limits undisclosed. The platform uses dynamic rate limiting based on user tier, subscription level, and current load. No specific RPM/TPM numbers are published. Users must check the console for their current limits. Source: Bigmodel – Rate Limit
  1. Coding Plan quota inexact. Quotas are described as "approximately X prompts" with the caveat that actual usage varies by "project complexity, codebase size, and whether auto-accept is enabled." There is no token-level accounting exposed to users. Source: Bigmodel – Overview
  1. Peak-hour multiplier end date. The 1x off-peak promotion for GLM-5.1/GLM-5-Turbo runs "through end of June" but the full multiplier (2x off-peak, 3x peak) is the stated permanent rate. Users cannot plan long-term costs without knowing whether the promotion will be extended. Source: Bigmodel – Overview
  1. Purchase limit duration. The daily rationing of Coding Plan subscriptions has no stated end date. The notice says "short-term" but has been in effect since January 2026 (5+ months). Source: Bigmodel – Overview
  1. Benchmark scores partially disclosed. GLM-5 claims "open-source SOTA" on BrowseComp, MCP-Atlas, and tau2-Bench without publishing exact numbers. GLM-5.1's SWE-Bench Pro 58.4 is the only fully quantified coding benchmark. Source: Bigmodel – Glm 5
  1. Model architecture details. GLM-5/5.1 is described as 744B total / 40B active parameters, but training data composition, architecture specifics beyond "DeepSeek Sparse Attention," and inference optimization details are not fully disclosed.
  1. Enterprise features. SSO, SCIM, audit logs, IP indemnity, and SLA details are not documented publicly. Enterprises must contact sales for this information. Source: Zhipu AI – Pricing
  1. Concurrent request limits. Coding Plan documentation recommends project counts (Lite: 1, Pro: 1-2, Max: 2+) but does not state actual concurrent request limits. The documentation acknowledges users sometimes "feel like only 1 concurrent request" during peak hours. Source: Bigmodel – Rate Limit
  1. Cache storage pricing. Currently listed as "free (limited-time)" for all flagship models. The permanent pricing is undisclosed. Source: Zhipu AI – Pricing
  1. GLM-4.7 output-tier pricing. GLM-4.7 has a unique pricing structure where output cost depends on output length (<0.2K or 0.2K+ tokens). This is unusual and not clearly explained in documentation. Source: Zhipu AI – Pricing