Key Terms
- Unified credit pool - Kimi's consumer plans use a single credit balance metered by token consumption across all features (agent, Code, Swarm, Claw). Credits reset monthly. Source: Kimi – Membership Credits
- Kimi Code - Moonshot AI's coding agent product, available as a CLI and IDE extension. Uses Kimi K2.6 as its underlying model (branded as
kimi-for-coding). Source: Kimi – Code - Agent Swarm - Kimi's architecture for decomposing tasks into heterogeneous subtasks executed concurrently by self-created domain-specialized agents. K2.6 supports up to 300 sub-agents across 4,000 coordinated steps. Source: Kimi – Agent Swarm
- Kimi Claw - A persistent, proactive AI agent that operates across multiple applications with 24/7 execution. Available on Allegretto ($39/mo) and above. Source: Kimi – Kimi Claw Introduction
- Context caching - Kimi automatically caches context. Cached tokens are billed at the "cache hit" rate, which is 83% cheaper than regular input for K2.6 ($0.16 vs $0.95 per MTok). Source: Kimi – Chat K26
- Kimi WebBridge - A browser extension for AI agents, released in May 2026. Source: Kimi – Webbridge
Latest Changes
Changes since the 2026-04 report.
- Deprecation: All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) were officially discontinued on May 25, 2026. Only K2.5, K2.6, and Moonshot V1 remain available via API. See API Pricing.
- Feature added: Kimi Code now defaults to K2.6 (branded as
kimi-for-coding). The CLI shows "Model: kimi-for-coding (powered by kimi-k2.6)". Source: Kimi – Code - Feature added: Kimi WebBridge released, a browser extension enabling AI agents to interact with web content. Source: Kimi – Webbridge
- Partnership: Berget AI announced Berget Code for European teams, powered by Kimi K2.6, targeting GDPR-compliant deployment. Source: Berget – Berget Code Launch En
- Community win: Kimi K2.6 won the AI Coding Contest Word Gem Puzzle outright with 22 match points (7-1-0 record), beating GPT-5.5 (16 pts), Claude Opus 4.7 (12 pts), and Gemini Pro 3.1 (9 pts). Source: Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge
- No pricing changes: API and consumer plan pricing remain unchanged from April.
Plans
Consumer Plans (Kimi.com)
| Plan | Price (monthly) | Price (annual) | Agent Quota/mo | Kimi Code | Agent Swarm | Key Inclusions |
|---|---|---|---|---|---|---|
| Adagio (Free) | $0 | $0 | 6 agent tasks | Not included | Not included | 1 concurrent agent task, 200 database calls |
| Moderato | $19/mo | $15/mo ($180/yr) | 60 agent tasks | 1x quota (undisclosed token count) | Not included | 2 concurrent tasks, 4x speed priority, 2,000 database calls |
| Allegretto | $39/mo | $31/mo ($372/yr) | 150 agent tasks | 5x quota | 50 Swarm uses/mo, 4 concurrent subtasks | Kimi Claw, 2 concurrent tasks, 5,000 database calls |
| Allegro | $99/mo | $79/mo ($948/yr) | 360 agent tasks | 15x quota | 120 Swarm uses/mo, 4 concurrent subtasks | Kimi Claw, 4 concurrent tasks, 12,000 database calls |
| Vivace | $199/mo | $159/mo ($1,908/yr) | 720 agent tasks | 30x quota | 240 Swarm uses/mo, 8 concurrent subtasks | Kimi Claw, 4 concurrent tasks, 24,000 database calls |
Agent quotas are approximate values based on typical task token consumption. Actual usage varies by task complexity. All plans share a unified credit pool metered by tokens. Annual billing saves up to $480/year vs monthly.
Source: Kimi – Pricing
API Plans (Pay-as-you-go)
| Tier | Cumulative Recharge | Concurrency | RPM | TPM | TPD |
|---|---|---|---|---|---|
| Tier 0 | $1 | 1 | 3 | 500,000 | 1,500,000 |
| Tier 1 | $10 | 50 | 200 | 2,000,000 | Unlimited |
| Tier 2 | $20 | 100 | 500 | 3,000,000 | Unlimited |
| Tier 3 | $100 | 200 | 5,000 | 3,000,000 | Unlimited |
| Tier 4 | $1,000 | 400 | 5,000 | 4,000,000 | Unlimited |
| Tier 5 | $3,000 | 1,000 | 10,000 | 5,000,000 | Unlimited |
Minimum recharge: $1. At $5 cumulative recharge, users receive a $5 voucher. Vouchers do not count toward cumulative recharge. Enterprise custom limits available via api-service@moonshot.ai.
Source: Kimi – Limits
API Pricing
| Model | Input ($/MTok) | Output ($/MTok) | Cache Hit ($/MTok) | Context Window |
|---|---|---|---|---|
| Kimi K2.6 | $0.95 | $4.00 | $0.16 | 256K tokens |
| Kimi K2.5 | $0.60 | $3.00 | $0.10 | 256K tokens |
| Moonshot V1 8K | $0.20 | $2.00 | N/A | 8K tokens |
| Moonshot V1 32K | $1.00 | $3.00 | N/A | 32K tokens |
| Moonshot V1 128K | $2.00 | $5.00 | N/A | 128K tokens |
Discontinued models (removed May 25, 2026): Kimi K2 (0905), K2 Turbo, K2 Thinking, K2 Thinking Turbo, K2 (0711).
Other API pricing:
- Web search ($web_search tool): $0.005 per successful tool call
- File upload/extract: temporarily free
- K2.6 and K2.5 support text, image, and video input
- Thinking mode: can be enabled/disabled per request (
"thinking": {"type": "enabled"}) - Temperature fixed at 1.0 (thinking mode) or 0.6 (non-thinking), cannot be changed
- Top-p fixed at 0.95, cannot be changed
Source: Kimi – Chat K26, Kimi – Chat K25, Kimi – Chat V1, Kimi – Tools
Model Performance / Benchmarks
Kimi K2.6 Benchmarks
| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 |
|---|---|---|---|---|---|
| SWE-Bench Pro | 58.6% | 57.7% | 53.4% | 54.2% | 50.7% |
| Terminal-Bench 2.0 | 66.7% | 65.4% | 65.4% | 68.5% | 50.8% |
| SWE-Bench Verified | 80.2% | - | 80.8% | 80.6% | 76.8% |
| SWE-Bench Multilingual | 76.7% | - | 77.8% | 76.9% | 73.0% |
| HLE-Full (with tools) | 54.0% | 52.1% | 53.0% | 51.4% | 50.2% |
| BrowseComp | 83.2% | 82.7% | 83.7% | 85.9% | 74.9% |
| OSWorld-Verified | 73.1% | 75.0% | 72.7% | - | 63.3% |
| AIME 2026 | 96.4% | 99.2% | 96.7% | 98.3% | 95.8% |
| LiveCodeBench (v6) | 89.6% | - | 88.8% | 91.7% | 85.0% |
Additional K2.6 capabilities:
- Agent Swarm: up to 300 sub-agents across 4,000 coordinated steps
- 256K context window, multimodal (text, image, video)
- Demonstrated 13-hour continuous coding session optimizing exchange-core for 185% throughput improvement
- Demonstrated 12-hour session implementing Qwen3.5-0.8B inference in Zig
Source: Kimi – Kimi K2 6
Latest News
K2 Series Discontinued (May 25, 2026)
All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) were officially discontinued on May 25, 2026. Users must migrate to kimi-k2.6 or kimi-k2.5.
Source: Kimi – Chat
Kimi K2.6 Wins AI Coding Contest (April 30, reported widely May)
In the ongoing AI Coding Contest Word Gem Puzzle challenge, Kimi K2.6 won outright with 22 match points (7-1-0 record), beating GPT-5.5 (16 pts), Claude Opus 4.7 (12 pts), and Gemini Pro 3.1 (9 pts). The challenge tested real-time decision-making and clean functional code connecting to a TCP server. The HN post drew 219 comments and 380 points.
Source: Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge News – Item
Kimi WebBridge Released (May 26, 2026)
Moonshot AI released Kimi WebBridge, a browser extension enabling AI agents to interact with web content directly. This extends Kimi's agent capabilities into the browser.
Source: Kimi – Webbridge, News – Item
Berget Code for European Teams (May 13, 2026)
Berget AI announced Berget Code, a coding agent for European teams powered by Kimi K2.6, targeting GDPR-compliant European hosting.
Source: Berget – Berget Code Launch En, News – Item
DeepSeek V4 Pro vs Kimi K2.6 Comparison (May 15, 2026)
Kilo.ai published a comparison of DeepSeek V4 Pro and Flash vs Claude Opus 4.7 and Kimi K2.6, evaluating coding agent performance.
Source: Kilo – We Tested Deepseek V4 Pro And Flash, News – Item
Kimi K2.6 HN Launch Discussion (April 20, ongoing into May)
The original K2.6 launch HN thread reached 710 points and 372 comments, making it one of the most-discussed model launches in May. Key community themes: competitive benchmark performance at low cost, questions about Chinese model licensing, and excitement about the open-weight availability.
Source: News – Item
Community Signals
K2.6 Competitive Performance Recognition
The K2.6 launch received significant positive community attention across HN, with 710 points on the launch thread and 380 points on the AI Coding Contest win thread. Key themes:
- Open-source models are closing the gap with Western frontier models
- K2.6's SWE-Bench Pro score (58.6%) beating GPT-5.4 (57.7%) drew particular attention
- Cost-performance ratio frequently highlighted: $0.95/$4.00 per MTok vs $5/$25 for Claude Opus
Quote: "Kimi K2.6 sets a new level for open-sourced models, especially in long-horizon, agent-style coding workflows." (Blackbox.ai CEO Robert Rizk, via K2.6 blog)
Quote from AI Coding Contest: "When models within a few index points of the frontier are also freely available to run locally, that's a different competitive situation than the one that existed a year ago." (Rohana Rezel, contest organizer)
Source: News – Item News – Item Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge
Ecosystem Adoption
Multiple third-party products building on Kimi K2.6:
- Kilo Code: uses K2.6 as a core model. "K2.6 offers SOTA-level performance at a fraction of the cost." (Kilo CEO)
- Tencent CodeBuddy: integrates K2.6 Thinking for complex programming tasks
- Berget Code: European GDPR-compliant coding agent powered by K2.6
- Kimiflare: open-source Claude Code clone powered by K2.6 on Cloudflare Workers AI (12k+ npm downloads)
- Ollama: K2.6 available for local deployment
- Cursor: Composer 2 built on K2.5 (confirmed partnership in April)
Source: Kimi – Kimi K2 6, News – Item
Rate Limiting and Reliability Issues (Ongoing)
Reports of 429 errors and "system busy" messages continue on the Kimi Forum:
- "Kimi CLI stuck in engine overloaded loop for 48h" (April 28). Source: Forum – Kimi Cli Stuck In Engine Overloaded Loop For 48H
- "Error code 429: We're receiving too many requests at the moment" (12 replies, 841 views, ongoing). Source: Forum – Error Code 429 Were Receiving Too Many Requests At The Moment
Billing and Subscription Issues (Ongoing from April)
Billing issues reported in April appear unresolved:
- Double-charging after cancellation reported by multiple users
- No self-service invoice download
- No visible cancel subscription link
- Subscription state not syncing across devices
Source: Forum – 353, Forum – No Link To Cancel Subscription
Kimi K2.6 Self-Hosting and Optimization
Florian Leibert published "5.6x throughput on Kimi K2.6 by speculating less" on HuggingFace, demonstrating how to optimize K2.6 inference on MI300X hardware. 11 points on HN.
Source: Hugging Face – Kimi K26 Dflash Mi300X, News – Item
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Not mentioned. Kimi is primarily a consumer and API product. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned. |
| IP indemnity | No | Not mentioned. |
| Data residency | Partial | Berget AI offers European-hosted K2.6 for GDPR compliance. No official data residency from Moonshot directly. |
| HIPAA | No | Not mentioned. |
| Air-gapped / on-prem | Partial | K2.6 is open-weight and available via Ollama for local deployment, but no official on-prem enterprise product. |
| SLA | No | No published SLA. |
| Admin controls (RBAC) | No | No admin controls documented. API tiers are single-user. |
Transparency Gaps
| Gap | Details | Severity |
|---|---|---|
| Agent quota is approximate | Plan inclusions use "approximate values based on typical task token consumption" rather than concrete token counts. A buyer cannot know exactly how much usage they get. | High |
| Kimi Code quota multiplier unclear | Kimi Code is listed as "1x", "5x", "15x", "30x" without a concrete base unit. The actual token allocation for Kimi Code per plan is undisclosed. | High |
| No invoice system | Multiple forum posts about inability to get invoices or billing transparency. No self-service invoice download. | Medium |
| Subscription management gaps | No clear cancel subscription link. Subscription state does not sync across devices. Double-charging reported by multiple users. | Medium |
| Refund policy | No refund policy published. Forum posts show users requesting refunds for immediate cancellations with no response. | Medium |
| Rate limit changes during peak | Documentation states "when the cluster load reaches its capacity limit, we may take temporary measures to adjust the rate limits" without specifying what adjustments are made or when. | Low |
| No batch API | No batch API or discounted async processing tier is documented. | Low |
| No thinking token pricing | K2.6 thinking mode generates reasoning tokens, but there is no separate pricing for thinking tokens vs regular output tokens. It is unclear whether thinking tokens are billed at output rates. | Low |