Moonshot AI

Executive Summary

What it is: Moonshot AI's Kimi is a consumer AI platform with coding capabilities (Kimi Code), available via CLI, IDE extensions (VS Code, Zed, JetBrains), and web. Consumer plans range from $0 (Adagio) to $199/mo (Vivace annual). The underlying K2.6 model is available via API at $0.95/$4.00 per MTok (input/output) with a 256K context window, roughly 5x cheaper than Claude Opus 4.8 ($5/$25 per MTok) for competitive coding quality. Kimi Code now defaults to K2.6 for all plans.

What to watch out for: Agent quotas remain approximate with no concrete token counts, and Kimi Code multipliers ("1x", "5x", "15x", "30x") still lack a defined base unit. K2 series models (kimi-k2-0905, kimi-k2-thinking, etc.) were discontinued on May 25, 2026. Billing issues from April persist: double-charging reports, no invoice system, no visible cancel subscription link. Rate limiting (429 errors) continues during peak usage.

Bottom line: Kimi K2.6 delivers frontier-competitive coding benchmarks (SWE-Bench Pro: 58.6%, Terminal-Bench 2.0: 66.7%) at API pricing well below Western models. The K2.6 model won the AI Coding Contest Word Gem Puzzle challenge outright, beating GPT-5.5, Claude Opus 4.7, and Gemini Pro 3.1. However, billing reliability, customer service gaps, and zero enterprise features (no SSO, no audit logs, no SLA) make it risky for teams. Best suited for individual developers comfortable with API-level integration who want frontier-competitive performance at a fraction of the cost.

Key Terms

  • Unified credit pool - Kimi's consumer plans use a single credit balance metered by token consumption across all features (agent, Code, Swarm, Claw). Credits reset monthly. Source: Kimi – Membership Credits
  • Kimi Code - Moonshot AI's coding agent product, available as a CLI and IDE extension. Uses Kimi K2.6 as its underlying model (branded as kimi-for-coding). Source: Kimi – Code
  • Agent Swarm - Kimi's architecture for decomposing tasks into heterogeneous subtasks executed concurrently by self-created domain-specialized agents. K2.6 supports up to 300 sub-agents across 4,000 coordinated steps. Source: Kimi – Agent Swarm
  • Kimi Claw - A persistent, proactive AI agent that operates across multiple applications with 24/7 execution. Available on Allegretto ($39/mo) and above. Source: Kimi – Kimi Claw Introduction
  • Context caching - Kimi automatically caches context. Cached tokens are billed at the "cache hit" rate, which is 83% cheaper than regular input for K2.6 ($0.16 vs $0.95 per MTok). Source: Kimi – Chat K26
  • Kimi WebBridge - A browser extension for AI agents, released in May 2026. Source: Kimi – Webbridge

Latest Changes

Changes since the 2026-04 report.

  • Deprecation: All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) were officially discontinued on May 25, 2026. Only K2.5, K2.6, and Moonshot V1 remain available via API. See API Pricing.
  • Feature added: Kimi Code now defaults to K2.6 (branded as kimi-for-coding). The CLI shows "Model: kimi-for-coding (powered by kimi-k2.6)". Source: Kimi – Code
  • Feature added: Kimi WebBridge released, a browser extension enabling AI agents to interact with web content. Source: Kimi – Webbridge
  • Partnership: Berget AI announced Berget Code for European teams, powered by Kimi K2.6, targeting GDPR-compliant deployment. Source: Berget – Berget Code Launch En
  • Community win: Kimi K2.6 won the AI Coding Contest Word Gem Puzzle outright with 22 match points (7-1-0 record), beating GPT-5.5 (16 pts), Claude Opus 4.7 (12 pts), and Gemini Pro 3.1 (9 pts). Source: Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge
  • No pricing changes: API and consumer plan pricing remain unchanged from April.

Plans

Consumer Plans (Kimi.com)

PlanPrice (monthly)Price (annual)Agent Quota/moKimi CodeAgent SwarmKey Inclusions
Adagio (Free)$0$06 agent tasksNot includedNot included1 concurrent agent task, 200 database calls
Moderato$19/mo$15/mo ($180/yr)60 agent tasks1x quota (undisclosed token count)Not included2 concurrent tasks, 4x speed priority, 2,000 database calls
Allegretto$39/mo$31/mo ($372/yr)150 agent tasks5x quota50 Swarm uses/mo, 4 concurrent subtasksKimi Claw, 2 concurrent tasks, 5,000 database calls
Allegro$99/mo$79/mo ($948/yr)360 agent tasks15x quota120 Swarm uses/mo, 4 concurrent subtasksKimi Claw, 4 concurrent tasks, 12,000 database calls
Vivace$199/mo$159/mo ($1,908/yr)720 agent tasks30x quota240 Swarm uses/mo, 8 concurrent subtasksKimi Claw, 4 concurrent tasks, 24,000 database calls

Agent quotas are approximate values based on typical task token consumption. Actual usage varies by task complexity. All plans share a unified credit pool metered by tokens. Annual billing saves up to $480/year vs monthly.

Source: Kimi – Pricing

API Plans (Pay-as-you-go)

TierCumulative RechargeConcurrencyRPMTPMTPD
Tier 0$113500,0001,500,000
Tier 1$10502002,000,000Unlimited
Tier 2$201005003,000,000Unlimited
Tier 3$1002005,0003,000,000Unlimited
Tier 4$1,0004005,0004,000,000Unlimited
Tier 5$3,0001,00010,0005,000,000Unlimited

Minimum recharge: $1. At $5 cumulative recharge, users receive a $5 voucher. Vouchers do not count toward cumulative recharge. Enterprise custom limits available via api-service@moonshot.ai.

Source: Kimi – Limits

API Pricing

ModelInput ($/MTok)Output ($/MTok)Cache Hit ($/MTok)Context Window
Kimi K2.6$0.95$4.00$0.16256K tokens
Kimi K2.5$0.60$3.00$0.10256K tokens
Moonshot V1 8K$0.20$2.00N/A8K tokens
Moonshot V1 32K$1.00$3.00N/A32K tokens
Moonshot V1 128K$2.00$5.00N/A128K tokens

Discontinued models (removed May 25, 2026): Kimi K2 (0905), K2 Turbo, K2 Thinking, K2 Thinking Turbo, K2 (0711).

Other API pricing:

  • Web search ($web_search tool): $0.005 per successful tool call
  • File upload/extract: temporarily free
  • K2.6 and K2.5 support text, image, and video input
  • Thinking mode: can be enabled/disabled per request ("thinking": {"type": "enabled"})
  • Temperature fixed at 1.0 (thinking mode) or 0.6 (non-thinking), cannot be changed
  • Top-p fixed at 0.95, cannot be changed

Source: Kimi – Chat K26, Kimi – Chat K25, Kimi – Chat V1, Kimi – Tools

Model Performance / Benchmarks

Kimi K2.6 Benchmarks

BenchmarkKimi K2.6GPT-5.4 (xhigh)Claude Opus 4.6 (max)Gemini 3.1 Pro (thinking high)Kimi K2.5
SWE-Bench Pro58.6%57.7%53.4%54.2%50.7%
Terminal-Bench 2.066.7%65.4%65.4%68.5%50.8%
SWE-Bench Verified80.2%-80.8%80.6%76.8%
SWE-Bench Multilingual76.7%-77.8%76.9%73.0%
HLE-Full (with tools)54.0%52.1%53.0%51.4%50.2%
BrowseComp83.2%82.7%83.7%85.9%74.9%
OSWorld-Verified73.1%75.0%72.7%-63.3%
AIME 202696.4%99.2%96.7%98.3%95.8%
LiveCodeBench (v6)89.6%-88.8%91.7%85.0%

Additional K2.6 capabilities:

  • Agent Swarm: up to 300 sub-agents across 4,000 coordinated steps
  • 256K context window, multimodal (text, image, video)
  • Demonstrated 13-hour continuous coding session optimizing exchange-core for 185% throughput improvement
  • Demonstrated 12-hour session implementing Qwen3.5-0.8B inference in Zig

Source: Kimi – Kimi K2 6

Latest News

K2 Series Discontinued (May 25, 2026)

All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo) were officially discontinued on May 25, 2026. Users must migrate to kimi-k2.6 or kimi-k2.5.

Source: Kimi – Chat

Kimi K2.6 Wins AI Coding Contest (April 30, reported widely May)

In the ongoing AI Coding Contest Word Gem Puzzle challenge, Kimi K2.6 won outright with 22 match points (7-1-0 record), beating GPT-5.5 (16 pts), Claude Opus 4.7 (12 pts), and Gemini Pro 3.1 (9 pts). The challenge tested real-time decision-making and clean functional code connecting to a TCP server. The HN post drew 219 comments and 380 points.

Source: Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge News – Item

Kimi WebBridge Released (May 26, 2026)

Moonshot AI released Kimi WebBridge, a browser extension enabling AI agents to interact with web content directly. This extends Kimi's agent capabilities into the browser.

Source: Kimi – Webbridge, News – Item

Berget Code for European Teams (May 13, 2026)

Berget AI announced Berget Code, a coding agent for European teams powered by Kimi K2.6, targeting GDPR-compliant European hosting.

Source: Berget – Berget Code Launch En, News – Item

DeepSeek V4 Pro vs Kimi K2.6 Comparison (May 15, 2026)

Kilo.ai published a comparison of DeepSeek V4 Pro and Flash vs Claude Opus 4.7 and Kimi K2.6, evaluating coding agent performance.

Source: Kilo – We Tested Deepseek V4 Pro And Flash, News – Item

Kimi K2.6 HN Launch Discussion (April 20, ongoing into May)

The original K2.6 launch HN thread reached 710 points and 372 comments, making it one of the most-discussed model launches in May. Key community themes: competitive benchmark performance at low cost, questions about Chinese model licensing, and excitement about the open-weight availability.

Source: News – Item

Community Signals

K2.6 Competitive Performance Recognition

The K2.6 launch received significant positive community attention across HN, with 710 points on the launch thread and 380 points on the AI Coding Contest win thread. Key themes:

  • Open-source models are closing the gap with Western frontier models
  • K2.6's SWE-Bench Pro score (58.6%) beating GPT-5.4 (57.7%) drew particular attention
  • Cost-performance ratio frequently highlighted: $0.95/$4.00 per MTok vs $5/$25 for Claude Opus

Quote: "Kimi K2.6 sets a new level for open-sourced models, especially in long-horizon, agent-style coding workflows." (Blackbox.ai CEO Robert Rizk, via K2.6 blog)

Quote from AI Coding Contest: "When models within a few index points of the frontier are also freely available to run locally, that's a different competitive situation than the one that existed a year ago." (Rohana Rezel, contest organizer)

Source: News – Item News – Item Thinkpol – An Open Weights Chinese Model Just Beat Claude Gpt 5 5 And Gemini In A Programming Challenge

Ecosystem Adoption

Multiple third-party products building on Kimi K2.6:

  • Kilo Code: uses K2.6 as a core model. "K2.6 offers SOTA-level performance at a fraction of the cost." (Kilo CEO)
  • Tencent CodeBuddy: integrates K2.6 Thinking for complex programming tasks
  • Berget Code: European GDPR-compliant coding agent powered by K2.6
  • Kimiflare: open-source Claude Code clone powered by K2.6 on Cloudflare Workers AI (12k+ npm downloads)
  • Ollama: K2.6 available for local deployment
  • Cursor: Composer 2 built on K2.5 (confirmed partnership in April)

Source: Kimi – Kimi K2 6, News – Item

Rate Limiting and Reliability Issues (Ongoing)

Reports of 429 errors and "system busy" messages continue on the Kimi Forum:

Billing and Subscription Issues (Ongoing from April)

Billing issues reported in April appear unresolved:

  • Double-charging after cancellation reported by multiple users
  • No self-service invoice download
  • No visible cancel subscription link
  • Subscription state not syncing across devices

Source: Forum – 353, Forum – No Link To Cancel Subscription

Kimi K2.6 Self-Hosting and Optimization

Florian Leibert published "5.6x throughput on Kimi K2.6 by speculating less" on HuggingFace, demonstrating how to optimize K2.6 inference on MI300X hardware. 11 points on HN.

Source: Hugging Face – Kimi K26 Dflash Mi300X, News – Item

Enterprise Readiness

FeatureAvailable?Details
SSO (SAML)NoNot mentioned. Kimi is primarily a consumer and API product.
SSO (OIDC)NoNot mentioned.
SCIMNoNot mentioned.
Audit logsNoNot mentioned.
IP indemnityNoNot mentioned.
Data residencyPartialBerget AI offers European-hosted K2.6 for GDPR compliance. No official data residency from Moonshot directly.
HIPAANoNot mentioned.
Air-gapped / on-premPartialK2.6 is open-weight and available via Ollama for local deployment, but no official on-prem enterprise product.
SLANoNo published SLA.
Admin controls (RBAC)NoNo admin controls documented. API tiers are single-user.

Transparency Gaps

GapDetailsSeverity
Agent quota is approximatePlan inclusions use "approximate values based on typical task token consumption" rather than concrete token counts. A buyer cannot know exactly how much usage they get.High
Kimi Code quota multiplier unclearKimi Code is listed as "1x", "5x", "15x", "30x" without a concrete base unit. The actual token allocation for Kimi Code per plan is undisclosed.High
No invoice systemMultiple forum posts about inability to get invoices or billing transparency. No self-service invoice download.Medium
Subscription management gapsNo clear cancel subscription link. Subscription state does not sync across devices. Double-charging reported by multiple users.Medium
Refund policyNo refund policy published. Forum posts show users requesting refunds for immediate cancellations with no response.Medium
Rate limit changes during peakDocumentation states "when the cluster load reaches its capacity limit, we may take temporary measures to adjust the rate limits" without specifying what adjustments are made or when.Low
No batch APINo batch API or discounted async processing tier is documented.Low
No thinking token pricingK2.6 thinking mode generates reasoning tokens, but there is no separate pricing for thinking tokens vs regular output tokens. It is unclear whether thinking tokens are billed at output rates.Low