Moonshot AI

Executive Summary

What it is: Moonshot AI's Kimi is a consumer AI platform, accessible on the web, that includes a coding agent (Kimi Code) available as a CLI and IDE extensions. Consumer plans range from $0/mo (Adagio) to $199/mo (Vivace). The underlying K2.6 model is available via API at $0.95/$4.00 per MTok (input/output) with a 256K-token context window. That is roughly 5-6x cheaper than Western frontier models (Claude Opus 4.7 is $5.00/$25.00 per MTok). Kimi K2.5 is also available for free in Windsurf.

What to watch out for: Agent quotas are approximate ("based on typical task token consumption") with no concrete token counts, and the Kimi Code multipliers ("1x", "5x", "15x", "30x") lack a defined base unit. Users have reported several billing problems: double-charging, no invoice system, no visible link to cancel a subscription, and subscription state not syncing across devices. Rate limiting (429 errors) is frequent during peak usage. All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, etc.) will be discontinued on May 25, 2026.

Bottom line: Moonshot reports that Kimi K2.6 delivers competitive benchmark results (SWE-Bench Pro: 58.6%, Terminal-Bench 2.0: 66.7%) at API pricing well below Western frontier models. Its Agent Swarm architecture (300 sub-agents, 4,000 coordinated steps) is technically ambitious. However, unreliable billing and gaps in customer service make it risky for enterprise use. It is best suited to individual developers who are comfortable with API-level integration and want frontier-competitive performance at a lower price.

Key Terms

  • Unified credit pool - Kimi's consumer plans use a single credit balance metered by token consumption across all features (agent, Code, Swarm, Claw). Credits reset monthly. Source: Kimi – Membership Overview
  • Kimi Code - Moonshot AI's coding agent product, available as a CLI and IDE extension. Uses Kimi K2.6 as its underlying model. Source: Kimi – Code
  • Agent Swarm - Kimi's architecture for decomposing tasks into heterogeneous subtasks executed concurrently by self-created domain-specialized agents. K2.6 supports up to 300 sub-agents across 4,000 coordinated steps. Source: Kimi – Agent Swarm
  • Kimi Claw - A persistent, proactive AI agent that operates across multiple applications with 24/7 execution (similar to Anthropic's computer use). Available on Allegretto ($39/mo) and above. Source: Kimi – Kimi Claw Introduction
  • Context caching - Kimi automatically caches context. Cached tokens are billed at the "cache hit" rate, which is 83% cheaper than regular input for K2.6 ($0.16 vs $0.95 per MTok). Source: Kimi – Chat K26
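Moonshot does not document how Agent Swarm schedules its sub-agents internally. The following is a generic illustration only, not Moonshot's implementation, of bounded concurrent fan-out in Python. Each subtask is a coroutine, and an asyncio.Semaphore caps how many run at once, similar to the per-plan caps on concurrent subtasks. The helper make_subtask and its timings are hypothetical stand-ins for real sub-agents.

```python
import asyncio
from typing import Awaitable, Callable, List

Subtask = Callable[[], Awaitable[str]]


async def run_bounded(subtasks: List[Subtask], max_concurrent: int) -> List[str]:
    """Run every subtask with at most `max_concurrent` in flight; results keep input order."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task: Subtask) -> str:
        async with semaphore:  # wait here until one of the slots frees up
            return await task()

    # asyncio.gather returns results in the order the awaitables were passed in
    return list(await asyncio.gather(*(guarded(t) for t in subtasks)))


def make_subtask(name: str, seconds: float) -> Subtask:
    """Hypothetical stand-in for a domain-specialized sub-agent."""
    async def subtask() -> str:
        await asyncio.sleep(seconds)  # placeholder for real work
        return f"{name}: done"
    return subtask


if __name__ == "__main__":
    jobs = [make_subtask(f"subtask-{i}", 0.1) for i in range(8)]
    # With a cap of 4, eight 0.1 s subtasks finish in roughly two waves (~0.2 s).
    print(asyncio.run(run_bounded(jobs, max_concurrent=4)))
```

The semaphore sets a cap on concurrency without fixing the total number of subtasks. This mirrors how a swarm can plan hundreds of sub-agents while a plan limits how many execute simultaneously.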

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • New model: Kimi K2.6 launched April 19-20, 2026. SWE-Bench Pro: 58.6%, Terminal-Bench 2.0: 66.7%. API at $0.95/$4.00 per MTok (input/output).
  • Feature added: Agent Swarm expanded to 300 sub-agents across 4,000 coordinated steps (up from 100/1,500 in K2.5).
  • Deprecation: All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, etc.) will be discontinued May 25, 2026. Users should migrate to K2.6.
  • Partnership: Cursor Composer 2 partnership confirmed. Kimi K2.5 is the base model for Cursor's Composer 2, which Cursor further trained with RL fine-tuning.
  • Feature added: K2.6 API available with OpenAI SDK compatibility. Top-up promotion: 20-30% bonus voucher on recharges of $100+ through May 3, 2026.

Plans

Consumer Plans (Kimi.com)

Plan | Price (monthly billing) | Price (annual billing, per-month equivalent) | Agent Quota/mo | Kimi Code | Agent Swarm | Key Inclusions
Adagio (Free) | $0 | $0 | 6 agent tasks | Not included | Not included | 1 concurrent agent task, 200 database calls
Moderato | $19/mo | $15/mo ($180/yr) | 60 agent tasks | 1x quota (undisclosed token count) | Not included | 2 concurrent tasks, 4x speed priority, 2,000 database calls
Allegretto | $39/mo | $31/mo ($372/yr) | 150 agent tasks | 5x quota | 50 Swarm uses/mo, 4 concurrent subtasks | Kimi Claw, 2 concurrent tasks, 5,000 database calls
Allegro | $99/mo | $79/mo ($948/yr) | 360 agent tasks | 15x quota | 120 Swarm uses/mo, 4 concurrent subtasks | Kimi Claw, 4 concurrent tasks, 12,000 database calls
Vivace | $199/mo | $159/mo ($1,908/yr) | 720 agent tasks | 30x quota | 240 Swarm uses/mo, 8 concurrent subtasks | Kimi Claw, 4 concurrent tasks, 24,000 database calls

Agent quotas are approximate values based on typical task token consumption. Actual usage varies by task complexity. All plans share a unified credit pool metered by tokens.

Source: Kimi – Membership Pricing
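The next sketch derives two comparisons from the plan table: the annual-billing discount and a nominal price per agent task. The figures are copied from the table. Kimi states that agent-task counts are approximate, so "cost per task" here assumes the quota is fully used and is only a rough yardstick.

```python
# Figures copied from the consumer plan table; agent-task counts are Kimi's approximations.
# name: (monthly-billing price USD, annual-billing per-month USD, approx. agent tasks/month)
PLANS = {
    "Moderato": (19, 15, 60),
    "Allegretto": (39, 31, 150),
    "Allegro": (99, 79, 360),
    "Vivace": (199, 159, 720),
}


def annual_discount_pct(monthly: float, annual_per_month: float) -> float:
    """Percent saved per month by choosing annual billing, rounded to 0.1%."""
    return round((1 - annual_per_month / monthly) * 100, 1)


def nominal_cost_per_task(price_per_month: float, tasks_per_month: int) -> float:
    """USD per agent task, assuming the approximate quota is fully used."""
    return round(price_per_month / tasks_per_month, 3)


if __name__ == "__main__":
    for name, (monthly, annual, tasks) in PLANS.items():
        print(f"{name}: annual billing saves {annual_discount_pct(monthly, annual)}%, "
              f"~${nominal_cost_per_task(monthly, tasks)} per task at the monthly price")
```

On these figures, annual billing saves about 20-21% on every paid plan. The nominal monthly-billing price per agent task ranges from about $0.26 (Allegretto) to $0.32 (Moderato).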

API Plans (Pay-as-you-go)

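Tier | Cumulative Recharge | Concurrency | RPM (requests/min) | TPM (tokens/min) | TPD (tokens/day)
Tier 0 | $1 | 1 | 3 | 500,000 | 1,500,000
Tier 1 | $10 | 50 | 200 | 2,000,000 | Unlimited
Tier 2 | $20 | 100 | 500 | 3,000,000 | Unlimited
Tier 3 | $100 | 200 | 5,000 | 3,000,000 | Unlimited
Tier 4 | $1,000 | 400 | 5,000 | 4,000,000 | Unlimited
Tier 5 | $3,000 | 1,000 | 10,000 | 5,000,000 | Unlimited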

Minimum recharge: $1. At $5 cumulative recharge, users receive a $5 voucher. Vouchers do not count toward cumulative recharge. Enterprise custom limits available via api-service@moonshot.ai.

Source: Kimi – Limits
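The tier rules can be expressed as a threshold lookup. This minimal sketch rests on two assumptions. First, each recharge threshold is inclusive (a $10 cumulative total reaches Tier 1). Second, the caller passes only cash recharges, because vouchers do not count toward cumulative recharge. The Tier type and tier_for function are illustrative names, not part of any Kimi SDK.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    level: int
    min_recharge_usd: float
    concurrency: int
    rpm: int            # requests per minute
    tpm: int            # tokens per minute
    tpd: Optional[int]  # tokens per day; None means unlimited


# Rows copied from the API tier table, ordered by recharge threshold.
TIERS = [
    Tier(0, 1, 1, 3, 500_000, 1_500_000),
    Tier(1, 10, 50, 200, 2_000_000, None),
    Tier(2, 20, 100, 500, 3_000_000, None),
    Tier(3, 100, 200, 5_000, 3_000_000, None),
    Tier(4, 1_000, 400, 5_000, 4_000_000, None),
    Tier(5, 3_000, 1_000, 10_000, 5_000_000, None),
]
_THRESHOLDS = [t.min_recharge_usd for t in TIERS]


def tier_for(cash_recharged_usd: float) -> Optional[Tier]:
    """Return the tier for a cumulative cash recharge, or None below the $1 minimum.

    Pass cash only: vouchers (such as the $5 voucher granted at $5 cumulative) do not count.
    Assumes thresholds are inclusive, so exactly $10 reaches Tier 1.
    """
    idx = bisect_right(_THRESHOLDS, cash_recharged_usd) - 1
    return TIERS[idx] if idx >= 0 else None


if __name__ == "__main__":
    # $5 of cash plus the $5 voucher still counts as $5, so the account stays in Tier 0.
    print(tier_for(5))
```

For example, tier_for(100).concurrency is 200. A client could use that value to size its own request pool.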

API Pricing

Model | Input ($/MTok) | Output ($/MTok) | Cache Hit ($/MTok) | Context Window
Kimi K2.6 | $0.95 | $4.00 | $0.16 | 256K tokens
Kimi K2.5 | $0.60 | $3.00 | $0.10 | 256K tokens
Kimi K2 (0905) | $0.60 | $2.50 | $0.15 | 256K tokens
Kimi K2 Turbo | $1.15 | $8.00 | $0.15 | 256K tokens
Kimi K2 Thinking | $0.60 | $2.50 | $0.15 | 256K tokens
Kimi K2 Thinking Turbo | $1.15 | $8.00 | $0.15 | 256K tokens
Kimi K2 (0711) | $0.60 | $2.50 | $0.15 | 128K tokens

Note: Kimi K2 series models will be discontinued on May 25, 2026 and will no longer be maintained. Users should migrate to Kimi K2.6.

Other API pricing:

  • Web search ($web_search tool): $0.005 per successful tool call
  • File upload/extract: temporarily free
  • K2.6 supports text, image, and video input
  • Thinking mode: can be enabled/disabled per request
  • Temperature: fixed at 1.0 in thinking mode and 0.6 in non-thinking mode; it cannot be changed

Source: Kimi – Chat K26, Kimi – Chat K2, Kimi – Chat K25, Kimi – Tools
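Per-request cost follows from the rates above. In this minimal estimator, the dictionary keys are the table's display names, not API model IDs. It also assumes that the input token count includes the cached tokens: the cached portion is billed at the cache-hit rate and the remainder at the regular input rate. Vouchers and promotions are ignored.

```python
# USD per million tokens: (input, output, cache hit), copied from the API pricing table
PRICES_PER_MTOK = {
    "Kimi K2.6": (0.95, 4.00, 0.16),
    "Kimi K2.5": (0.60, 3.00, 0.10),
}
WEB_SEARCH_USD_PER_CALL = 0.005  # charged per successful $web_search tool call


def request_cost_usd(model: str, input_tokens: int, cached_tokens: int,
                     output_tokens: int, web_search_calls: int = 0) -> float:
    """Estimate one request's cost in USD.

    Assumption: `input_tokens` counts all prompt tokens, of which `cached_tokens`
    were cache hits billed at the cache-hit rate; the rest bill at the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_rate, output_rate, hit_rate = PRICES_PER_MTOK[model]
    token_cost = ((input_tokens - cached_tokens) * input_rate
                  + cached_tokens * hit_rate
                  + output_tokens * output_rate) / 1_000_000
    return token_cost + web_search_calls * WEB_SEARCH_USD_PER_CALL


if __name__ == "__main__":
    # 100K-token prompt with 80K cached, 5K output tokens, 2 web searches on K2.6
    print(f"${request_cost_usd('Kimi K2.6', 100_000, 80_000, 5_000, 2):.4f}")
```

In this example, the request costs about $0.0618. Without any cache hits, the same request would cost $0.125, so the cache-hit rate dominates savings for long, repeated prompts.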

Model Performance / Benchmarks

Benchmark | Kimi K2.6 | GPT-5.4 | Claude Opus 4.6
SWE-Bench Pro | 58.6% | 57.7% | 53.4%
Terminal-Bench 2.0 | 66.7% | 65.4% | 65.4%
HLE-Full (with tools) | 54.0% | 52.1% | 53.0%
SWE-Bench Verified | 80.2% | not reported | 80.8%

Additional K2.6 capabilities:

  • Agent Swarm: up to 300 sub-agents across 4,000 coordinated steps
  • 256K context window, multimodal (text, image, video)
  • Demonstrated a 13-hour continuous coding session that optimized exchange-core, improving its throughput by 185%

Source: Kimi – Kimi K2 6

Latest News

Kimi K2.6 Launch (April 19-20, 2026)

Moonshot AI released Kimi K2.6, its latest open-source model, which it says offers state-of-the-art coding and agent capabilities. Key claims:

  • SOTA on SWE-Bench Pro (58.6%), competitive with GPT-5.4 (57.7%) and above Claude Opus 4.6 (53.4%)
  • Terminal-Bench 2.0: 66.7%, above GPT-5.4 (65.4%) and Opus 4.6 (65.4%)
  • HLE-Full with tools: 54.0%, above GPT-5.4 (52.1%), Opus 4.6 (53.0%), Gemini 3.1 Pro (51.4%)
  • SWE-Bench Verified: 80.2%, competitive with Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%)
  • Agent Swarm expanded to 300 sub-agents across 4,000 coordinated steps (up from 100/1,500 in K2.5)
  • 256K context window, multimodal (text, image, video), thinking and non-thinking modes
  • Demonstrated a 13-hour continuous coding session that optimized exchange-core, improving its throughput by 185%
  • Demonstrated 12-hour session implementing Qwen3.5-0.8B inference in Zig (a niche language)

Source: Kimi – Kimi K2 6

K2.6 API Available (April 19, 2026)

K2.6 model available on Kimi API platform at $0.95/$4.00 per MTok (input/output). OpenAI SDK compatible. Top-up promotion: 20-30% bonus voucher on recharges of $100+ during April 19 - May 3, 2026.

Source: Kimi – Kimi K2 6 Quickstart

K2 Series Discontinuation (May 25, 2026)

All Kimi K2 series models (kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, etc.) will be officially discontinued on May 25, 2026 and no longer maintained. Users should migrate to kimi-k2.6.

Source: Kimi – Chat K2
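A deployment can flag IDs that need migrating before the cutoff. The notice lists only some IDs and ends with "etc.", so this sketch relies on a labeled heuristic. It assumes every affected ID starts with "kimi-k2-", with a hyphen directly after "k2"; "kimi-k2.6" uses a dot and is not flagged. The sketch does not attempt to flag K2.5, since the source does not say whether it is affected. The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

K2_SHUTDOWN = date(2026, 5, 25)
REPLACEMENT = "kimi-k2.6"


def is_legacy_k2(model_id: str) -> bool:
    # Heuristic assumption: retiring IDs look like kimi-k2-0905-preview,
    # kimi-k2-turbo-preview, kimi-k2-thinking; "kimi-k2.6" has a dot and does not match.
    return model_id.startswith("kimi-k2-")


def migration_plan(model_id: str, today: date) -> Optional[Tuple[str, int]]:
    """Return (replacement ID, days until shutdown) for a legacy K2 ID, else None.

    A negative day count means the shutdown date has already passed.
    """
    if not is_legacy_k2(model_id):
        return None
    return REPLACEMENT, (K2_SHUTDOWN - today).days


if __name__ == "__main__":
    for model_id in ("kimi-k2-turbo-preview", "kimi-k2.6"):
        print(model_id, "->", migration_plan(model_id, date.today()))
```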

Cursor Composer 2 Partnership Confirmed (April 2026)

The community discovered that Cursor's "in-house model" Composer 2 is Kimi K2.5 with RL fine-tuning, served via Fireworks AI. Moonshot confirmed that this is an authorized commercial partnership. Cursor's Lee Robinson stated that only about 1/4 of the compute came from the base model and the rest was Cursor's own training. The Hacker News thread drew 276 points and 168 comments.

Source: News – Item

Community Signals

Rate Limiting and Reliability Issues

Multiple reports of 429 errors and "system busy" messages have appeared on the Kimi Forum.
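A common client-side mitigation for 429 responses is to retry with capped exponential backoff and full jitter. This matters all the more because Kimi's documentation says rate limits may be adjusted temporarily when cluster load peaks. This is a generic pattern, not a Kimi SDK feature. RateLimitedError is a hypothetical placeholder for whatever exception your HTTP client raises on a 429, and the default delays are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying on RateLimitedError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Full jitter: wait a random time in [0, min(max_delay, base * 2**attempt)].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("unreachable")
```

Randomizing the wait spreads out retries from many clients. Otherwise they would all hit a busy cluster again at the same moment. The injectable `sleep` parameter keeps the function easy to test.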

Billing and Subscription Issues

K2.6 Quality Feedback

Cursor/Kimi Licensing Discussion

  • The HN discussion on Composer 2 being Kimi K2.5 drew 168 comments. The key debate was whether Cursor's use of an open-weight model without prominent attribution violated Kimi's modified MIT license. That license requires displaying "Kimi K2.5" in products with >100M MAU or >$20M monthly revenue. Moonshot confirmed the partnership was authorized.
  • Quote: "There is so much money to be made repackaging open source these days." (HN user mohsen1)
  • Quote: "K2.6 offers SOTA-level performance at a fraction of the cost." (Kilo.ai CEO Scott Breitenother, via Kimi blog)

Source: News – Item

Enterprise Readiness

Feature | Available? | Details
SSO (SAML) | No | Not mentioned. Kimi is primarily a consumer and API product.
SSO (OIDC) | No | Not mentioned.
SCIM | No | Not mentioned.
Audit logs | No | Not mentioned.
IP indemnity | No | Not mentioned.
Data residency | No | Not mentioned. API endpoints are China-focused with limited global availability.
HIPAA | No | Not mentioned.
Air-gapped / on-prem | No | Not available.
SLA | No | No published SLA.
Admin controls (RBAC) | No | No admin controls documented. API tiers are single-user.

Transparency Gaps

Gap | Details | Severity
Agent quota is approximate | Plan inclusions use "approximate values based on typical task token consumption" rather than concrete token counts. A buyer cannot know exactly how much usage they get. | High
Kimi Code quota multiplier unclear | Kimi Code is listed as "1x", "5x", "15x", "30x" without a concrete base unit. The actual token allocation for Kimi Code per plan is undisclosed. | High
No invoice system | Multiple forum posts report an inability to get invoices or billing transparency. No self-service invoice download. | Medium
Subscription management gaps | No clear cancel subscription link. Subscription state does not sync across devices. Double-charging reported by multiple users. | Medium
Refund policy | No refund policy published. Forum posts show users requesting refunds for immediate cancellations with no response. | Medium
Rate limit changes during peak | Documentation states "when the cluster load reaches its capacity limit, we may take temporary measures to adjust the rate limits" without specifying what adjustments are made or when. | Low
API batch pricing | No batch API or discounted async processing tier is documented. | Low