AI Coding Agents Report: May 2026

Executive Summary

May 2026 will be remembered as the month the AI coding industry stopped pretending subscriptions could cover the cost of agentic workloads. Three major suppliers (GitHub Copilot, Anthropic, and OpenAI) are cutting subscription value within a roughly two-week window (May 31 through June 15): Copilot and Anthropic are moving from flat-rate allowances to metered billing, and OpenAI's Pro promotional multipliers expire. The community backlash has been severe: Copilot users report projected bills of $942 to $5,851 per month under the new AI Credits model, up from $39. Anthropic framed the removal of claude -p from subscription limits as a "new SDK credit" worth $20 to $200 per month, but the community calculated that $200 buys roughly four hours of Opus usage.

At the same time, Chinese model providers are closing the quality gap at a fraction of the cost. DeepSeek V4-Pro is now permanently priced at $0.435/$0.87 per MTok (11.5x cheaper than Opus 4.8 on input), and a NIST evaluation rated it "on par with GPT-5." Moonshot's K2.6 won the AI Coding Contest's Word Gem Puzzle outright, beating GPT-5.5 and Claude Opus 4.7, at $0.95/$4.00 per MTok. The DeepClaude project, which routes Claude Code through DeepSeek's API, received 678 points on HackerNews.

For technical leaders, the actionable takeaway is clear: the era of subsidized flat-rate AI coding is ending. Budget for API-level costs, evaluate whether frontier models are necessary for every task (Composer 2.5 and SWE-1.6 are "good enough" at a fraction of the price), and start planning for model routing strategies that send simple tasks to cheap models and complex ones to frontier models.

Cost-Effectiveness Analysis

The cheapest way to get frontier-quality coding assistance in May 2026 depends heavily on whether you need enterprise features and whether you are willing to use Chinese-hosted models.

For solo developers and small teams willing to manage their own API keys, DeepSeek V4-Pro at $0.435/$0.87 per MTok is the clear cost leader: roughly 11.5x cheaper than Claude Opus 4.8 ($5/$25) on input tokens and 28.7x cheaper on output tokens. The DeepClaude project demonstrates that routing Claude Code through DeepSeek's API produces quality "close to Opus 4.5" at less than half of Anthropic's Haiku pricing. The trade-offs are zero enterprise features (no SSO, no audit logs, no IP indemnity) and data sovereignty concerns under China's National Intelligence Law.
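The per-direction multiples can be checked, and turned into a per-job comparison, with a few lines of Python. The prices are the ones quoted in this report; the 10 MTok input / 1 MTok output workload is a hypothetical, input-heavy mix, and the sketch ignores cache discounts, so treat the result as illustrative rather than a measured cost.

```python
# USD per million tokens (input, output), as quoted in this report.
PRICES = {
    "deepseek-v4-pro": (0.435, 0.87),
    "claude-opus-4.8": (5.00, 25.00),
}


def job_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD cost of a job consuming the given millions of input and output tokens (no caching)."""
    price_in, price_out = PRICES[model]
    return input_mtok * price_in + output_mtok * price_out


input_ratio = PRICES["claude-opus-4.8"][0] / PRICES["deepseek-v4-pro"][0]
output_ratio = PRICES["claude-opus-4.8"][1] / PRICES["deepseek-v4-pro"][1]

# Hypothetical workload: 10 MTok of input, 1 MTok of output.
opus = job_cost("claude-opus-4.8", 10, 1)
deepseek = job_cost("deepseek-v4-pro", 10, 1)

print(f"input {input_ratio:.1f}x cheaper, output {output_ratio:.1f}x cheaper")
print(f"Opus ${opus:.2f} vs DeepSeek ${deepseek:.2f} ({opus / deepseek:.1f}x)")
```

For this mix the blended gap is about 14x, between the two per-direction multiples; output-heavier workloads push it toward 28.7x.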

For teams that need enterprise compliance, the value equation is more nuanced. GitHub Copilot Enterprise at $39/user/month offers 21+ models from four providers, IP indemnity, SOC 2, HIPAA, and a published 99.9% SLA, but after the June 1 AI Credits transition its $39/month of included credits buys roughly 144 Opus requests, or about 4.8 per day. Google Gemini Code Assist Enterprise at $45 to $54/user/month costs more per seat but offers broader compliance (air-gapped deployment, data residency, ISO certifications), and Gemini 3.5 Flash at $1.50/$9.00 per MTok is significantly cheaper per token than Opus ($5/$25).

The emerging "best of both worlds" approach is model routing. Augment's Prism sends each turn to the cheapest model that can handle it, claiming 33% lower cost than Claude Code at matched quality on Opus 4.7 tasks. GitHub's auto model selection gives a 10% discount. Cursor's Composer 2.5 at $0.50/$2.50 per MTok handles 87% of agent-hours on the Ultra plan, reserving expensive frontier models for the remaining 13%.
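At its simplest, routing of this kind reduces to "the cheapest model that is good enough." The Python sketch below shows that selection rule and nothing more. It is not how Prism, Copilot auto selection, or Cursor actually decide: the integer difficulty score and each model's `max_difficulty` tier are hypothetical, the model names are illustrative labels rather than API identifiers, and estimating difficulty is the hard part a real router has to solve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # illustrative label, not an API model ID
    price_in: float      # USD per MTok, from this report
    price_out: float     # USD per MTok, from this report
    max_difficulty: int  # hypothetical capability tier (1 = trivial, 5 = hardest)


CATALOG = [
    Model("composer-2.5", 0.50, 2.50, max_difficulty=2),
    Model("gemini-3.5-flash", 1.50, 9.00, max_difficulty=3),
    Model("claude-opus-4.8", 5.00, 25.00, max_difficulty=5),
]


def estimated_cost(model: Model, in_mtok: float, out_mtok: float) -> float:
    """USD cost of a turn with the estimated input/output token counts."""
    return in_mtok * model.price_in + out_mtok * model.price_out


def route(difficulty: int, in_mtok: float, out_mtok: float) -> Model:
    """Return the cheapest model whose tier covers the task's estimated difficulty."""
    capable = [m for m in CATALOG if m.max_difficulty >= difficulty]
    if not capable:
        raise ValueError(f"no model is trusted with difficulty {difficulty}")
    return min(capable, key=lambda m: estimated_cost(m, in_mtok, out_mtok))


for difficulty in (1, 3, 5):
    print(difficulty, route(difficulty, in_mtok=0.2, out_mtok=0.02).name)
```

A fuller version would also escalate to a stronger model when a cheaper one's output fails tests or review, since misjudged difficulty is where the quality risk sits.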

For budget-conscious teams that want self-hosting, Mistral Medium 3.5 at $1.50/$7.50 per MTok (open weights, 128B dense, 77.6% SWE-Bench Verified) is the strongest open-weight option. The Mistral Vibe Pro plan at $14.99/month is the cheapest paid subscription that includes a full coding agent with CLI, VS Code extension, and remote agents.

What to Watch Next Month

Late May and June 2026 bring a cascade of pricing changes: OpenAI Pro promos expire May 31, Copilot AI Credits go live June 1, Anthropic SDK credits change June 15, Google shuts down the free Gemini CLI June 18, Alibaba's qwen3.7-max Token Plan promo ends June 22, and Zhipu's peak-hour multiplier promo ends in June. Anthropic has hinted at releasing Mythos-class models "in the coming weeks." Google has announced Gemini 3.5 Pro for June. Cognition's SWE-1.6 free promotional period ends around July 7, and post-promo pricing has not been announced. Teams should lock in any favorable promotional rates now and test model routing strategies before budgets are affected.

Per-Supplier Narrative

Anthropic (Claude Code)

Anthropic launched Claude Opus 4.8 on May 28 at the same $5/$25 per MTok pricing as Opus 4.7, with a fast mode at $10/$50 per MTok that is 3x cheaper than the previous fast-mode pricing. The model defaults to "high" effort and introduces dynamic workflows (hundreds of parallel subagents) for Max, Team, and Enterprise plans. Community reception was overwhelmingly skeptical: in the Reddit launch thread (2,600 upvotes, 786 comments), the consensus was that Opus 4.7 had been a downgrade from Opus 4.6, a concern because 4.8 "builds on 4.7." Multiple users reported that Opus 4.6 disappeared from the desktop UI.

The bigger story is the June 15 SDK credit change, which removes claude -p and Agent SDK usage from subscription limits and instead bills it at API rates against a monthly credit ($20 Pro, $100 Max 5x, $200 Max 20x). The community called this a "Trojan Horse" and a "massive nerf disguised as a feature." Users estimate the $200 Max 20x credit runs out after about four hours of normal Opus usage. Combined with the earlier experiment of removing Claude Code from the Pro plan (which Anthropic described as a "test" after the backlash), the change is eroding Anthropic's trust with individual subscribers.
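The community's four-hour figure implies a burn rate of about $50 per hour at Opus API rates. A minimal sketch of that arithmetic, assuming a hypothetical steady load of 8 MTok input and 0.4 MTok output per hour and ignoring prompt-cache discounts (which would stretch the credit further):

```python
# Monthly SDK credit per plan (USD) after the June 15 change, as reported.
SDK_CREDIT = {"Pro": 20, "Max 5x": 100, "Max 20x": 200}

OPUS_IN, OPUS_OUT = 5.00, 25.00  # Opus API rates, USD per MTok


def hours_until_exhausted(credit_usd: float, in_mtok_per_hour: float, out_mtok_per_hour: float) -> float:
    """Hours of steady usage a credit covers when billed at Opus API rates (no caching)."""
    burn_per_hour = in_mtok_per_hour * OPUS_IN + out_mtok_per_hour * OPUS_OUT
    return credit_usd / burn_per_hour


# Hypothetical load: 8 MTok in + 0.4 MTok out per hour, i.e. $40 + $10 = $50/hour.
hours = {plan: hours_until_exhausted(c, 8.0, 0.4) for plan, c in SDK_CREDIT.items()}
for plan, h in hours.items():
    print(f"{plan}: {h:.1f} hours")
```

At that rate the $20 Pro credit lasts under half an hour, which is why the per-plan credit matters more than the headline "new SDK credit" framing.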

On the enterprise side, Anthropic raised $65B at a $965B valuation (run-rate revenue $47B), acquired Stainless for SDK/MCP tooling, signed KPMG and PwC partnerships, and added Claude Security (beta) to the Enterprise plan. The aggressive pivot toward enterprise billing (API-rate consumption, SDK credits, Managed Agents) is unmistakable.

GitHub (Copilot)

The PRU-to-AI-Credits transition on June 1 is the most consequential pricing change in Copilot's history. Premium request units are replaced by token-based credits: Pro gets $10/month in credits, Pro+ gets $39, Business gets $19 (promo $30 through August), Enterprise gets $39 (promo $70 through August). Code completions remain free. Fallback to cheaper models when credits are exhausted is removed.

The community response has been devastating. Users posted screenshots of projected bills ranging from $942 to $5,851 per month under the new model. GPT-5.5 carries a 57x legacy multiplier, and a Pro+ subscriber's $39 in credits buys roughly 144 Opus-level requests per month. Multiple "farewell" threads with cancellation screenshots appeared on Reddit. One HackerNews commenter's company dropped Copilot Business for OpenAI Business with Codex before the change takes effect. Microsoft also reportedly canceled internal Claude Code licenses to push teams toward Copilot.
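Applying the Pro+ rate to every plan shows what each tier's credit allowance means in practice. This sketch assumes every Opus-level request costs the same (the implied $39 / 144 ≈ $0.27), which token-metered requests will not, and a 30-day month; the promo values apply only through August.

```python
# Monthly AI Credits (USD) per Copilot plan after June 1, as reported.
AI_CREDITS = {
    "Pro": 10,
    "Pro+": 39,
    "Business": 19,
    "Business (promo)": 30,
    "Enterprise": 39,
    "Enterprise (promo)": 70,
}

# Implied average cost of one Opus-level request: $39 buys roughly 144 requests.
COST_PER_REQUEST = 39 / 144


def opus_requests(credit_usd: float, days: int = 30) -> tuple[float, float]:
    """Return (requests per month, requests per day) for a credit allowance."""
    per_month = credit_usd / COST_PER_REQUEST
    return per_month, per_month / days


for plan, credit in AI_CREDITS.items():
    month, day = opus_requests(credit)
    print(f"{plan:<20} ~{month:4.0f}/month  ~{day:3.1f}/day")
```

Under these assumptions a Pro subscriber gets roughly 37 Opus-level requests a month, a little over one per day.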

New features this month include the Copilot App (technical preview), auto model selection with a 10% discount, cloud agent fast models (Haiku 4.5 and GPT-5.4 mini at 0.33x), GPT-5.3-Codex as the first LTS base model for Business/Enterprise (guaranteed 12 months), and Claude Opus 4.8 support. Data collection for Free/Pro/Pro+ users changed from opt-in to opt-out on April 24. Copilot now serves 140,000 organizations (nearly 3x year-over-year).

Cursor

Cursor restructured its individual plans into three tiers: Pro ($20/month, $20 API usage included), Pro+ ($60/month, $70 included), and Ultra ($200/month, $400 included). The headline launch was Composer 2.5 on May 18, a proprietary model built on Moonshot's Kimi K2.5 checkpoint at $0.50/$2.50 per MTok (standard) or $3/$15 (fast). Composer 2.5 now handles 87% of agent-hours on the Ultra plan, with only 13% going to expensive frontier models.

A side-by-side comparison by Andrew Shu found Claude Code is roughly 5x cheaper overall and 38x cheaper on frontier models at the same $200/month price point. Cursor Ultra provides approximately 18 agent-hours of frontier model usage versus Claude Code Max 20x's 678 hours. Users also report that Cursor burns 7 to 9x more usage per prompt than VS Code with the same model, an unexplained overhead.
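Dividing the shared $200 price by each plan's reported frontier agent-hours shows where the 38x figure comes from. The hour counts are those attributed to the comparison above; the per-hour framing is an added simplification that assumes all included usage goes to frontier models.

```python
PLAN_PRICE_USD = 200  # Cursor Ultra and Claude Code Max 20x cost the same per month

# Frontier-model agent-hours per month, as reported in the comparison.
FRONTIER_HOURS = {"Cursor Ultra": 18, "Claude Code Max 20x": 678}

cost_per_hour = {plan: PLAN_PRICE_USD / hours for plan, hours in FRONTIER_HOURS.items()}
ratio = cost_per_hour["Cursor Ultra"] / cost_per_hour["Claude Code Max 20x"]

for plan, cost in cost_per_hour.items():
    print(f"{plan}: ${cost:.2f} per frontier agent-hour")
print(f"Cursor Ultra costs {ratio:.1f}x more per frontier agent-hour")
```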

The SpaceX connection deepened: SpaceX announced an option to acquire Cursor for $60B, and Anysphere partnered with SpaceXAI to train a "significantly larger model from scratch" on Colossus 2 (a million H100-equivalents). Cursor hit $300M ARR and is raising $2B+ at a $50B+ valuation. A US Congressional probe into Anysphere's use of Chinese AI models adds regulatory uncertainty. New integrations include Jira, Microsoft Teams, shared canvases, multi-repo cloud agent environments, and auto-review run mode.

OpenAI (Codex)

OpenAI had a quiet month for pricing, with no API rate changes. The main impact comes from the May 31 expiration of Pro promotional multipliers: the $100/month plan reverts from 10x to 5x Plus usage (a 50% capacity cut), and the $200/month plan reverts from 25x to 20x. GPT-5.6 was spotted in the Codex UI by Reddit users but has not been confirmed by OpenAI.
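The text quantifies only the $100 plan's cut. A short sketch applying the same calculation to both plans, and showing the dollar cost per Plus-equivalent of usage, using only the prices and multipliers above:

```python
# (monthly price USD, promo multiplier, standard multiplier), multipliers relative to Plus.
PRO_PLANS = {"$100/month": (100, 10, 5), "$200/month": (200, 25, 20)}


def capacity_cut(promo: float, standard: float) -> float:
    """Fraction of usage capacity lost when the promo multiplier reverts."""
    return 1 - standard / promo


for plan, (price, promo, standard) in PRO_PLANS.items():
    print(
        f"{plan}: {capacity_cut(promo, standard):.0%} less capacity, "
        f"${price / promo:.0f} -> ${price / standard:.0f} per Plus-equivalent"
    )
```

After the reversion the $200 plan costs $10 per Plus-equivalent versus $20 for the $100 plan, so heavy users get twice the usage per dollar on the larger plan.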

API pricing is unchanged: GPT-5.5 costs $5/$30 per MTok (standard), $2.50/$15 (batch), and $30/$180 (Pro priority). GPT-5.3-Codex at $1.75/$14 is the first LTS model on Copilot Business/Enterprise. Codex crossed 4 million weekly users. New features include Codex in the ChatGPT mobile app, HIPAA support for Codex, and a Dell partnership for hybrid/on-prem deployment.

The community's highlight was GPT-5.5 reportedly disproving a discrete geometry conjecture (the "Kissing Number Problem" in 5 dimensions), which generated 1,429 HackerNews points. The fine-tuning platform is winding down, with only o4-mini remaining accessible. OpenAI was named a Gartner Magic Quadrant Leader for Enterprise AI Coding Agents.

Windsurf

Cognition raised over $1B at a $26B valuation, with revenue run-rate at $492M. The company claims 89% of code at Cognition is now committed by Devin. Enterprise customers include Citi, Mercedes-Benz, Goldman Sachs, and the US Army. Devin Review (agentic code review) launched for all IDE users, and Devin for Terminal (a Rust CLI) entered preview.

The free tier was stripped of all third-party models, leaving only SWE-1.5 (proprietary). SWE-1.6 remains free during a promotional period ending around July 7, after which pricing has not been announced. The Pro trial restricts users to SWE-1.5 only, an undocumented limitation that drew backlash. Usage quotas are described only as "Light," "Standard," and "Heavy" with no concrete numbers, making cost comparison with competitors impossible.

Claude Opus 4.8 was added to supported models on launch day. Plans are Free ($0), Pro ($20/month), Max ($200/month), Teams ($40/user/month), and Enterprise (custom). Cognition is positioning Windsurf as the AI-native development platform rather than just a coding assistant.

Sourcegraph (Amp)

Amp rebuilt its CLI as "Neo" with auto-compaction, a Plugin API, and remote control. GPT-5.5 now powers deep, rush, and oracle modes. Rush 2.0 achieves a 44% task solve rate at an average cost of $0.58 per task and 1 minute 32 seconds per task. Amp Labs consulting service launched for enterprise customers.

Pricing remains pay-as-you-go, with zero markup on Individual ($5 minimum credit) or a 50% markup plus a $1,000 onboarding fee on Enterprise. The free tier ($10/day credit grant, roughly $300/month value) has been closed to new signups since February 2026. Per-token rates are not published on the website, making it difficult to compare with API-direct access. HackerNews engagement remains very low (11 points for the "Amp, Rebuilt" launch).
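Because Amp bills model usage rather than seats, the plan choice acts as a multiplier on whatever the underlying tokens cost. A minimal sketch, assuming `usage_usd` is that usage priced at provider list rates (which Amp does not publish) and that the $1,000 onboarding fee is charged once, in the first month:

```python
def amp_monthly_bill(usage_usd: float, plan: str, first_month: bool = False) -> float:
    """Estimated Amp bill for a month of usage worth `usage_usd` at provider list prices."""
    if plan == "individual":
        return usage_usd  # zero markup ($5 minimum credit purchase not modeled)
    if plan == "enterprise":
        onboarding = 1_000 if first_month else 0  # assumed one-time fee
        return usage_usd * 1.5 + onboarding      # 50% markup
    raise ValueError(f"unknown plan: {plan!r}")


# A team spending $2,000/month of model usage:
print(amp_monthly_bill(2_000, "individual"))                    # 2000
print(amp_monthly_bill(2_000, "enterprise", first_month=True))  # 4000.0
print(amp_monthly_bill(2_000, "enterprise"))                    # 3000.0
```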

Augment Code

Augment launched Prism, a model routing system that sends each turn to the best-fit model across two clusters: Opus 4.7/Sonnet 4.6/Gemini Flash 3.0 or GPT-5.5/GPT-5.4/Kimi K2.6. Augment claims Prism delivers 33% lower cost than Claude Code at matched quality on Opus 4.7 tasks, though it adds ~2.6 seconds of median latency on the ~4% of turns where the planner activates.

Cosmos, an agent OS with sandboxed execution, entered public preview for Max plan ($200/dev/month) users only. A survey of 219 engineering leaders found 48% of code is now AI-generated. The r/AugmentCodeAI subreddit moved to restricted mode, and top posts are frustration and quit threads. Pricing is unchanged: Indie $20/month (40,000 credits), Standard $60/dev/month (130,000 credits), Max $200/dev/month (450,000 credits). The credit-to-token mapping is undisclosed.

Tabnine

Tabnine was named a Gartner Visionary in the 2026 Magic Quadrant for Enterprise AI Coding Agents. New features include Plan Mode in CLI, token cost APIs, per-team quota enforcement, and CLI Extensions. Upcoming v6.2 drops support for GPT-OSS, Gemma, and Qwen 3 in chat. Upcoming v6.3 (June) removes Inline Actions and adds Code Awareness.

Pricing is unchanged and enterprise-focused: Code Assistant at $39/user/month, Agentic Platform at $59/user/month, Headless Business at $1,200/month (5B tokens), and Headless Enterprise at $5,000/month (50B tokens). BYO LLM is available for unlimited usage. No individual developer plan exists. There was no significant community discussion of Tabnine in May.
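The Headless plans are priced per token allowance, so they can be put on the same per-MTok footing as the API prices elsewhere in this report. The sketch assumes input and output tokens count alike, since the plans as described do not distinguish them, and shows how unused allowance raises the effective rate.

```python
# Tabnine Headless plans: (monthly price in USD, included tokens), as reported.
HEADLESS = {"Business": (1_200, 5e9), "Enterprise": (5_000, 50e9)}


def effective_usd_per_mtok(monthly_usd: float, tokens_used: float) -> float:
    """Effective USD per million tokens for the tokens actually consumed."""
    return monthly_usd / (tokens_used / 1e6)


for plan, (usd, allowance) in HEADLESS.items():
    full = effective_usd_per_mtok(usd, allowance)
    half = effective_usd_per_mtok(usd, allowance / 2)
    print(f"Headless {plan}: ${full:.2f}/MTok at full use, ${half:.2f}/MTok at half use")
```

Whether model inference is included in these allowances is not stated here, so compare the results with API prices cautiously.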

Moonshot AI (Kimi Code)

Moonshot's K2.6 won the AI Coding Contest Word Gem Puzzle outright with 22 match points (7 wins, 1 draw, 0 losses), beating GPT-5.5 (16 points), Claude Opus 4.7 (12 points), and Gemini Pro 3.1 (9 points). At $0.95/$4.00 per MTok, K2.6 is also the cheapest of those four models. The K2.0/K2.1/K2.2/K2.3/K2.4 series was discontinued on May 25; only K2.5, K2.6, and V1 remain on the API.

New products include Kimi WebBridge (browser extension) and Berget Code for European teams with GDPR compliance. Billing and reliability issues from April persist: users report double-charging, no invoice system, and 429 rate limit errors. Enterprise readiness is near zero (no SSO, no SCIM, no audit logs, no SLA). Consumer plans range from $0 to $199/month.

Alibaba (Qwen Code)

Alibaba launched qwen3.7-max on May 21 as the new flagship, with 1M context, 64K max output, and 256K thinking budget, at approximately $1.67/$5.00 per MTok (USD equivalent). A Token Plan promotion halves qwen3.7-max credit consumption through June 22. Multiple third-party models were added to the Bailian platform including GLM-5.1, DeepSeek V4-Pro, and Xiaomi's mimo-v2.5-pro.

The Coding Plan Pro at ~$27/month offers 90,000 requests/month, but grants Alibaba a data-use license for training. The Token Plan (which does not use data for training) ranges from ~$27 to ~$193/seat/month. Qwen3.6-35B open-source release generated strong community interest (1,274 HackerNews points). Rate limits and qwen3.7-max benchmarks remain undisclosed.

Zhipu AI

GLM-5.1 scored 58.4 on SWE-Bench Pro, surpassing GPT-5.4 and Claude Opus 4.6, at approximately $0.83/$3.31 per MTok. That places GLM-5.1 among the cheapest frontier-class models with published benchmarks across the tracked suppliers, though DeepSeek V4-Pro ($0.435/$0.87) is cheaper still. The Coding Plan Pro at ~$20.55/month offers 2,000 weekly prompts with 5-hour rolling windows, a far larger allowance than Claude Pro or Cursor Pro at roughly the same $20 price.

Peak-hour multipliers (3x at 14:00-18:00 UTC+8) are currently reduced to 1x during a promotional period ending in June. Purchase limits have been in place since January, signaling capacity strain. Zhipu is now listed on the Hong Kong Stock Exchange. Community presence in English-language channels is minimal.
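Whether a request falls in the peak window depends on converting the caller's clock to UTC+8. A minimal Python sketch, assuming the multiplier is charged per prompt against the weekly quota, that the window includes 14:00 and excludes 18:00, and using a hypothetical `promo_active` flag for the current 1x promotion:

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))
PEAK_START_HOUR, PEAK_END_HOUR = 14, 18  # 14:00-18:00 UTC+8
PEAK_MULTIPLIER = 3


def prompts_charged(sent_at: datetime, promo_active: bool) -> int:
    """Quota units one prompt consumes; `sent_at` must be timezone-aware."""
    if sent_at.tzinfo is None:
        raise ValueError("sent_at must be timezone-aware")
    local_hour = sent_at.astimezone(UTC_PLUS_8).hour
    in_peak = PEAK_START_HOUR <= local_hour < PEAK_END_HOUR
    return PEAK_MULTIPLIER if in_peak and not promo_active else 1


peak = datetime(2026, 7, 1, 7, 30, tzinfo=timezone.utc)      # 15:30 in UTC+8
off_peak = datetime(2026, 7, 1, 12, 0, tzinfo=timezone.utc)  # 20:00 in UTC+8
print(prompts_charged(peak, promo_active=False), prompts_charged(off_peak, promo_active=False))
```

The peak window corresponds to 06:00-10:00 UTC, which overlaps the European working morning. Under the per-prompt assumption, someone working only in that window after the promotion ends would get about 667 prompts from the 2,000 weekly allowance.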

DeepSeek

DeepSeek made its 75% V4-Pro discount permanent on May 22, locking in $0.435/$0.87 per MTok (down from $1.74/$3.48). V4-Flash is $0.14/$0.28 per MTok. A NIST CAISI evaluation rated V4-Pro "on par with GPT-5." Cache hit prices are $0.003625 (Pro) and $0.0028 (Flash) per MTok, roughly 1/10 of launch pricing. An Anthropic-compatible API endpoint was added.
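For agentic workloads, which resend long and largely repeated context on every turn, the cache-hit rate matters more than the headline input price. This sketch blends the quoted prices; it assumes the $0.435 and $0.14 figures are the cache-miss input rates, and the 90% hit rate used below is illustrative, not measured.

```python
# DeepSeek V4 prices from the report, USD per MTok.
V4_PRICES = {
    "pro": {"input_miss": 0.435, "input_hit": 0.003625, "output": 0.87},
    "flash": {"input_miss": 0.14, "input_hit": 0.0028, "output": 0.28},
}


def effective_input_price(model: str, cache_hit_rate: float) -> float:
    """Blended USD/MTok for input when `cache_hit_rate` of input tokens are cache hits."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    p = V4_PRICES[model]
    return cache_hit_rate * p["input_hit"] + (1 - cache_hit_rate) * p["input_miss"]


# The permanent V4-Pro price is 25% of the pre-discount $1.74/$3.48 (a 75% cut).
print(round(1.74 * 0.25, 4), round(3.48 * 0.25, 4))

for model in V4_PRICES:
    print(model, round(effective_input_price(model, 0.9), 6))
```

At a 90% hit rate, V4-Pro's effective input price falls to about $0.047/MTok, roughly a ninth of the cache-miss rate.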

The DeepClaude project, which routes Claude Code through DeepSeek's API, received 678 HackerNews points. Users report quality "close to Opus 4.5" at less than half of Haiku's price. The permanent discount thread generated 620 HackerNews points with 549 comments. Privacy concerns remain the dominant objection: there is no opt-out from training on API data, and China's National Intelligence Law applies. The deepseek-chat and deepseek-reasoner aliases retire on July 24.

Cerebras (Cerebras Code)

Cerebras IPO'd on NASDAQ (CBRS) at $185/share on May 14. The company runs open-weight models on custom wafer-scale chips at speeds up to 3,000 tokens/second. Both Cerebras Code subscription plans ($50/month Pro and $200/month Max) are sold out with no reopening timeline. The shared public API offers only two models: GPT OSS 120B at $0.35/$0.75 per MTok and GLM 4.7 at $2.25/$2.75 per MTok. Enterprise customers can access 30+ models on dedicated endpoints including Qwen3 235B, Kimi K2.6, and DeepSeek V3.2.

Kimi K2.6 inference reached 981 tokens/second on enterprise endpoints. Multi-LoRA entered private preview. Prompt caching pricing is undisclosed, which is a concern for agentic workflows that benefit heavily from caching. Speed is universally praised, but quality gaps versus Claude/Opus are noted.

xAI

xAI (now a SpaceX division, branded SpaceXAI) launched Grok Build 0.1 on May 29, a dedicated coding model at $1.00/$2.00 per MTok with 256K context. Grok Build CLI launched May 25 for SuperGrok and X Premium Plus subscribers. Grok 4.20 and 4.3 remain at $1.25/$2.50 per MTok with 1M context. Cached input is $0.20/MTok (84% discount).

Community reception is sparse and largely critical. Grok Build is seen as a late entrant with no published coding benchmarks. The knowledge cutoff is stuck at November 2024 (18+ months old). SuperGrok Heavy at a reported $300/month ($99 promotional) is considered expensive for beta quality. xAI announced a compute partnership with Anthropic (Colossus 1). Enterprise features include SSO, SCIM, and data residency but no published SLA or IP indemnity.

Google (Gemini Code Assist)

Google I/O 2026 brought Gemini 3.5 Flash at $1.50/$9.00 per MTok, which outperforms the more expensive Gemini 3.1 Pro ($2.00/$12.00) on most benchmarks. The announcement generated 962 HackerNews points. Antigravity 2.0 launched as Google's unified desktop agent, and Antigravity CLI replaces Gemini CLI for free and individual users (Gemini CLI shuts down June 18). Gemini 3.5 Pro is slated for June. Google processes 3.2 quadrillion tokens per month.

Code Assist pricing is unchanged: Standard $19 to $22.80/user/month, Enterprise $45 to $54/user/month. Usage limits for all tiers remain undisclosed. One user reported a 70% rate of HTTP 503 errors when using the Flash model. Google has the broadest compliance coverage among all suppliers (IP indemnity, HIPAA, air-gapped, data residency, ISO certifications).
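An endpoint failing as often as the reported 70% needs client-side retries. A generic Python sketch of capped exponential backoff with full jitter follows; `TransientError` is a hypothetical stand-in for whatever exception your client raises on HTTP 503, and the success estimate assumes independent failures, which is optimistic during a real outage.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for the exception a client raises on HTTP 503."""


def call_with_backoff(request, max_attempts=6, base_delay=1.0, max_delay=30.0):
    """Call `request()`, retrying on TransientError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, cap))


def success_probability(failure_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1 - failure_rate ** attempts


# Demo with a fake request that fails twice, then succeeds (no network, no delay).
_outcomes = iter([TransientError(), TransientError(), "ok"])


def flaky_request():
    outcome = next(_outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(flaky_request, base_delay=0.0)
print(result, f"{success_probability(0.7, 1):.0%} -> {success_probability(0.7, 6):.0%}")
```

Even with six attempts, roughly one request in eight would still fail at a 70% error rate, so retries mask the problem rather than fix it.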

Mistral AI (Mistral Vibe)

Le Chat was rebranded to Vibe on May 28 as a unified agent platform. Mistral Medium 3.5 launched at $1.50/$7.50 per MTok (128B dense, open weights, 77.6% SWE-Bench Verified, 256K context). Remote coding agents, a VS Code extension, and Work mode all shipped. Devstral 2 was deprecated and retires July 31; the dedicated open-weight coding agent model is being replaced by the larger, more expensive Medium 3.5.

The Vibe rebrand generated 500 HackerNews points and 229 comments, with opinion sharply divided. Skeptics say the gap to frontier models is widening; supporters cite self-hosting, European data sovereignty, and the $14.99/month Pro plan (cheapest paid subscription with a full coding agent). The AI Now Summit announced a Les Ulis data center (Q3 2026) and partnerships with Airbus, BMW, and ASML. Emmi AI was acquired for physics simulation capabilities.

Meta

A quiet month for Meta's Llama program. Llama 4 Scout and Maverick remain the current models, now over 13 months old with no successor announced. No blog posts, no pricing changes, no new features. Llama API remains in waitlist mode with no public pricing. The community has settled Llama 4 into a "commodity" position: OpenRouter shows Maverick at 28.2B tokens/week and Scout at 10.4B tokens/week. Third-party pricing is $0.08/$0.30 (Scout) and $0.15/$0.60 (Maverick) per MTok via OpenRouter.

A notable transparency issue: Maverick's context window is listed as 10M tokens on the Llama homepage but 1M on the model card documentation. The Llama 4 Behemoth model has been mentioned in press for over 13 months without any release. Knowledge cutoff is August 2024, making Llama 4 increasingly stale for current-event-aware coding tasks.