Executive Summary
April 2026 is the month subsidized AI coding ended. GitHub Copilot announced a transition to token-based billing on June 1, replacing its flat-rate Premium Request Unit system with GitHub AI Credits at published API rates. The community calculates this as a 600%+ price increase for many workflows. The move validates Anthropic's pricing direction and signals that the era of all-you-can-eat AI coding subscriptions is closing across the industry.
Three new frontier models arrived. Claude Opus 4.7 launched at $5/$25 per MTok (same price as Opus 4.6) but with a new tokenizer that inflates token counts by a factor of 1.0 to 1.35, raising effective cost. GPT-5.5 took the Terminal-Bench 2.0 lead at 82.7%, priced at $5/$30 per MTok. DeepSeek V4-Flash arrived at $0.14/$0.28 per MTok, the cheapest frontier-capable model by a wide margin, with a 1M context window and an Anthropic-compatible endpoint that works directly with Claude Code.
Quality and trust questions dominated the conversation. An AMD executive published a forensic analysis showing Claude Code's pre-edit research effort collapsed by 90% over six weeks after Anthropic silently changed the default effort level from "high" to "medium." OpenAI argued SWE-bench Verified is saturated at 93.9% and no longer measures frontier capability. A visible community migration from Cursor to Claude Code accelerated, driven by cost. One user reported dropping from $1,800/month on Cursor to $200/month after switching.
For buyers evaluating right now: DeepSeek V4-Flash paired with Claude Code is the cheapest path to frontier-quality agentic coding. Google Gemini Code Assist's free tier (6,000 code requests/day) is the lowest-risk starting point. Tabnine's Context Engine at $59/user/month is the strongest enterprise option for codebase-aware completions at scale.
Cost-Effectiveness Analysis
The cheapest frontier-quality agentic coding setup in April 2026 is DeepSeek V4-Flash paired with Claude Code via the Anthropic-compatible endpoint. At $0.14/$0.28 per MTok (DeepSeek), V4-Flash is roughly 36x cheaper than Claude Opus 4.7 ($5/$25 per MTok Anthropic) and 18x cheaper than GPT-5.4 ($2.50/$15 per MTok OpenAI). A developer using Claude Code with DeepSeek as the backend gets the same agent UX at a fraction of the cost, trading model quality and reliability for savings.
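The 36x and 18x multiples are input-price ratios; a minimal sketch using the per-MTok prices quoted in this report (the dictionary keys are illustrative labels, and output-side ratios are even larger):

```python
# Input-price ratios behind the "36x / 18x cheaper" figures.
# Prices are ($/MTok input, $/MTok output) as quoted in this report.
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "claude-opus-4.7":   (5.00, 25.00),
    "gpt-5.4":           (2.50, 15.00),
}

def input_ratio(expensive: str, cheap: str) -> float:
    """How many times cheaper `cheap` is on input tokens."""
    return PRICES[expensive][0] / PRICES[cheap][0]

print(round(input_ratio("claude-opus-4.7", "deepseek-v4-flash")))  # 36
print(round(input_ratio("gpt-5.4", "deepseek-v4-flash")))          # 18
```

On output tokens the gap is wider still (25/0.28 is roughly 89x), which is why long agentic sessions amplify the savings.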
The next tier up in cost includes several newly priced options. Llama 4 Scout at $0.25/$0.70 per MTok via Google Vertex AI (Meta) is the cheapest model with a published LiveCodeBench score (32.8) and a 10M token context window. Zhipu GLM-5.1 at $1.40/$4.40 per MTok (Zhipu AI) undercuts both Opus 4.7 and GPT-5.4 while posting SWE-Bench Pro at 58.4%. xAI Grok 4.20 at $2.00/$6.00 per MTok (xAI) is priced between GPT-5.4 and Opus 4.7, with the largest context window in this report at 2M tokens. Google Gemini 3.1 Pro at $2.00/$12.00 per MTok and Gemini 3 Flash at $0.50/$3.00 per MTok (Google) offer competitive benchmarks (80.6% SWE-Bench Verified for 3.1 Pro) with the backing of Vertex AI infrastructure. Cerebras Code Pro at $50/month for 24M tokens/day (Cerebras) implies roughly $0.07/MTok if fully utilized, but the product remains sold out.
For solo developers who want a managed product, Google Gemini Code Assist's free tier (6,000 code requests/day; Google) is unmatched: no other supplier offers a free tier with published daily limits this high. Zhipu's GLM Coding Plan Lite at $18/month (Zhipu AI) is the cheapest paid coding agent plan with Anthropic-compatible proxy support. Mistral Pro at $14.99/month (Mistral AI) is the cheapest paid tier that includes a coding agent CLI, though its opaque usage limits make it hard to compare directly.
For small teams (5 to 20 developers), Sourcegraph Amp's zero-markup pass-through pricing (Sourcegraph) means the team pays exactly the underlying API cost, with no middleman surcharge. Tabnine's Agentic Platform at $59/user/month (Tabnine), with unlimited BYO LLM usage and a Context Engine that community reports credit with lifting acceptance rates from 28% to 41%, offers strong value when codebase-aware completions matter more than raw frontier model access. Teams can also mix and match without managing separate API accounts via Vertex AI partner pricing (Llama 4 Scout at $0.25/$0.70 per MTok), or go direct to Mistral's API for Codestral at $0.30/$0.90, Devstral 2 at $0.40/$2.00, or Mistral Small 4 at $0.15/$0.60 per MTok (Mistral AI and Google).
For enterprises (100+ developers), Tabnine and Augment Code are the two suppliers with Context Engine technology that indexes repositories for persistent codebase understanding. Tabnine's community reports from 220-developer and 300-developer deployments (Tabnine) provide more validated data than Augment's abstract credit system (Augment Code). GitHub Copilot Enterprise at $39/user/month (GitHub) offers the widest model selection and deepest IDE integration, but the June 1 billing transition adds uncertainty.
DeepSeek V4-Flash, Llama 4 Scout, and Zhipu GLM-5.1 form a cost tier below $1.50/MTok input that makes extended agentic sessions economically viable. DeepSeek at $0.14/$0.28 and Kimi K2.6 at $0.95/$4.00 per MTok (Moonshot AI) offer frontier-competitive benchmarks at 5x to 40x lower cost than Western frontier models. Llama 4 Scout at $0.25/$0.70 via Vertex AI is the cheapest frontier model with a managed API. Zhipu GLM-5.1 at $1.40/$4.40 adds 8-hour continuous task execution and an $18/month Coding Plan. The tradeoff for all three: DeepSeek has no SLAs, Llama requires a third-party API or self-hosting, and Zhipu has capacity-constrained sales and expiring promotional pricing.
What to Watch Next Month
Five pricing transitions hit in late May and early June. DeepSeek V4-Pro promotional pricing expires May 31, quadrupling from $0.435/$0.87 to $1.74/$3.48 per MTok. OpenAI's Pro plan promotional multipliers (10x for $100/month, 25x for $200/month) also expire May 31, reverting to 5x and 20x. GitHub Copilot's token-based billing takes effect June 1, the single largest structural pricing change in the AI coding market this year. Kimi K2 series models are discontinued May 25, requiring migration to K2.6. Cerebras deprecates Llama 3.1 8B and Qwen 3 235B Instruct on May 27.
Beyond pricing, Zhipu's GLM-5.1 promotional off-peak multiplier (1x, vs the standard 2x) expires at the end of June, which will increase effective costs for GLM Coding Plan users. Google's Gemini Cloud Assist features are free during preview but will likely be priced when the preview ends, affecting Enterprise tier value. Watch for Anthropic's response to the sustained quality degradation criticism: any change to effort defaults or usage limit transparency would affect every downstream supplier that depends on Claude models (Cursor, Copilot, Windsurf, Augment, Amp).
Per-Supplier Narrative
Anthropic (Claude Code)
Claude Opus 4.7 launched April 16 at $5/$25 per MTok with state-of-the-art results on SWE-bench Verified, Terminal-Bench 2.0, and CursorBench (70% vs Opus 4.6 at 58%). Two factors increase effective cost despite unchanged list pricing: a new tokenizer that maps the same input to between 1.0x and 1.35x as many tokens, and a new "xhigh" effort level whose token multiplier is undisclosed. The dominant community story was a forensic analysis by AMD's Stella Laurenzo showing the files-read-to-edits ratio dropped from 21.8 to 1.6 over six weeks, a 90% collapse in pre-edit research. Anthropic acknowledged the default effort level was silently shifted from "high" to "medium" on March 3. AMD's entire team migrated to a different provider. Anthropic does not publish exact usage limits for any plan tier; even the Max 20x plan ($200/month) describes usage only as "20x Pro usage" with no concrete number.
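Because list prices are unchanged while token counts inflate, the tokenizer change acts as a hidden multiplier on effective cost. A sketch using the 1.35x worst case reported above:

```python
# Effective $/MTok when a new tokenizer inflates token counts.
# List price is unchanged; you simply pay for more tokens per unit of text.
def effective_price(list_price: float, token_inflation: float) -> float:
    return list_price * token_inflation

# Opus 4.7 at the worst-case 1.35x inflation:
print(round(effective_price(5.00, 1.35), 2))   # input: 6.75 $/MTok effective
print(round(effective_price(25.00, 1.35), 2))  # output: 33.75 $/MTok effective
```

At the 1.0x end of the range, cost is unchanged; the point is that the actual multiplier depends on your codebase and is invisible in the list price.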
Risks: silent default changes, no published usage limits, tokenizer change effectively increases cost by up to 35%. The r/ClaudeLimits subreddit (tracking usage limits and bugs) exists because Anthropic will not publish these numbers.
GitHub (Copilot)
The biggest structural change this month. On April 27, GitHub announced all Copilot plans transition from per-request (PRU) billing to token-based billing (GitHub AI Credits) on June 1. Pro ($10/month) includes $10 in AI Credits, Pro+ ($39/month) includes $39. Promotional credits for Business ($30/month, normally $19) and Enterprise ($70/month, normally $39) run June through August only. Opus models were removed from the $10/month Pro plan on April 20, requiring Pro+ ($39/month) for Opus access. Self-serve Business signups and Pro trials are both paused. Copilot code review will also consume GitHub Actions minutes starting June 1. The community consensus on HN (747 points, 549 comments): "This is the best AI programming will be. From here on the enshitification starts and the prices go up." One enterprise user reported: "At my company basically everyone and their mother are using Claude Code via Bedrock, despite us having company-wide Windsurf, Copilot and ChatGPT Enterprise accounts."
Risks: the 7.5x Opus 4.7 multiplier is described as "introductory," promotional credits expire in three months, and the status page methodology was quietly changed to inflate uptime numbers.
Cursor
The dominant community trend this month is developers switching from Cursor to Claude Code. A Reddit thread titled "Aaaaand I cancelled my Cursor subscription" reached 179 upvotes. One HN user reported dropping from $1,800/month on Cursor to $200/month after switching. Community analysis revealed that Cursor's agent runs Anthropic's Claude Code SDK behind a local HTTP proxy, meaning Cursor's differentiation is UX and context management, not the underlying model. SpaceX secured an option to acquire Cursor for $60 billion (with a $10 billion fallback), triggering privacy concerns about source code access. Composer 2, Cursor's "in-house model," was revealed to be Kimi K2.5 with RL fine-tuning, confirmed by co-founder Lee Rob after community pressure. Hidden prompt cache costs caused billing shocks: one user reported 21M cache read tokens on a single call with roughly 4k user input tokens. The $40/user/month Teams plan only includes $20 of AI usage credits, with the remaining $20 allocated to "team features" not disclosed at purchase.
Risks: no published usage limits, hidden cache token costs, SpaceX acquisition raises privacy questions, active user migration to alternatives.
OpenAI (Codex)
GPT-5.5 launched April 23, leading Terminal-Bench 2.0 at 82.7% (vs Claude Opus 4.7 at 69.4%). API pricing is $5/$30 per MTok standard, $2.50/$15.00 for batch and flex, $12.50/$75.00 for priority. OpenAI claims GPT-5.5 uses "significantly fewer tokens" than GPT-5.4 for equivalent tasks, partially offsetting the 2x per-token price increase. The Business Codex tier operates on pay-as-you-go per-token billing with no fixed seat fee, a flexible model for teams that want API-level pricing without subscriptions. The April 28 AWS partnership puts OpenAI models on Amazon Bedrock and Codex on AWS, breaking Azure exclusivity after a restructured Microsoft partnership. Usage limits are stated as wide ranges (e.g., "15 to 80" GPT-5.5 messages per 5 hours for Plus), making actual capacity unpredictable. The Pro plan's 10x multiplier is promotional and expires May 31, reverting to 5x.
Risks: promotional pricing on Pro plans expires May 31, usage ranges make budgeting unreliable, GPT-5.5 Pro cached input pricing is undisclosed, SWE-bench Pro lacks independent verification.
Windsurf
Windsurf's quota-based billing system, introduced in March, dominated community discussion this month. A user analysis showed the Pro plan ($20/month) provides only $19.92 worth of token value at list prices. Multiple reports confirm the weekly quota can be exhausted in a single day of moderate use, and two Claude Opus 4.6 requests consume a full daily Pro quota. "Goodbye Windsurf" (97 upvotes) and "Windsurf is no longer a viable product" (40 upvotes) represent a wave of departure posts. On the positive side, SWE-1.6 (Windsurf's proprietary model at $0.30/$1.50 per MTok, free with 0 credits) received strong community reception (27 upvotes, 64 comments). Kimi K2.5 is also free. Windsurf 2.0 launched April 15 with an Agent Command Center and Devin Cloud integration, though users worry it will further strain quotas. Quota token counts for the "Light," "Standard," and "Heavy" tiers remain unpublished.
Risks: quotas insufficient for sustained use with frontier models, no published token counts for any tier, daily/weekly quota ratio is opaque, extra usage billed at API rates with no pre-request cost estimate.
Sourcegraph (Amp)
Amp charges the exact API cost from Anthropic, OpenAI, and Google with zero markup on individual and team plans, making it the most cost-efficient managed coding agent for developers willing to pay per-token. Opus 4.7 replaced the previous smart mode model on April 25, with Amp's internal eval score rising from roughly 65% to 72%. The free tier ($10/day credit grant, closed to new signups since February) went ad-free on March 30 after Amp concluded its ad revenue could not cover frontier token costs. Enterprise adds a 50% markup and a $1,000 onboarding fee. Amp's HN engagement is consistently low (1 to 8 points per submission). The r/AmpCode subreddit (599 weekly visitors, 4 weekly contributions) reveals recurring pain points: free tier credits expire after inactivity ("Don't go on vacation if you use Amp Free," 10 upvotes, 13 comments), missing credits are reported ("No amp free credits?," 2 upvotes), and bash command failures affect usability ("Ampcode broken for me," 3 upvotes, 4 comments).
Risks: free tier closed with no reopening date, per-token rates not published (users must check after the fact), enterprise markup calculation is vague, Reddit community reports stability issues with bash commands.
Augment Code
Augment added Gemini 3.1 Pro and Kimi K2.6 to its model lineup this month, formalizing a multi-provider strategy across Anthropic, OpenAI, Google, and Moonshot AI. Claude Opus 4.7 became the default model on April 16 with a 50% credit discount through April 30. The credit system uses abstract units that cannot be converted to actual tokens, making cost comparisons to raw API pricing impossible. Opus 4.5, 4.6, and 4.7 all cost the same 488 credits per task despite capability differences. The top Reddit post this month is from a user switching to Claude Code (43 upvotes, 41 comments). Intent, Augment's agent orchestration workspace, remains macOS-only with no ETA for Windows or Linux. Standard and Max plans cap at 20 users. Augment published a notable AGENTS.md effectiveness study finding that the best AGENTS.md files deliver quality improvements equivalent to upgrading from Haiku to Opus.
Risks: credit opacity prevents cost comparison with alternatives, macOS-only Intent limits team adoption, 20-user cap on paid plans, top community post is a departure announcement.
Tabnine
Tabnine stands out this month for strong enterprise adoption signals. A sysadmin reported deploying Tabnine across 220 developers after 10 months on Copilot, with completions now following internal patterns after two weeks of Context Engine indexing. A 300-developer org reported acceptance rates improving from 28% to 41% after switching. An 85-developer .NET team reported the Context Engine learned full CQRS pipelines within a week. The common theme: "A less capable model that understands your codebase outperforms a more capable model that doesn't." Pricing starts at $39/user/month (Code Assistant) and $59/user/month (Agentic Platform), with unlimited usage when you bring your own LLM endpoint. Tabnine-provided LLM access adds a 5% handling fee. Headless CI/CD agents run $1,200 to $5,000/month. Deployment options include SaaS, VPC, on-premises, and air-gapped, one of only two suppliers offering full air-gapped support.
Risks: no individual developer plan, no published per-model token rates, no published rate limits, Context Engine benchmarks use undisclosed internal methodology.
Moonshot AI (Kimi Code)
Kimi K2.6 launched April 19 with competitive benchmarks: SWE-Bench Pro 58.6% (edging past GPT-5.4 at 57.7%), Terminal-Bench 2.0 66.7% (above GPT-5.4 at 65.4%), at API pricing of $0.95/$4.00 per MTok with a 256K context window. That is roughly one-fifth the cost of Claude Opus 4.7. The Agent Swarm architecture supports up to 300 sub-agents across 4,000 coordinated steps. Consumer plans range from $0 (Adagio) to $199/month (Vivace). However, the billing system has documented problems: multiple reports of double-charging after cancellation, no invoice system, no visible cancel subscription link, and subscription state not syncing across devices. All K2 series models will be discontinued May 25, requiring migration to K2.6. Kimi K2.5 is available for free in Windsurf and was confirmed as the base for Cursor's Composer 2 model.
Risks: billing reliability issues make it risky for enterprise use, agent quotas are approximate with no concrete token counts, rate limiting (429 errors) is frequent during peak usage, K2 models deprecated May 25.
Alibaba (Qwen Code)
Qwen3.6-Plus (April 2) and Qwen3.6-Flash (April 16) launched as general-purpose models, while Qwen3-Coder-Plus remains the code-specific workhorse at approximately $0.56/$2.22 per MTok for inputs under 32K tokens, one of the cheapest code-specific models available. The catch: output pricing jumps 12.5x (from 16 to 200 RMB/MTok) when inputs exceed 256K tokens, a penalty not prominently disclosed that makes large-context agentic workflows expensive. All models support up to 1M context. Qwen3-Coder works with Claude Code via a proxy endpoint, and a CLI tool is available via npm. Qwen3.6-Max-Preview launched at $1.25/$7.50 per MTok but with no published benchmarks. The Qwen blog migrated to qwen.ai with low discoverability, and the old blog has not been updated since July 2025.
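The output-price cliff can be sketched as a step function. This is a deliberate simplification: the report quotes additional tiers below 256K (e.g., the sub-32K rate of approximately $0.56/$2.22), which are omitted here; only the 16-to-200 RMB jump is modeled:

```python
# Reported output-price cliff on Qwen3-Coder-Plus: 16 -> 200 RMB/MTok
# once input length crosses 256K tokens (intermediate tiers omitted).
def output_price_rmb(input_tokens: int) -> int:
    return 200 if input_tokens > 256_000 else 16  # RMB per MTok, simplified

print(output_price_rmb(300_000) / output_price_rmb(30_000))  # 12.5
```

A long agentic session that gradually accumulates context can cross the 256K boundary mid-run, which is why the penalty matters more for agents than for one-shot completions.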
Risks: tiered pricing penalizes large contexts severely, no benchmarks published for Qwen3.6 models, no first-party IDE product, pricing in RMB with no USD conversion.
Zhipu AI
GLM-5.1 launched April 8 at $1.40/$4.40 per MTok (input/output), cheaper than Claude Opus 4.7 ($5/$25) and GPT-5.4 ($2.50/$15), with SWE-Bench Pro at 58.4%, comparable to GPT-5.4 (57.7%) and above Claude Opus 4.6 (53.4%). Its standout feature is 8-hour continuous autonomous task execution, demonstrated by building a complete Linux desktop system and optimizing a vector database to 6.9x throughput over 655 rounds of iteration. The GLM Coding Plan supports Claude Code, Cursor, Cline, and other tools via an Anthropic-compatible proxy. Plan prices were doubled in April to $18/month (Lite), $72/month (Pro), and $160/month (Max), with a 10% discount for quarterly billing. GLM-5.1 consumes 3x quota during peak hours (14:00 to 18:00 UTC+8), with a promotional 1x off-peak multiplier expiring at the end of June. Plans are in "short-term sales restriction" with daily inventory limits released at 10:00 UTC+8. Cached input pricing is $0.26/MTok.
Risks: peak multiplier increases costs after promotional period ends in June, capacity-constrained sales with daily inventory, prompt counts in plan limits are approximate (each prompt triggers 15 to 20 model calls).
DeepSeek
DeepSeek V4 launched April 24 with two models. V4-Flash (284B total, 13B active, MoE) at $0.14/$0.28 per MTok is the cheapest frontier-capable model across all 17 suppliers, with 1M context and 384K max output. V4-Pro (1.6T total, 49B active, MoE) is at promotional pricing of $0.435/$0.87 per MTok (75% discount), reverting to $1.74/$3.48 on May 31. Even at full price, V4-Pro is roughly 3x cheaper than Claude Sonnet 4.6 ($3/$15). The Anthropic-compatible endpoint at api.deepseek.com/anthropic enables direct use with Claude Code. Cache hit prices were reduced to 1/10 of launch prices on April 26. The HN launch thread reached 2,086 points and 1,601 comments, one of the largest AI threads of 2026. The tradeoff: no SLAs, no fixed rate limits, and a history of outages during demand spikes.
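The "roughly 3x" comparison with Sonnet 4.6 holds on a blended basis. A sketch assuming a 3:1 input-to-output token mix, which is an illustrative assumption for agentic workloads, not a published figure:

```python
# Blended $/MTok under an assumed fixed input/output token mix (3:1 here).
def blended_price(inp: float, out: float, input_share: float = 0.75) -> float:
    return input_share * inp + (1 - input_share) * out

v4_pro_full = blended_price(1.74, 3.48)   # V4-Pro post-promo, May 31 onward
sonnet_46   = blended_price(3.00, 15.00)  # Claude Sonnet 4.6 list price
print(round(sonnet_46 / v4_pro_full, 1))  # 2.8
```

On input price alone the gap is under 2x; output-heavy workloads push the blended advantage above 3x, so the true multiple depends on your token mix.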
Risks: V4-Pro promotional pricing expires May 31 (4x price increase), no published rate limits, dynamic concurrency with no guarantees, thinking mode token billing undocumented.
Cerebras (Cerebras Code)
Cerebras runs open-source models (Llama, Qwen, GLM, GPT OSS) on wafer-scale chips claiming 20x faster throughput than GPU-based providers. The speed advantage is real: the Developer API lists concrete throughput of 1,000 to 3,000 tokens/second per model. For agentic coding, however, where model thinking time dominates generation speed, the practical impact is limited. Developer API pricing is now published: GPT OSS 120B at $0.35/$0.75 per MTok (3,000 tok/s), Qwen 3 235B Instruct at $0.60/$1.20 per MTok (1,400 tok/s), ZAI GLM 4.7 at $2.25/$2.75 per MTok (1,000 tok/s), and Llama 3.1 8B at $0.10/$0.10 per MTok (2,200 tok/s). GLM 4.7 and GPT OSS 120B are preview models not intended for production. Llama 3.1 8B and Qwen 3 235B Instruct will be deprecated May 27. Cerebras Code Pro ($50/month, 24M tokens/day) and Max ($200/month, 120M tokens/day) remain sold out with no restock ETA. The implied Code Pro per-token rate, if fully utilized at 720M tokens/month, works out to roughly $0.07/MTok, well below API rates.
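The implied Code Pro rate is straightforward arithmetic over the daily cap, assuming a 30-day month and full utilization (real usage will be lower, so the effective rate will be higher):

```python
# Implied $/MTok for Cerebras Code Pro if the daily cap is fully used.
monthly_tokens_m = 24 * 30            # 24M tokens/day -> 720M tokens/month
implied_rate = 50 / monthly_tokens_m  # $50/month plan
print(round(implied_rate, 2))         # 0.07
```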
Risks: Code product sold out indefinitely, two API models deprecating May 27, GLM 4.7 and GPT OSS 120B are preview-only, rate limits for Free and Developer tiers are undisclosed.
xAI
Grok 4.20 is xAI's sole model for both chat and coding, available via API at $2.00/$6.00 per MTok (input/output) with a 2M token context window, the largest in this report. Requests exceeding 200K context tokens are billed at 2x ($4.00/$12.00 per MTok). Cached input tokens cost $0.20/MTok. Server-side tools (web search, X search, code execution) cost $5 per 1,000 invocations on top of token costs. Rate limits are 1,800 RPM and 10M TPM. Grok 4.1 Fast is available via Google Vertex AI at $0.20/$0.50 per MTok. Previous models (Grok 3, Grok Code Fast 1) appear consolidated into Grok 4.20 with no formal deprecation notice. Batch API is offered at 50% off standard rates. There is no IDE integration, CLI coding agent, or subscription plan: xAI is API-only via console.x.ai.
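The long-context rule makes per-request cost discontinuous at 200K tokens. A sketch of the billing rule as described above; whether the cached-input rate also doubles past 200K is not stated, so this sketch assumes the flat $0.20/MTok applies throughout, and it excludes server-side tool invocations:

```python
# Per-request token cost for Grok 4.20 under the reported long-context rule:
# requests whose context exceeds 200K tokens bill at 2x on input and output.
def request_cost(input_tok: int, output_tok: int, cached_tok: int = 0) -> float:
    long_ctx = input_tok > 200_000
    in_rate, out_rate = (4.00, 12.00) if long_ctx else (2.00, 6.00)
    billable_in = input_tok - cached_tok          # cached tokens bill separately
    return (billable_in * in_rate
            + cached_tok * 0.20                   # assumed flat cached rate
            + output_tok * out_rate) / 1e6        # rates are per MTok

print(round(request_cost(150_000, 8_000), 4))  # 0.348  (short context)
print(round(request_cost(250_000, 8_000), 4))  # 1.096  (past the 200K cliff)
```

Note the step: the same 8K of output costs roughly 3x more once the context crosses 200K, so agents that let context grow unboundedly pay the penalty on every subsequent turn.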
Risks: no IDE integration or CLI coding agent, server-side tool costs are unpredictable in agentic workflows, previous models deprecated without notice, enterprise plan details undisclosed.
Google (Gemini Code Assist)
Gemini Code Assist's free tier offers 6,000 code-related requests/day and 240 chat requests/day, the highest published free-tier limit across all 17 suppliers. IP indemnification is included at all tiers, including free. Gemini 3.1 Pro achieves 80.6% on SWE-bench Verified and 68.5% on Terminal-Bench 2.0, competitive with Claude Opus 4.7 (69.4% on Terminal-Bench). API pricing on Vertex AI is now fully published: Gemini 3.1 Pro at $2.00/$12.00 per MTok (input/output), rising to $4.00/$18.00 above 200K context. Gemini 3 Flash at $0.50/$3.00 per MTok. Gemini 3.1 Flash-Lite at $0.25/$1.50 per MTok. Cached input is 10% of the standard rate for all models. Google also resells partner models on Vertex AI: Llama 4 Scout at $0.25/$0.70, Mistral Medium 3 at $0.40/$2.00, Codestral 2 at $0.30/$0.90, and Grok 4.20 at $2.00/$6.00. Business plans start at $19/user/month (annual commitment). Code customization (private codebase indexing, comparable to Tabnine's Context Engine) requires the Enterprise tier at $45/user/month, which costs more than GitHub Copilot Enterprise ($39) and Cursor Teams ($40). Context windows for Gemini 3 Flash and 3.1 Pro are still not published.
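Cached input at 10% of list pulls effective input cost down quickly. A sketch with an assumed cache-hit share (the 70% figure is an illustrative assumption, not a Google number):

```python
# Effective input $/MTok for Gemini 3.1 Pro given a cache-hit share.
# Cached input bills at 10% of the standard rate, per this report.
def effective_input(list_rate: float, cache_hit_share: float) -> float:
    return cache_hit_share * 0.10 * list_rate + (1 - cache_hit_share) * list_rate

print(round(effective_input(2.00, 0.0), 2))  # 2.0  (no caching)
print(round(effective_input(2.00, 0.7), 2))  # 0.74 (assumed 70% cache hits)
```

Agentic loops that replay a large stable system prompt each turn sit at the high end of cache-hit share, so the effective rate can fall well below the headline price.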
Risks: code customization locked behind the most expensive tier, context windows for latest models undisclosed, Gemini Cloud Assist pricing unknown after preview ends.
Mistral AI (Mistral Vibe)
Mistral offers the cheapest managed Pro tier at $14.99/month, below Anthropic ($20), OpenAI ($20), and Cursor ($20). The Vibe CLI coding agent is included with Pro. All usage limits are multiples of an undisclosed Free baseline ("Up to 6x Free," "Up to 30x Free"), making actual capacity unknowable without a trial. Direct API pricing is now fully published on Mistral's pricing page: coding-relevant models span from Mistral Small 4 at $0.15/$0.60 per MTok, to Codestral at $0.30/$0.90, to Devstral 2 at $0.40/$2.00, to Mistral Large 3 at $0.50/$1.50, to Mistral Medium 3.5 at $1.50/$7.50. Batch API gives 50% off all models. Enterprise Priority Tier is priced starting at 75% above list pricing, with regional data processing controls and system-level SLAs. Mistral Vibe for IDE is Enterprise-only. Multiple legacy models retire May 31. Mistral's open-weight models (Devstral 2, Mistral Large 3, Mistral Small 4) enable self-hosting, one of the few suppliers where you can run coding agents on your own infrastructure without a commercial license.
Risks: zero concrete usage numbers, IDE agent is Enterprise-only, model lineup is complex with overlapping naming, no published benchmarks for any coding-relevant model.
Meta
Meta does not offer a coding agent product. Llama 4 Scout and Maverick are open-weight models with 10M token context windows, deployed through third-party inference providers. Concrete pricing is now available: Llama 4 Scout at $0.25/$0.70 per MTok via Google Vertex AI is the cheapest open-weight model in this report with frontier-class capabilities. Llama 4 Maverick at $0.35/$1.15 per MTok via Vertex AI offers stronger benchmarks (LiveCodeBench 43.4 vs Scout's 32.8). Llama 3.1 8B at $0.10/$0.10 per MTok via Cerebras is the absolute cheapest option, but deprecates May 27. Batch pricing via Vertex AI is 50% off standard rates for all models. Meta's Llama website migrated from llama.meta.com to llama.com (the old URL returns HTTP 400). Without a first-party IDE, CLI, or web product, users must build agent infrastructure using tools like Continue, Cline, Aider, or OpenHands. Llama is the right choice for organizations that need full control over data and deployment (defense, healthcare, finance with strict data residency), but the integration effort is significant compared to managed alternatives.
Risks: no first-party coding agent, inference costs set entirely by third parties with no Meta guidance, self-hosting hardware requirements undisclosed.