Meta

AI Coding Agents Report: May 2026 · Updated 31 May 2026 · Version history

Executive Summary

What it is: Meta develops and releases the Llama family of open-weight language models under a custom commercial license (Llama 4 Community License). Meta does not sell API access, subscription plans, or coding agent software. Users access Llama models through third-party inference providers (OpenRouter, Together AI, Fireworks, Groq, etc.) or self-host the downloaded weights. The two current flagship models are Llama 4 Maverick (400B total/17B active parameters, 128 experts, 1M context) and Llama 4 Scout (109B total/17B active parameters, 16 experts, 10M context), both released April 5, 2025. Third-party API pricing ranges from $0.08/$0.30 per MTok (Scout) to $0.15/$0.60 per MTok (Maverick) on OpenRouter. Source: https://www.llama.com

What to watch out for: Meta does not offer a first-party API, SLA, enterprise features, or IP indemnity. Hosting costs, availability, and data policies depend entirely on which third-party provider you choose. The homepage and model card contradict each other on Maverick's context window (10M on homepage, 1M on model card). The Llama API (llama.developer.meta.com) remains in waitlist mode with no public pricing or timeline. The knowledge cutoff is August 2024, making Llama 4 models over 20 months behind current events.

Bottom line: Llama 4 provides strong open-weight models with excellent third-party API pricing and the flexibility to self-host. For teams that need data sovereignty or want to avoid vendor lock-in, Llama 4 Maverick via a trusted inference provider is a cost-effective alternative to closed-source frontier models. However, the lack of a first-party API, the stale training data, and the absence of any enterprise support from Meta mean that Llama works best as part of a multi-provider strategy rather than a standalone solution.

Key Terms

Open-weight model — model weights are published for download, allowing anyone to run inference, fine-tune, or modify the model locally or on their own infrastructure. Unlike "open source," the Llama 4 Community License imposes restrictions on derivative naming and has a 700M MAU commercial threshold. Source: GitHub – License
Mixture-of-experts (MoE) — architecture where only a subset of parameters are activated per token. Llama 4 always activates 17B parameters per forward pass, but routes through different expert subsets (16 for Scout, 128 for Maverick). This keeps inference cost low despite large total parameter counts. Source: Llama – Llama4
Early fusion — a multimodal training technique where text and vision data are processed together from the first layer, rather than using separate frozen vision encoders. Llama 4 uses early fusion for native multimodality. Source: Llama
Llama 4 Community License — Meta's custom license for Llama 4 models. Permits free commercial use with two conditions: (1) entities with >700M monthly active users must request a separate license from Meta, and (2) derivative models must prefix their name with "Llama" and display "Built with Llama" on related materials. Source: GitHub – License
Llama API — Meta's hosted inference API at llama.developer.meta.com, currently in waitlist mode. Not yet publicly available. Source: Llama

Latest Changes

Changes since the 2026-04 report.

No new models. Llama 4 Scout and Llama 4 Maverick remain the flagship models, released April 5, 2025. No new Llama model has been announced or released since.
No pricing changes. Meta does not sell API access, so there are no pricing changes from Meta. Third-party provider pricing may have shifted independently.
No Llama-specific blog posts in May 2026. The AI at Meta blog's most recent posts are about Muse Spark (April 8, 2026), SAM 3.1 (March 27, 2026), and MTIA chips (March 11, 2026). None relate to Llama. Source: Meta
Llama API remains in waitlist mode. No update on pricing, availability, or capabilities of the first-party Llama API. Source: Llama – Join Waitlist
Llama 4 models not available on Together AI. As of May 31, 2026, Together AI's pricing page lists Llama 3.3 70B ($0.88/$0.88 per MTok) but does not carry Llama 4 Scout or Maverick. Source: Together – Pricing
Llama 4 available on OpenRouter. Scout at $0.08/$0.30 per MTok (10.4B tokens processed in the last 7 days) and Maverick at $0.15/$0.60 per MTok (28.2B tokens processed in the last 7 days). Source: Openrouter – Models

Plans

Meta does not offer subscription plans, tiers, or bundled products. Models are free to download and use under the Llama 4 Community License.

Access Method	Price	Details
Direct download (self-host)	Free	Download weights from Llama – Llama Downloads or HuggingFace. Requires own GPU infrastructure.
Llama API (Meta-hosted)	Undisclosed (waitlist)	Llama – Join Waitlist. No public pricing or availability timeline.
Third-party API providers	Varies by provider	OpenRouter, Fireworks, Groq, DeepInfra, etc. Pricing and SLAs vary.

API Pricing

Meta does not publish API pricing because it does not offer a public API. Third-party providers set their own prices for serving Llama 4 models. All prices below are per 1M tokens.

Provider	Model	Input	Output	Context	Notes
OpenRouter	Llama 4 Scout	$0.08	$0.30	10M	10.4B tokens/week. Source: Openrouter – Llama 4 Scout
OpenRouter	Llama 4 Maverick	$0.15	$0.60	1.05M	28.2B tokens/week. Source: Openrouter – Llama 4 Maverick
Together AI	Llama 4 Scout	N/A	N/A	N/A	Not available. Only Llama 3.3 70B listed. Source: Together – Pricing
Together AI	Llama 4 Maverick	N/A	N/A	N/A	Not available. Source: Together – Pricing
Meta (self-host estimate)	Llama 4 Maverick	$0.19 (blended)	$0.30-$0.49 (blended)	1M	Meta's own cost estimate for distributed inference. Source: Llama

Self-hosting hardware requirements:

Llama 4 Scout: Can run on a single H100 GPU with INT4 quantization (109B total params).
Llama 4 Maverick: Cannot run on a single GPU. FP8 quantized weights fit on a single H100 DGX host (8 GPUs). BF16 weights require multi-host deployment (400B total params). Source: Hugging Face – Llama 4 Maverick 17B 128E

Model Performance / Benchmarks

Meta reports the following benchmarks for instruction-tuned Llama 4 models. All evaluations conducted on BF16 weights with 0-shot, temperature=0. For high-variance benchmarks, Meta averages over multiple generations.

Model	Benchmark	Score	Notes
Llama 4 Maverick	LiveCodeBench (10/01/2024-02/01/2025)	43.4	Pass@1. Source: Llama
Llama 4 Scout	LiveCodeBench (10/01/2024-02/01/2025)	32.8	Pass@1. Source: Llama
Llama 4 Maverick	MMLU Pro	80.5	Macro avg/acc. Source: Llama
Llama 4 Scout	MMLU Pro	74.3	Macro avg/acc. Source: Llama
Llama 4 Maverick	GPQA Diamond	69.8	Accuracy. Source: Llama
Llama 4 Scout	GPQA Diamond	57.2	Accuracy. Source: Llama
Llama 4 Maverick	MMMU	73.4	Accuracy. Multimodal benchmark. Source: Llama
Llama 4 Scout	MMMU	69.4	Accuracy. Source: Llama
Llama 4 Maverick	MathVista	73.7	Accuracy. Source: Llama
Llama 4 Scout	MathVista	70.7	Accuracy. Source: Llama
Llama 4 Maverick	ChartQA	90.0	Relaxed accuracy. Source: Llama
Llama 4 Scout	ChartQA	88.8	Relaxed accuracy. Source: Llama
Llama 4 Maverick	DocVQA	94.4	ANLS. Source: Llama
Llama 4 Scout	DocVQA	94.4	ANLS. Source: Llama
Llama 4 Maverick	MMLU Multi	84.6	Source: Llama
Llama 4 Maverick	MBPP (pretrained)	77.6	Pass@1 (3-shot). Source: Hugging Face – Llama 4 Maverick 17B 128E
Llama 4 Scout	MBPP (pretrained)	67.8	Pass@1 (3-shot). Source: Hugging Face – Llama 4 Maverick 17B 128E

Context for comparison: Llama 4 Maverick's LiveCodeBench score of 43.4 is lower than DeepSeek V4 Pro (undisclosed exact score, community-rated "close to Opus 4.5") and well below current closed-source frontier models. However, at $0.15/$0.60 per MTok via OpenRouter, Maverick offers strong performance-per-dollar.

Latest News

2026-04-08: Muse Spark announced. Meta's AI at Meta blog published "Introducing Muse Spark: Scaling Towards Personal Superintelligence." This is not directly Llama-related but represents Meta's broader AI direction. No Llama model updates were included. Source: Meta – Introducing Muse Spark Msl

2026-03-27: SAM 3.1 released. Meta released Segment Anything Model 3.1 for real-time video detection and tracking. Not Llama-related, but demonstrates continued investment in open-source AI tooling. Source: Meta – Segment Anything Model 3

2026-03-11: Four MTIA Chips in Two Years. Meta detailed its custom AI chip roadmap. Relevant because MTIA chips are designed to serve Llama models at scale, potentially reducing Meta's inference costs and enabling the Llama API. Source: Meta – Meta Mtia Scale Ai Chips For Billions

No Llama 4 Behemoth update. When Llama 4 launched in April 2025, Meta mentioned a larger "Llama 4 Behemoth" model was in training. As of May 2026, there has been no update on Behemoth's status or release timeline.

No Llama 4.1 or Llama 5 announcement. Meta has not announced any successor to the Llama 4 series. The models are now over 13 months old.

Community Signals

Llama 4 adoption on OpenRouter is moderate but growing. Llama 4 Maverick processes 28.2B tokens/week on OpenRouter (second among Llama models, behind Llama 3.1 8B at 83.3B tokens/week). Llama 4 Scout processes 10.4B tokens/week. For comparison, DeepSeek V4 Pro processes significantly more volume through its direct API. Source: Openrouter – Models

Llama 4 not prominently discussed in May 2026 HN threads. The dominant community conversations in May 2026 focused on DeepSeek V4 Pro's permanent pricing, DeepClaude (using DeepSeek with Claude Code), and coding agent competition. Llama 4 was mentioned in passing as an open-weight alternative but was not the center of any major discussion thread. This suggests Llama 4 has settled into a stable "commodity" position in the community rather than being a hot topic.

OpenCode integration. Multiple HN commenters in the DeepClaude thread mentioned using OpenCode as a harness for Llama and other open-weight models. User rurban said: "I'm working with Deepseek for a few weeks with opencode, and there are no desires left." While this specifically references DeepSeek, it validates the opencode harness pattern for open-weight models including Llama 4. Source: News – Item

Llama 4 Scout praised for local deployment. Community members on r/LocalLLaMA and HN have noted that Scout's ability to run on a single H100 (with INT4 quantization) makes it one of the most capable models that can be deployed without multi-GPU infrastructure. However, the 10M context window claim has been met with skepticism, as it requires 512 GPUs with 5D parallelism in Meta's benchmark configuration.

Knowledge cutoff is a recurring concern. With an August 2024 knowledge cutoff, Llama 4 models are over 20 months behind current events. Community members have flagged this as a significant limitation for coding tasks that involve recently released libraries, APIs, or frameworks.

Enterprise Readiness

Feature	Available?	Details
SSO (SAML/OIDC)	N/A	Meta does not offer a hosted product with authentication. Self-hosting or third-party providers handle auth.
SCIM	N/A	No hosted product.
Audit logs	N/A	No hosted product. Self-hosting users implement their own.
IP indemnity	No	The Llama 4 Community License disclaims all warranties (Section 3) and limits liability (Section 4). Meta provides no IP indemnity for Llama model outputs.
Data residency	Partial	Self-hosting provides full data residency control. Third-party providers vary: OpenRouter routes through multiple providers; check individual provider policies. Meta's license does not address data processing.
HIPAA	N/A	No hosted product to certify. Self-hosting may be HIPAA-compliant with proper infrastructure controls.
Air-gapped / On-prem	Yes	Models are downloadable and can run fully offline. Scout runs on 1x H100 with INT4; Maverick requires multi-GPU setup.
SLA	No	Meta provides no SLA for model availability, performance, or support.
Admin controls (RBAC)	N/A	No hosted product.

Source: GitHub – License ; Llama

Transparency Gaps

Context window contradiction for Maverick. The llama.com homepage claims Maverick has "10M-token context for long-form work," but the official model card documentation page states "1M tokens" as the maximum context length. OpenRouter lists 1.05M context. The homepage claim appears to be marketing copy that does not match the technical specification. Source: Llama vs Llama – Llama4

Llama API status and pricing are undisclosed. Meta launched a Llama API waitlist at llama.developer.meta.com, but has not published pricing, rate limits, capabilities, or a general availability timeline. The waitlist has been open since at least April 2025 with no update.

Llama 4 Behemoth status unknown. Meta mentioned a larger "Behemoth" model during the Llama 4 launch in April 2025. Thirteen months later, there has been no update on its status, capabilities, or release plans.

Training data composition is vague. Meta states Llama 4 was trained on "a mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI." The exact composition, data filtering methodology, and opt-out mechanisms are not disclosed. Source: Hugging Face – Llama 4 Maverick 17B 128E

Knowledge cutoff is 20+ months old. Llama 4 models were trained on data with an August 2024 cutoff. Meta has not announced any plans for an updated model or continued pretraining on fresher data.

No benchmark methodology disclosure beyond summary. Meta reports benchmark results as single numbers but does not publish full evaluation code, prompts, or raw results. The methodology note on the homepage states "0 shot evaluation with temperature = 0" and "we average over multiple generations" for high-variance benchmarks, but the number of generations and confidence intervals are not disclosed.

Self-hosting cost estimates lack detail. Meta estimates serving Maverick at $0.19-$0.49 per MTok (blended, 3:1 ratio), but does not disclose the assumptions behind these estimates (GPU utilization, batch size, hardware depreciation, power cost, etc.). Source: Llama

Type: API (third-party), Self-host
API Input: $0.08/MTok
API Output: $0.3/MTok
Context: 10M
Free Tier: Yes

Compare all suppliers →