Meta

Executive Summary

What it is: Meta does not offer a coding agent product. Llama 4 Scout and Llama 4 Maverick are open-weight models that can be downloaded for free and deployed through third-party inference providers (Cerebras, Google Vertex AI, Together AI, Fireworks AI, AWS Bedrock, Azure AI) or self-hosted on your own GPU infrastructure. Inference costs are set entirely by the hosting provider, ranging from $0.10/MTok for input and output (Cerebras, Llama 3.1 8B) to $1.15/MTok for output (Google Vertex AI, Llama 4 Maverick).

What to watch out for: Meta's Llama website moved from llama.meta.com to https://www.llama.com (the old URL returns HTTP 400 errors). There is no first-party IDE, CLI, or web coding product, so users must build their own agent infrastructure using tools like Continue, Cline, Aider, or OpenHands. Cerebras is deprecating Llama 3.1 8B on May 27, 2026. Self-hosting requires GPU infrastructure with no official hardware guidance from Meta.

Bottom line: Llama models are the right choice when you need full control over data, model weights, and deployment (defense, healthcare, finance use cases with strict data residency requirements). For everyone else, the lack of a first-party coding agent means significant integration effort compared to managed options like Claude Code, Copilot, or Cursor. Llama 4 Maverick scores 43.4 on LiveCodeBench, competitive with GPT-4-class models, but without a managed product, Llama is an infrastructure choice rather than a tooling choice.

Key Terms

  • Open-weight model - a model whose trained weights are published for anyone to download, modify, and deploy. Meta's Llama models are open-weight, meaning the cost of inference depends entirely on which hosting provider you choose. Source: Llama
  • Inference provider - a third-party service (e.g., Cerebras, Together AI, Groq, AWS Bedrock, Azure AI) that hosts Llama models and charges for API access. Meta does not sell inference directly. Many providers expose OpenAI-compatible endpoints (see the client sketch after this list). Source: Llama
  • Distributed inference - running inference across multiple GPUs or nodes to serve large models. Llama 4 Scout and Maverick both support distributed inference, with estimated costs of $0.19-$0.49/MTok. Source: Llama
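
Because many hosted Llama endpoints follow the OpenAI API shape, switching providers is often just a change of base URL and model ID. A minimal client sketch; the endpoint URL, environment variable, and model ID below are illustrative assumptions, so check your provider's documentation for exact values:

```python
# Minimal sketch: calling a hosted Llama model through a third-party
# inference provider's OpenAI-compatible endpoint. The base_url,
# API-key variable, and model ID are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed provider endpoint
    api_key=os.environ["PROVIDER_API_KEY"],  # provider-issued key
)

response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",  # assumed provider model ID
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(response.choices[0].message.content)
```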

Latest Changes

First report for this supplier. All models, plans, and pricing reflect the current state.

  • New model: Llama 4 Scout and Llama 4 Maverick available as open-weight models. Scout features a 10M token context window, Maverick a 1M token window; both are natively multimodal.
  • Deprecation (upcoming): Cerebras is deprecating Llama 3.1 8B on May 27, 2026.
  • Website change: The Llama website migrated from llama.meta.com to https://www.llama.com; the old URL returns HTTP 400 errors.

Plans

Meta does not offer a coding agent product or subscription plans. Llama models are open-weight and free to download. The cost structure depends entirely on how you deploy them.

Deployment Method | Cost | Notes
Download and self-host | Free (hardware costs only) | Requires GPU infrastructure; cost depends on your hardware and electricity (see the self-hosting sketch below)
Third-party inference API | Varies by provider | See API Pricing table below for concrete per-provider rates
Cerebras (fast inference) | See API Pricing table | Pay-per-token via Cerebras API or Cerebras Code subscription
AWS Bedrock / Azure AI | See respective provider pricing | Pay-per-token through cloud marketplace
Together AI / Fireworks AI | See respective provider pricing | Competitive pricing for open-source model inference
Groq | See Groq pricing | Fast inference on LPU hardware

Source: Llama
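
For the download-and-self-host row, one common path is an open-source inference engine such as vLLM. A minimal sketch, assuming `pip install vllm`, a CUDA GPU with enough memory for the chosen checkpoint, and access to the weights; the Hugging Face repo ID is an assumption, so substitute the checkpoint you actually downloaded:

```python
# Minimal self-hosting sketch using vLLM's offline inference API.
# Assumes: `pip install vllm`, a CUDA GPU with sufficient memory, and
# downloaded weights (the repo ID below is an assumption).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E-Instruct")  # assumed repo ID

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Explain what an open-weight model is."], params)
print(outputs[0].outputs[0].text)
```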

API Pricing

Meta does not offer an API directly. Inference costs are set by third-party providers. Concrete pricing from Google Vertex AI and Cerebras:

Model | Provider | Input ($/MTok) | Output ($/MTok) | Batch Input ($/MTok) | Batch Output ($/MTok) | Notes
Llama 4 Scout | Google Vertex AI | $0.25 | $0.70 | $0.125 | $0.35 | -
Llama 4 Maverick | Google Vertex AI | $0.35 | $1.15 | $0.175 | $0.575 | -
Llama 3.3 70B | Google Vertex AI | $0.72 | $0.72 | $0.36 | $0.36 | -
Llama 3.1 8B | Cerebras | $0.10 | $0.10 | - | - | Deprecating May 27, 2026

Terms explained:

  • Batch API - a lower-cost inference mode where requests are queued and processed asynchronously (not real-time), typically at 50% of standard pricing. Google Vertex AI offers batch pricing for all Llama models listed above (a per-request cost sketch follows below). Source: Google – Pricing

Source: Llama, Google – Pricing, Cerebras – Pricing
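
To translate these per-token rates into request-level costs, a small arithmetic sketch using the Vertex AI figures from the table above, with batch mode modeled at 50% of standard pricing as described under Terms explained:

```python
# Sketch: estimating per-request cost from the $/MTok rates above.
# Rates are the Google Vertex AI figures from the pricing table;
# batch mode is modeled at 50% of the standard rate.
RATES = {  # model: (input $/MTok, output $/MTok)
    "llama-4-scout": (0.25, 0.70),
    "llama-4-maverick": (0.35, 1.15),
    "llama-3.3-70b": (0.72, 0.72),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimated USD cost of a single request."""
    in_rate, out_rate = RATES[model]
    multiplier = 0.5 if batch else 1.0
    return multiplier * (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 50k-token prompt with a 2k-token completion on Maverick.
print(f"standard: ${request_cost('llama-4-maverick', 50_000, 2_000):.4f}")  # $0.0198
print(f"batch:    ${request_cost('llama-4-maverick', 50_000, 2_000, batch=True):.4f}")  # $0.0099
```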

Model Performance / Benchmarks

Benchmark | Llama 4 Maverick | Llama 4 Scout
MMLU Pro | 80.5 | 74.3
LiveCodeBench | 43.4 | 32.8

Additional specifications:

  • Context windows: 10M tokens (Llama 4 Scout), 1M tokens (Llama 4 Maverick); both natively multimodal (text + image)
  • Llama 4 Scout: optimized for efficient inference on a single H100 GPU
  • Llama 4 Maverick: targets frontier-level performance with higher resource requirements (a multi-GPU serving sketch follows below)
  • Estimated distributed inference cost: $0.19-$0.49/MTok

Source: Llama
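
Where a deployment exceeds a single GPU, as the Maverick bullet above suggests, engines such as vLLM can shard a model across devices via tensor parallelism. A minimal sketch; the repo ID and GPU count are assumptions to adjust to your checkpoint and hardware:

```python
# Sketch: multi-GPU (distributed) inference via vLLM tensor parallelism.
# Assumes one node with 8 GPUs; the repo ID and tensor_parallel_size
# are assumptions to adjust to your checkpoint and hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # assumed repo ID
    tensor_parallel_size=8,  # shard the weights across 8 GPUs
)
outputs = llm.generate(
    ["Summarize the trade-offs of tensor parallelism."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```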

Latest News

Llama 4 Scout and Maverick Release

Llama 4 Scout and Llama 4 Maverick are available as open-weight models. Llama 4 Scout features a 10M token context window and Llama 4 Maverick a 1M token window; both offer native multimodal support (text + image). Llama 4 Maverick achieves 80.5 MMLU Pro and 43.4 LiveCodeBench; Llama 4 Scout achieves 74.3 MMLU Pro and 32.8 LiveCodeBench. Both can be downloaded from www.llama.com and are supported by major inference providers including Cerebras, Google Vertex AI, Together AI, Fireworks AI, AWS Bedrock, and Azure AI.

Cerebras Deprecation of Llama 3.1 8B

Cerebras is deprecating Llama 3.1 8B on its platform effective May 27, 2026. The model is priced at $0.10/MTok for both input and output. Users should migrate to Llama 4 Scout or Maverick before the deprecation date.

Website Migration

Meta's Llama website has migrated from llama.meta.com to https://www.llama.com. The old URL returns HTTP 400 errors. All model downloads and documentation are now hosted at the new domain.

Source: Llama, Cerebras – Pricing

Community Signals

LiveCodeBench and Context Window Discussion

Llama 4 Maverick's LiveCodeBench score of 43.4 is frequently cited in coding benchmark discussions, with community members comparing it favorably to GPT-4-class models for code generation tasks. Scout's 10M token context window is a major talking point, with developers noting it enables processing very large codebases in a single prompt. However, practical latency and cost at that context length are still being evaluated by the community.
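
As a rough illustration of what "a very large codebase in a single prompt" involves, a sketch that concatenates a repository into one prompt and estimates its token count; the directory path, suffix filter, and 4-characters-per-token heuristic are assumptions, so use the model's actual tokenizer for real budgeting:

```python
# Sketch: packing a repository into a single long-context prompt.
# The directory path, suffix filter, and 4-chars-per-token estimate
# are assumptions; use the model's real tokenizer for actual budgeting.
from pathlib import Path

def pack_repo(root: str, suffixes: tuple[str, ...] = (".py", ".md")) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"### {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

prompt = pack_repo("./my_project")  # assumed project directory
est_tokens = len(prompt) // 4       # crude ~4 chars/token heuristic
print(f"~{est_tokens:,} tokens; fits 10M window: {est_tokens < 10_000_000}")
```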

Self-Hosting for Privacy

The lack of a first-party coding agent product from Meta means users must build their own agent infrastructure using tools like Continue, Cline, Aider, or OpenHands on top of Llama models. Organizations with strong privacy requirements (defense, healthcare, finance) often choose Llama models for on-premises deployment to avoid sending code to third-party APIs.
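
As a sense of the integration effort involved, a minimal sketch of one agent-style edit step against a self-hosted, OpenAI-compatible endpoint; the localhost URL and model name assume a local vLLM server, and real tools such as Aider or OpenHands layer diffing, retries, and sandboxing on top:

```python
# Sketch of one agent-style edit step against a self-hosted endpoint.
# Assumes a local OpenAI-compatible server (e.g., vLLM) on localhost:8000;
# the model name is an assumption. Code never leaves the machine.
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")

def edit_file(path: str, instruction: str) -> None:
    """Send one file plus an instruction, write back the revised file."""
    source = Path(path).read_text()
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo ID
        messages=[
            {"role": "system",
             "content": "Return only the full revised file, with no commentary."},
            {"role": "user", "content": f"{instruction}\n\n{source}"},
        ],
    )
    Path(path).write_text(response.choices[0].message.content)

edit_file("app.py", "Add type hints to every function.")  # illustrative call
```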

Provider Lock-In Concerns

The Cerebras Llama 3.1 8B deprecation has prompted discussion about provider lock-in when relying on a single inference provider for open-weight models.

Source: Llama

Enterprise Readiness

Feature | Available? | Details
SSO (SAML) | N/A | Meta does not offer a managed platform. Models are downloaded and deployed by the user.
SSO (OIDC) | N/A | Same as above.
SCIM | N/A | Same as above.
Audit logs | N/A | Same as above.
IP indemnity | No | Not offered. Models are open-weight with no commercial indemnification from Meta.
Data residency | Yes | Full control when self-hosting. Models can run on any infrastructure in any region.
HIPAA | N/A | Self-hosted deployments can be made HIPAA-compliant by the deploying organization.
Air-gapped / on-prem | Yes | Models can be downloaded and deployed on air-gapped infrastructure. Full data isolation. Source: Llama
SLA | N/A | No managed service. Availability depends on the user's own infrastructure.
Admin controls (RBAC) | N/A | No managed platform. Controls depend on the user's deployment infrastructure.

Transparency Gaps

Metric | Status | Notes
Recommended inference costs | not applicable | Meta does not set inference pricing
Self-hosting hardware requirements | undisclosed | No official guidance on minimum GPU specs for Llama 4 Scout or Maverick
Fine-tuning tools | partially disclosed | Meta provides Llama fine-tuning guides but specifics vary by model size
Cerebras Llama 4 Maverick pricing | undisclosed | Cerebras lists Llama 4 Maverick as supported but has not published per-token pricing
Together AI / Fireworks AI Llama 4 pricing | undisclosed | Pricing pages not updated with Llama 4 per-token rates at time of report
Context window performance at scale | undisclosed | 10M token context (Scout) is claimed but no official latency/cost benchmarks published at that scale