Cerebras

Executive Summary

What it is: Cerebras is an inference provider that runs open-source models (GLM, GPT OSS, Llama, Qwen) on wafer-scale chips, claiming 20x faster throughput than GPU-based providers. Its Cerebras Code product offers subscription-based coding agent access at $50/mo (Pro) and $200/mo (Max), and the Developer API operates on pay-per-token pricing starting at a $10 minimum payment. There is no first-party model; Cerebras hosts third-party open-weight models.

What to watch out for: Cerebras Code Pro and Max have been sold out for an extended period with no restock ETA. Free and Developer tier rate limits are undisclosed (described only as "10x higher than Free" for the paid tier, with no absolute numbers). Llama 3.1 8B and Qwen 3 235B Instruct will be deprecated on May 27, 2026. GLM 4.7 and GPT OSS 120B are preview models not intended for production use.

Bottom line: Cerebras's speed advantage is real (1,000 to 3,000 tok/s, confirmed by the published pricing page), but the practical impact for agentic coding, where thinking time often dominates generation speed, is limited. With the Code product sold out, two models pending deprecation, and rate limits opaque, Cerebras is difficult to evaluate for production use. It is best suited as a low-latency inference backend for specific high-throughput tasks where token generation speed is the bottleneck.

Key Terms

  • Wafer-scale inference - Cerebras runs inference on CS-3 wafer-scale chips instead of GPUs, claiming 20x faster throughput than GPU-based providers. Source: Cerebras – Pricing
  • Cerebras Code - a subscription coding product (separate from the API) providing token-based access to open-source models for IDE integrations and agentic workflows. Source: Cerebras – Introducing Cerebras Code
  • Pay-per-token - the Developer API tier charges per token processed, similar to other inference providers. Source: Cerebras – Cerebras Inference Now Available Via Pay Per Token
  • Inference provider - Cerebras does not train its own models. It runs third-party open-source models (GLM, GPT OSS, Llama, Qwen) on its hardware at high speed. Source: Cerebras – Pricing
  • Preview model - a model available on the platform for evaluation only, not recommended for production workloads. GLM 4.7 and GPT OSS 120B are currently in preview. Source: Cerebras – Pricing

Latest Changes

First report for this supplier. All models, plans, and pricing are listed as current state.

  • Deprecation (upcoming): Llama 3.1 8B and Qwen 3 235B Instruct will be deprecated on May 27, 2026.
  • Feature added: Blog posts on multi-agent workflows and a Figma multi-agent integration (April 16, 2026).
  • Feature added: MCP vs. CLI speed debate blog post (April 6, 2026).
  • Plan change: Cerebras Code Pro ($50/mo) and Max ($200/mo) remain sold out with no restock ETA.
  • Feature added: Cerebras available through AWS Marketplace (March 13, 2026).

Plans

Inference API Tiers

| Plan | Price | Usage | Key Inclusions |
|---|---|---|---|
| Free | $0 | Undisclosed rate limits | Access to all models, community support via Discord |
| Developer | Starting at $10 (self-serve payment) | 10x higher rate limits than Free | Everything in Free, higher priority processing |
| Enterprise | Custom (contact sales) | Highest rate limits | Dedicated queue, custom model weights, model fine-tuning, dedicated support |

Cerebras Code (Coding Agent Product)

| Plan | Price | Usage | Key Inclusions |
|---|---|---|---|
| Pro | $50/month | 24M tokens/day ($48/day value) | For indie devs and simple agentic workflows. Sold out as of April 2026 |
| Max | $200/month | 120M tokens/day ($240/day value) | For full-time development. Sold out as of April 2026 |

Source: Cerebras – Pricing

API Pricing

Developer API Per-Token Pricing

| Model | Speed | Input ($/MTok) | Output ($/MTok) | Notes |
|---|---|---|---|---|
| ZAI GLM 4.7 | ~1,000 tok/s | $2.25 | $2.75 | Preview model, not for production |
| OPENAI GPT OSS 120B | ~3,000 tok/s | $0.35 | $0.75 | Preview model, not for production |
| META Llama 3.1 8B | ~2,200 tok/s | $0.10 | $0.10 | Deprecated May 27, 2026 |
| QWEN Qwen 3 235B Instruct | ~1,400 tok/s | $0.60 | $1.20 | Deprecated May 27, 2026 |

Implied Code product pricing: Cerebras Code Pro provides 24M tokens/day at $50/month, which works out to roughly $0.07/1M tokens if fully utilized (720M tokens/month). This is significantly below API rates, suggesting the Code product bundles a subsidized allocation.
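The implied rate above can be reproduced with a quick back-of-the-envelope calculation, assuming the full daily allowance is used every day of a 30-day month:

```python
# Implied per-token cost of Cerebras Code Pro at full utilization.
# Assumes a 30-day billing month; actual billing terms are not published.
monthly_price = 50.00        # Cerebras Code Pro, $/month
tokens_per_day = 24_000_000  # daily token allowance
days_per_month = 30          # assumed month length

tokens_per_month = tokens_per_day * days_per_month          # 720M tokens
cost_per_mtok = monthly_price / (tokens_per_month / 1_000_000)

print(f"{tokens_per_month / 1_000_000:.0f}M tokens/month")  # 720M tokens/month
print(f"${cost_per_mtok:.3f} per 1M tokens")                # $0.069 per 1M tokens
```

At roughly $0.07/MTok against API rates of $0.10 to $2.75/MTok, the subsidy only materializes if the daily allowance is actually consumed; light users pay an effectively higher per-token rate.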

Source: Cerebras – Pricing
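For orientation, pay-per-token inference providers generally accept an OpenAI-style chat-completions request, and the sketch below builds such a request body. The endpoint URL and model identifier are illustrative assumptions, not values confirmed by the pricing page:

```python
import json

# Hypothetical pay-per-token request body in the common OpenAI-compatible
# chat-completions shape. Endpoint and model id are assumptions for
# illustration; check the provider's API docs for actual values.
ENDPOINT = "https://api.cerebras.ai/v1/chat/completions"  # assumed URL
payload = {
    "model": "llama3.1-8b",  # assumed id; billed at $0.10/MTok in and out
    "messages": [
        {"role": "user", "content": "Summarize wafer-scale inference in one line."}
    ],
    "max_tokens": 256,  # caps billable output tokens for this request
}
body = json.dumps(payload)  # JSON string to POST with an Authorization header
```

Because billing is per token processed, `max_tokens` doubles as a per-request cost ceiling on the output side.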

Model Performance / Benchmarks

Cerebras does not train its own models and does not publish benchmark scores. It hosts third-party open-weight models on wafer-scale chips. Key performance claim:

  • 1,000 to 3,000 tokens/second generation speed (confirmed by the published pricing page)
  • 20x faster throughput than GPU-based providers (vendor claim)

The practical impact for agentic coding, where thinking time often dominates generation speed, is debated in the community. Speed advantage is most relevant for high-throughput tasks where token generation is the bottleneck.
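The trade-off can be made concrete with a rough latency model: each agent step pays a fixed overhead (planning, tool calls, network round trips) that generation speed does not touch. All numbers below are illustrative assumptions, not measurements:

```python
# Rough end-to-end latency model for one agent step:
# fixed non-generation overhead + token generation time.
# All inputs are illustrative assumptions, not measured values.

def step_latency(output_tokens: int, tok_per_s: float, overhead_s: float) -> float:
    """Seconds for one agent step: fixed overhead plus generation time."""
    return overhead_s + output_tokens / tok_per_s

tokens = 1_000   # output tokens per agent step (assumed)
overhead = 8.0   # seconds of thinking/tool-use overhead per step (assumed)

gpu = step_latency(tokens, 100, overhead)         # assumed GPU baseline, ~100 tok/s
cerebras = step_latency(tokens, 2_000, overhead)  # mid-range of 1,000-3,000 tok/s

print(f"GPU:      {gpu:.1f}s")       # 18.0s
print(f"Cerebras: {cerebras:.1f}s")  # 8.5s
```

Generation alone is 20x faster in this sketch (10s vs. 0.5s), but the fixed overhead compresses the end-to-end speedup to roughly 2x, which is the community's point about thinking-time-dominated workloads.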

Source: Cerebras – Pricing

Latest News

Lessons Learned from Building Multi-Agent Workflows (April 16, 2026)

Blog post sharing engineering insights from deploying multi-agent systems on Cerebras infrastructure. Covers practical challenges in orchestrating multiple AI agents for production workloads.

Source: Cerebras – Lessons Learned From Building Multi Agent Workflows

Figma MultiAgents (April 16, 2026)

Blog post about a multi-agent integration with Figma, demonstrating how fast inference enables real-time collaborative AI-assisted design workflows.

Source: Cerebras – Figma Multiagents

The Debate of MCP vs. CLI Centers on Speed (April 6, 2026)

Blog post arguing that the choice between MCP (Model Context Protocol) servers and CLI-based agent interfaces comes down to inference speed. Cerebras positions its speed advantage as critical for CLI-based agentic workflows where round-trip latency compounds.

Source: Cerebras – Mcpvscli

Cerebras Coming to AWS (March 13, 2026)

Cerebras announced availability through AWS Marketplace, allowing customers to access Cerebras's low-latency inference through AWS with flexible pricing. This is a distribution expansion, not a new product.

Source: Cerebras – Cerebras Is Coming To Aws

Introducing Cerebras Code (August 1, 2025)

Cerebras Code launched as a subscription coding product with Pro ($50/month) and Max ($200/month) tiers. Both tiers are currently sold out.

Source: Cerebras – Introducing Cerebras Code

Qwen3 Coder 480B Live on Cerebras (August 1, 2025)

The Qwen3 Coder 480B model, a key coding-focused model in the lineup, was made available on Cerebras inference.

Source: Cerebras – Qwen3 Coder 480B Is Live On Cerebras

Community Signals

  • Cerebras Code Pro and Max tiers have been sold out for an extended period. No ETA for reopening has been communicated. This limits access to the most cost-effective coding plans.
  • Cerebras is frequently mentioned in discussions about inference speed, with community benchmarks confirming 10-20x faster token generation compared to GPU-based providers. However, the practical impact for coding tasks (where thinking time dominates over generation speed) is debated.
  • The OpenAI Codex Spark partnership (February 2026) generated significant interest, as it positions Cerebras as the fast-inference backend for an OpenAI product.

Enterprise Readiness

| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Not mentioned. Enterprise plan requires contacting sales. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned. |
| IP indemnity | No | Not mentioned. Cerebras is an inference provider, not a model creator. |
| Data residency | No | Not mentioned. |
| HIPAA | No | Not mentioned. |
| Air-gapped / on-prem | No | Not available. Cerebras requires its proprietary wafer-scale hardware. |
| SLA | No | No published SLA. |
| Admin controls (RBAC) | No | No admin controls documented. |

Transparency Gaps

| Metric | Status | Notes |
|---|---|---|
| Free tier rate limits | Undisclosed | No published requests/minute or tokens/day |
| Developer tier rate limits | Undisclosed | Described as "10x higher than Free" with no absolute numbers |
| Enterprise pricing | Undisclosed | Contact sales required |
| Cerebras Code restock timeline | Undisclosed | Pro and Max have been sold out with no announced reopening date |
| Preview model GA timeline | Undisclosed | GLM 4.7 and GPT OSS 120B are marked as preview with no stated production readiness date |
| Code product model allocation | Undisclosed | Pro and Max plans do not specify which models are available or whether model selection is restricted |