Key Terms
- Wafer-scale inference - Cerebras runs inference on CS-3 wafer-scale chips instead of GPUs, claiming 20x faster throughput than GPU-based providers. Source: Cerebras – Pricing
- Cerebras Code - a subscription coding product (separate from the API) providing token-based access to open-source models for IDE integrations and agentic workflows. Source: Cerebras – Introducing Cerebras Code
- Pay-per-token - the Developer API tier charges per token processed, similar to other inference providers. Source: Cerebras – Cerebras Inference Now Available Via Pay Per Token
- Inference provider - Cerebras does not train its own models. It runs third-party open-source models (GLM, GPT OSS, Llama, Qwen) on its hardware at high speed. Source: Cerebras – Pricing
- Preview model - a model available on the platform for evaluation only, not recommended for production workloads. GLM 4.7 and GPT OSS 120B are currently in preview. Source: Cerebras – Pricing
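The pay-per-token Developer API described above can be exercised with any OpenAI-style HTTP client. A minimal sketch of building such a request; the endpoint URL and model ID used here are assumptions for illustration, not confirmed by the source, so verify them against Cerebras's API documentation:

```python
import json
import os

# Assumed OpenAI-compatible chat-completions endpoint (verify in the docs).
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Build headers and a JSON body for one pay-per-token chat call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # model ID is an assumption, e.g. "llama3.1-8b"
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body)

headers, body = build_chat_request(
    "llama3.1-8b", "Hello", os.getenv("CEREBRAS_API_KEY", "sk-demo")
)
# Send with any HTTP client, e.g.:
#   requests.post(API_URL, headers=headers, data=body)
```

Billing under this tier is then a function of the input and output token counts in the response, at the per-model rates listed under API Pricing.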
Latest Changes
First report for this supplier. All models, plans, and pricing reflect the current state.
- Deprecation (upcoming): Llama 3.1 8B and Qwen 3 235B Instruct will be deprecated May 27, 2026.
- News: Blog posts on multi-agent workflows and a Figma multi-agent integration (April 16).
- News: MCP vs. CLI speed debate blog post (April 6).
- Plan status: Cerebras Code Pro ($50/mo) and Max ($200/mo) remain sold out with no restock ETA.
- Distribution: Cerebras now available through AWS Marketplace (March 13).
Plans
Inference API Tiers
| Plan | Price | Usage | Key Inclusions |
|---|---|---|---|
| Free | $0 | Undisclosed rate limits | Access to all models, community support via Discord |
| Developer | Starting at $10 (self-serve payment) | 10x higher rate limits than Free | Everything in Free, higher priority processing |
| Enterprise | Custom (contact sales) | Highest rate limits | Dedicated queue, custom model weights, model fine-tuning, dedicated support |
Cerebras Code (Coding Agent Product)
| Plan | Price | Usage | Key Inclusions |
|---|---|---|---|
| Pro | $50/month | 24M tokens/day ($48/day value) | For indie devs and simple agentic workflows. Sold out as of April 2026 |
| Max | $200/month | 120M tokens/day ($240/day value) | For full-time development. Sold out as of April 2026 |
Source: Cerebras – Pricing
API Pricing
Developer API Per-Token Pricing
| Model | Speed | Input ($/MTok) | Output ($/MTok) | Notes |
|---|---|---|---|---|
| ZAI GLM 4.7 | ~1,000 tok/s | $2.25 | $2.75 | Preview model, not for production |
| OPENAI GPT OSS 120B | ~3,000 tok/s | $0.35 | $0.75 | Preview model, not for production |
| META Llama 3.1 8B | ~2,200 tok/s | $0.10 | $0.10 | Deprecated May 27, 2026 |
| QWEN Qwen 3 235B Instruct | ~1,400 tok/s | $0.60 | $1.20 | Deprecated May 27, 2026 |
Implied Code product pricing: Cerebras Code Pro provides 24M tokens/day at $50/month, which works out to roughly $0.07/1M tokens if fully utilized (720M tokens/month). This is significantly below API rates, suggesting the Code product bundles a subsidized allocation.
Source: Cerebras – Pricing
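The arithmetic behind the implied Code pricing can be checked with a short script. Per-token prices come from the table above; the model keys and the 30-day month are simplifying assumptions:

```python
# Developer API per-token prices ($ per million tokens), from the table above.
PRICING = {
    "glm-4.7":      {"input": 2.25, "output": 2.75},
    "gpt-oss-120b": {"input": 0.35, "output": 0.75},
    "llama-3.1-8b": {"input": 0.10, "output": 0.10},
    "qwen-3-235b":  {"input": 0.60, "output": 1.20},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-MTok rates."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Implied Cerebras Code Pro rate: $50/month for 24M tokens/day,
# assuming a 30-day month and full utilization.
pro_monthly_tokens = 24_000_000 * 30            # 720M tokens/month
implied_rate = 50 / (pro_monthly_tokens / 1_000_000)
print(round(implied_rate, 3))                   # ~0.069 $/MTok, i.e. roughly $0.07
```

At full utilization the Pro plan's effective rate is well below even the cheapest API model (Llama 3.1 8B at $0.10/$0.10), consistent with the subsidized-allocation reading above.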
Model Performance / Benchmarks
Cerebras does not train its own models and does not publish benchmark scores. It hosts third-party open-weight models on wafer-scale chips. Key performance claims:
- 1,000 to 3,000 tokens/second generation speed (per the published pricing page)
- 20x faster throughput than GPU-based providers (claimed)
The practical impact for agentic coding, where thinking time often dominates generation speed, is debated in the community. Speed advantage is most relevant for high-throughput tasks where token generation is the bottleneck.
Source: Cerebras – Pricing
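The "thinking time dominates" point above can be illustrated with simple wall-time arithmetic. All numbers here are assumptions for illustration (a 100 tok/s GPU baseline, 2 s of fixed prefill/reasoning overhead, 500 output tokens), not measurements:

```python
def wall_time(thinking_s: float, output_tokens: int, tok_per_s: float) -> float:
    """Total wall time = fixed 'thinking'/prefill time + generation time."""
    return thinking_s + output_tokens / tok_per_s

# Hypothetical agent step: 2 s of overhead, 500 output tokens.
gpu = wall_time(2.0, 500, 100)        # GPU baseline at ~100 tok/s -> 7.0 s
cerebras = wall_time(2.0, 500, 2000)  # wafer-scale at ~2,000 tok/s -> 2.25 s
print(round(gpu / cerebras, 1))       # ~3.1x end-to-end, not 20x
```

A 20x raw generation advantage shrinks to ~3x end-to-end under these assumptions; the advantage approaches 20x only as the fixed overhead approaches zero, which matches the community debate noted above.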
Latest News
Lessons Learned from Building Multi-Agent Workflows (April 16, 2026)
Blog post sharing engineering insights from deploying multi-agent systems on Cerebras infrastructure. Covers practical challenges in orchestrating multiple AI agents for production workloads.
Source: Cerebras – Lessons Learned From Building Multi Agent Workflows
Figma MultiAgents (April 16, 2026)
Blog post about a multi-agent integration with Figma, demonstrating how fast inference enables real-time collaborative AI-assisted design workflows.
Source: Cerebras – Figma Multiagents
The Debate of MCP vs. CLI Centers on Speed (April 6, 2026)
Blog post arguing that the choice between MCP (Model Context Protocol) servers and CLI-based agent interfaces comes down to inference speed. Cerebras positions its speed advantage as critical for CLI-based agentic workflows where round-trip latency compounds.
Source: Cerebras – Mcpvscli
Cerebras Coming to AWS (March 13, 2026)
Cerebras announced availability through AWS Marketplace, allowing customers to access Cerebras inference through AWS with flexible pricing for low-latency workloads. This is a distribution expansion, not a new product.
Source: Cerebras – Cerebras Is Coming To Aws
Introducing Cerebras Code (August 1, 2025)
Cerebras Code launched as a subscription coding product with Pro ($50/month) and Max ($200/month) tiers. Both tiers are currently sold out.
Source: Cerebras – Introducing Cerebras Code
Qwen3 Coder 480B Live on Cerebras (August 1, 2025)
The Qwen3 Coder 480B model became available on Cerebras inference, adding a key coding-focused model to the lineup.
Source: Cerebras – Qwen3 Coder 480B Is Live On Cerebras
Community Signals
- Cerebras Code Pro and Max tiers have been sold out for an extended period. No ETA for reopening has been communicated. This limits access to the most cost-effective coding plans.
- Cerebras is frequently mentioned in discussions about inference speed, with community benchmarks confirming 10-20x faster token generation compared to GPU-based providers. However, the practical impact for coding tasks, where thinking time dominates generation speed, is debated.
- The OpenAI Codex Spark partnership (February 2026) generated significant interest, as it positions Cerebras as the fast-inference backend for an OpenAI product.
Enterprise Readiness
| Feature | Available? | Details |
|---|---|---|
| SSO (SAML) | No | Not mentioned. Enterprise plan requires contacting sales. |
| SSO (OIDC) | No | Not mentioned. |
| SCIM | No | Not mentioned. |
| Audit logs | No | Not mentioned. |
| IP indemnity | No | Not mentioned. Cerebras is an inference provider, not a model creator. |
| Data residency | No | Not mentioned. |
| HIPAA | No | Not mentioned. |
| Air-gapped / on-prem | No | Not available. Cerebras requires its proprietary wafer-scale hardware. |
| SLA | No | No published SLA. |
| Admin controls (RBAC) | No | No admin controls documented. |
Transparency Gaps
| Metric | Status | Notes |
|---|---|---|
| Free tier rate limits | undisclosed | No published requests/minute or tokens/day |
| Developer tier rate limits | undisclosed | Described as "10x higher than Free" with no absolute numbers |
| Enterprise pricing | undisclosed | Contact sales required |
| Cerebras Code restock timeline | undisclosed | Pro and Max have been sold out with no announced reopening date |
| Preview model GA timeline | undisclosed | GLM 4.7 and GPT OSS 120B are marked as preview with no stated production readiness date |
| Code product model allocation | undisclosed | Pro and Max plans do not specify which models are available or if model selection is restricted |