Cerebras Reports Fastest DeepSeek R1 Distill Llama 70B Inference

Cerebras Systems today announced what it said is record-breaking performance for DeepSeek-R1-Distill-Llama-70B inference, achieving more than 1,500 tokens per second – 57 times faster than GPU-based solutions. Cerebras said this speed enables instant reasoning capabilities ….

Cerebras Wafer-Scale Cluster Brings Push-Button Ease and Linear Performance Scaling to Large Language Models

Cerebras Systems, a pioneer in accelerating artificial intelligence (AI) compute, unveiled the Cerebras Wafer-Scale Cluster, delivering near-perfect linear scaling across hundreds of millions of AI-optimized compute cores while avoiding the pain of the distributed compute. With a Wafer-Scale Cluster, users can distribute even the largest language models from a Jupyter notebook running on a laptop with just a few keystrokes. This replaces months of painstaking work with clusters of graphics processing units (GPU).