AI Inference: NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Llama 4 Maverick

NVIDIA announced a record for large language model inference, reporting that an NVIDIA DGX B200 node with eight Blackwell GPUs delivered more than 1,000 tokens ….

Multiverse Says It Compresses Llama Models by 80%

Multiverse Computing today released two new AI models compressed by CompactifAI, Multiverse’s AI compressor: 80 percent compressed versions of Llama 3.1-8B and Llama 3.3-70B. Both models have 60 percent fewer parameters than the originals and 84 percent greater energy efficiency ….

Webinar: Getting Started with Llama 3 on AMD Radeon and Instinct GPUs

[Sponsored Post] This webinar, “Getting Started with Llama 3 on AMD Radeon and Instinct GPUs,” provides a guide to installing Hugging Face transformers, Meta’s Llama 3 weights, and the necessary dependencies for running Llama locally on AMD systems with ROCm™ 6.0.

Aware’s AI Data Platform Dominates in Head-to-Head Showdown Against Meta’s Llama-2

Aware, the AI Data Platform for workplace conversations, announced benchmark results pitting its models against Meta’s latest release, Llama-2. As the tech world buzzes about Llama-2, Aware conducted a head-to-head test comparing the accuracy and cost-effectiveness of its AI models with Meta’s large language model. The results, Aware says, show that its purpose-built platform for workplace conversations outperforms industry-leading large language models in accuracy and speed, at a fraction of the cost.