AI Inference: NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Llama 4 Maverick

NVIDIA said it has set a record for large language model inference, announcing that an NVIDIA DGX B200 node with eight Blackwell GPUs delivered more than 1,000 tokens ….
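For readers unfamiliar with the metric in the headline, the sketch below shows one common way "tokens per second per user" (TPS/user) is computed: generated tokens divided by wall-clock time and by the number of concurrent users. The function name and the example numbers are illustrative assumptions, not NVIDIA's benchmark methodology or data.

```python
# Illustrative TPS/user calculation (hypothetical numbers, not NVIDIA's benchmark).
def tps_per_user(total_tokens: int, elapsed_seconds: float, concurrent_users: int) -> float:
    """Average decode throughput each user observes, in tokens per second."""
    return total_tokens / elapsed_seconds / concurrent_users

# Example: a single-user run generating 10,000 tokens in 9.5 seconds
# works out to roughly 1,053 TPS/user.
print(f"{tps_per_user(10_000, 9.5, 1):.0f} TPS/user")
```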

Multiverse Says It Compresses Llama Models by 80%

Multiverse Computing today released two new AI models compressed by CompactifAI, Multiverse’s AI compressor: 80 percent compressed versions of Llama 3.1-8B and Llama 3.3-70B. Both models have 60 percent fewer parameters than the originals, 84 percent greater energy efficiency ….
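As a rough sanity check on the "60 percent fewer parameters" figure, the snippet below applies that reduction to the nominal parameter counts implied by the model names (8B and 70B). These are back-of-the-envelope approximations, not figures from Multiverse's release.

```python
# Approximate compressed parameter counts, assuming nominal original sizes
# of 8B and 70B and the reported 60 percent parameter reduction.
ORIGINAL_PARAMS = {"Llama 3.1-8B": 8e9, "Llama 3.3-70B": 70e9}
REDUCTION = 0.60  # "60 percent fewer parameters"

for name, params in ORIGINAL_PARAMS.items():
    compressed = params * (1 - REDUCTION)
    print(f"{name}: ~{compressed / 1e9:.1f}B parameters after compression")
```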