Donostia, Spain – April 8, 2025 – Multiverse Computing today released two new AI models compressed by CompactifAI, Multiverse’s AI compressor: 80 percent compressed versions of Llama 3.1-8B and Llama 3.3-70B.
Both models have 60 percent fewer parameters than the originals, 84 percent greater energy efficiency, 40 percent faster inference, and a 50 percent cost reduction without sacrificing accuracy, according to Multiverse. "AI developers can immediately plug the models into any application – edge, on-premise, or cloud," the company said.
Multiverse will release versions of the top LLMs compressed by CompactifAI over the coming months.
“Meta’s Llama 4 launch underscores a major shift in AI: smaller, more powerful, and multimodal models are no longer optional — they’re the new default,” said Dmitry Zakharchenko, chief software officer at Blaize, a U.S. edge AI chip company. “As AI moves from cloud to edge, success depends on models that are efficient, affordable, and fully programmable.”
Multiverse said CompactifAI is the first compressor of its kind, using quantum-inspired tensor networks to make AI systems more efficient and portable, reducing size by up to 93 percent with only a 2-3 percent drop in accuracy — a striking result compared to the 20-30 percent accuracy loss typical of industry-standard techniques at 50-60 percent compression.
“CompactifAI is changing the economics of AI processing and opening up new use cases for AI models,” said Enrique Lizaso Olmos, CEO of Multiverse Computing. “Efforts to curb unwieldy models have come up short. Our novel approach to compression, grounded in quantum-inspired techniques, makes it possible to pair performance with processing efficiency and gives us a massive edge on LLM providers.”
Multiverse Computing was founded in 2019 by pioneers in quantum-inspired software to develop novel solutions to complex business problems. In 2023 the company began applying its core technology to address the AI energy crisis with CompactifAI.
LLM providers have turned to techniques such as pruning and quantization to compress models but have yet to eliminate the tradeoff between size and performance. For instance, Llama 3.1-8B Slim by CompactifAI requires 300x fewer training tokens than Meta’s Llama 3, and 3x fewer training tokens than Nvidia’s Llama 3.1-Minitron, while outperforming both across benchmarks. For Llama 3.3-70B Slim by CompactifAI, comparative benchmarks show an increase in reasoning capabilities while maintaining the original precision.
“We’re rapidly delivering compressed versions of the most powerful LLMs in the world,” said Sam Mugel, Chief Technology Officer at Multiverse. “The advanced capabilities of these two massive models can now fit into smartphones, laptops, and cars, or real-world machines like oil rigs and satellites. Our aggressive roadmap to roll out dozens of compressed, leading LLMs could dramatically accelerate the impact of AI in the real world.”