OpenAI will integrate 750 megawatts of ultra-low-latency computing from chipmaker Cerebras to accelerate the response times of its artificial intelligence (AI) models.
The capacity covered by the agreement will come online in several phases, starting this year and continuing through 2028, according to press releases issued by the companies on Wednesday (January 14).
For users of OpenAI’s AI models, the added compute will deliver real-time responses as the AI answers difficult questions, generates code, creates images or runs AI agents, OpenAI said in its press release.
“OpenAI’s compute strategy is about creating a resilient portfolio that matches the right systems with the right workloads,” Sachin Katti, who leads computing infrastructure at OpenAI, said in the release. “Cerebras adds a dedicated, low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation for scaling real-time AI to more people.”
In its own press release about the deal, Cerebras said the compute deployment will constitute the world’s largest deployment of high-throughput AI inference.
Cerebras said that large language models running on its AI processor deliver answers up to 15 times faster than those running on GPU-based systems. The company likened the impact of this speed difference to the internet’s transition from dial-up to broadband.
“We are excited to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor,” Cerebras co-founder and CEO Andrew Feldman said in the OpenAI press release. “Just as broadband transformed the Internet, real-time inference will transform AI, enabling entirely new ways to create and interact with AI models.”
Andy Hock, senior vice president of product and strategy at Cerebras, told PYMNTS in February 2024 that AI computing products were scarce and in high demand, and that generative AI had driven growing demand for accelerating AI applications.
“The ChatGPT light bulb went off in everyone’s head, and it brought cutting-edge artificial intelligence and deep learning into public discourse,” Hock said.
PYMNTS reported in December 2025 that after spending the last two years experimenting with large language models, companies are now moving these systems into real-world environments, shifting investment and engineering resources toward inference infrastructure.