OpenAI will integrate 750 megawatts of ultra-low-latency compute from chipmaker Cerebras to speed up the responses of its artificial intelligence (AI) models.
The capacity from the deal will come online in several stages, beginning this year and continuing through 2028, according to press releases issued by the companies on Wednesday (Jan. 14).
For users of OpenAI’s AI models, the added compute will enable real-time responses when the AI answers hard questions, generates code, creates images or runs AI agents, OpenAI said in its press release.
“OpenAI’s compute strategy is to build a resilient portfolio that matches the right systems to the right workloads,” Sachin Katti, who leads compute infrastructure at OpenAI, said in the release. “Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people.”
In its own press release about the deal, Cerebras said the rollout will constitute the world’s largest deployment of high-speed AI inference.
Cerebras said that large language models running on its AI processor deliver responses as much as 15 times faster than those using GPU-based systems. The firm compared the impact of this difference in speed to the internet’s transition from dial-up to broadband.
“We are delighted to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor,” Cerebras Co-Founder and CEO Andrew Feldman said in OpenAI’s press release. “Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models.”
Andy Hock, senior vice president of product and strategy at Cerebras, told PYMNTS in February 2024 that AI compute products were in short supply amid high demand, and that generative AI had driven up demand for AI application acceleration.
“The ChatGPT light bulb went off in everybody’s head, and it brought artificial intelligence and state-of-the-art deep learning into the public discourse,” Hock said.
PYMNTS reported in December 2025 that after companies spent the past two years experimenting with large language models, they are moving those systems into live environments. This has caused a shift in investment and engineering resources toward inference infrastructure.