Odyssey, a startup founded by Oliver Cameron and Jeff Hawke, has developed an AI model allowing users to interact with streaming video. Its early web demo generates and streams video frames every 40 milliseconds, letting viewers explore areas within these videos.
According to the startup’s blog post, the model predicts the next state of the world based on the current state, incoming action, and a history of states and actions. Odyssey states this is powered by a new world model capable of generating pixels that feel realistic, maintaining spatial consistency, learning actions from video, and outputting coherent video streams for 5 minutes or more.
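The prediction loop described above can be pictured with a small sketch. This is a toy illustration, not Odyssey's model: the names `predict_next` and `rollout`, the 2D position used as a "state," and the rule that a missing action repeats the previous one are all assumptions made for this example. It shows only how each new state depends on the current state, the incoming action, and a bounded history of earlier states and actions.

```python
from collections import deque

def predict_next(state, action, history):
    """Toy stand-in for a learned world model (hypothetical, for illustration).

    state:   current (x, y) position, standing in for a video frame
    action:  (dx, dy) movement, or None when the viewer gives no input
    history: recent (state, action) pairs, oldest first
    """
    if action is None:
        # Assumption: with no input, continue the most recent action if any.
        action = history[-1][1] if history else (0, 0)
    x, y = state
    dx, dy = action
    return (x + dx, y + dy), action

def rollout(initial_state, actions, history_len=4):
    """Generate one state per incoming action, feeding each output back in."""
    history = deque(maxlen=history_len)  # bounded memory of past steps
    state = initial_state
    frames = []
    for action in actions:
        next_state, applied = predict_next(state, action, history)
        history.append((state, applied))
        state = next_state
        frames.append(state)
    return frames

frames = rollout((0, 0), [(1, 0), None, (0, 1)])
```

In a real system the hand-written rule would be replaced by a learned network that outputs pixels, but the feedback structure, where each output becomes the next input, is the same.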
Several startups and Big Tech companies are also pursuing world models, including DeepMind, Fei-Fei Li’s World Labs, Microsoft, and Decart. They envision such models being used to create interactive media and run realistic simulations.
Creative professionals have expressed mixed feelings about this technology. A Wired investigation found that game studios, such as Activision Blizzard, are using AI to cut costs. Furthermore, a 2024 study commissioned by the Animation Guild estimated that AI could disrupt over 100,000 U.S.-based film, television, and animation jobs.
In response, Odyssey pledges to collaborate with creative professionals rather than replace them. “Interactive video opens the door to entirely new forms of entertainment, where stories can be generated and explored on demand, free from the constraints and costs of traditional production,” the company writes in its blog post. It believes that video across entertainment, ads, education, training, and travel could evolve into interactive video.
The company acknowledges that its demo is still under development, noting that the environments its model generates can be blurry, distorted, and unstable. The layouts do not always remain consistent as the viewer moves through the environment.
Odyssey is promising to improve the model, which currently streams video at up to 30 frames per second from clusters of Nvidia H100 GPUs at a cost of $1 to $2 per “user-hour.”
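As a rough back-of-the-envelope check on those figures, the sketch below converts between frame rate and frame interval and spreads the quoted hourly cost across individual frames. The helper names are made up for this example, and the per-frame cost is derived arithmetic that assumes a constant frame rate for the full hour; it is not a number Odyssey has published.

```python
def frame_interval_ms(fps):
    """Milliseconds between frames at a given frame rate."""
    return 1000.0 / fps

def fps_from_interval(interval_ms):
    """Frames per second implied by a fixed interval between frames."""
    return 1000.0 / interval_ms

def cost_per_frame(cost_per_hour, fps):
    """Hourly cost divided evenly over every frame streamed in that hour."""
    return cost_per_hour / (fps * 3600)

# A new frame every 40 ms works out to 25 frames per second.
print(fps_from_interval(40))

# At the 30 fps ceiling, frames arrive roughly every 33 ms.
print(round(frame_interval_ms(30), 1))

# Per-frame cost at the low and high ends of the quoted $1 to $2 per user-hour.
print(cost_per_frame(1.0, 30), cost_per_frame(2.0, 30))
```

A 40-millisecond interval corresponds to 25 frames per second, which is consistent with a stream described as running at "up to" 30.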
The startup says it is researching richer world representations that capture dynamics more faithfully while increasing temporal stability and persistent state. It is also expanding the action space from motion to world interaction, learning open actions from large-scale video. To capture real-world landscapes, Odyssey designed a 360-degree, backpack-mounted camera system. The team believes this method can serve as a basis for higher-quality models compared to those trained solely on publicly available data.
To date, the company has raised $27 million from investors, including EQT Ventures, GV, and Air Street Capital. Ed Catmull, a co-founder of Pixar and former president of Walt Disney Animation Studios, is on the startup’s board of directors.
Last December, Odyssey announced it was developing software to allow creators to load scenes generated by its models into tools such as Unreal Engine, Blender, and Adobe After Effects for manual editing.