UAE lab breaks the speed barrier in AI video generation
FastVideo creates in five seconds what Sora takes two minutes to create, say researchers
#UAE #video – The Institute of Foundation Models (IFM) at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), in collaboration with UC San Diego, has unveiled FastVideo, a real-time AI video generation system capable of producing 30 seconds of 1080p video in just five seconds, faster than the clip itself plays back. The breakthrough combines IFM’s FastVideo inference framework with its K2 Think (K2-V2) reasoning model, which guides video generation intelligently in real time. The team has also launched Dreamverse, a prototype creative interface built on FastVideo that enables what researchers call ‘vibe directing’: steering video content through rapid natural-language iteration rather than a single, exhaustive prompt.
SO WHAT? – Current leading AI video generation tools, including OpenAI’s Sora, take one to two minutes to produce a five-second 1080p clip. MBZUAI’s FastVideo now does this in under five seconds on a single GPU. That is not just a speed improvement; the platform has the potential to change the creative workflow entirely. Firstly, when revisions come back almost instantly, creators can test many ideas rather than committing to one. Secondly, near-instant generation has significant implications for world model research, where real-time generative capability has long been considered too computationally expensive to be practical. Lastly, it may have enormous future implications for video-on-demand applications, video gaming and real-time media streaming.
Here are some key facts about the FastVideo breakthrough:
MBZUAI’s Institute of Foundation Models (IFM), working with UC San Diego, has produced FastVideo, a system that can generate 30 seconds of 1080p video in approximately five seconds, producing content faster than it can be played back: a first for AI video generation at this resolution.
In comparison, OpenAI’s Sora video generator currently takes one to two minutes to generate a five-second 1080p clip. According to MBZUAI researchers, FastVideo achieves the same output in around 4.55 seconds on a single GPU: 60 to 120 seconds versus 4.55 seconds, a speedup of roughly 13 to 26 times over existing leading systems.
At the core of FastVideo is a trainable sparse attention mechanism that dramatically reduces the computational cost of video diffusion: the underlying process by which AI video generation models produce frames. This addresses a longstanding assumption that high-quality generative video was too expensive to run in real time.
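To make that concrete, below is a minimal PyTorch sketch of block-sparse attention, the general family of technique described above. The block size, the mean-pooled block summaries and the top-k selection rule are illustrative assumptions rather than FastVideo’s published design: each block of query tokens attends only to the handful of key blocks scored most relevant, cutting the cost of attention from quadratic to near-linear in sequence length.

```python
# Minimal sketch of trainable block-sparse attention, the general technique
# reported for FastVideo. The block summaries and top-k selection rule here
# are illustrative assumptions, not FastVideo's actual implementation.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, topk=4):
    """For each query block, attend only to the topk most relevant key blocks.

    q, k, v: (batch, seq_len, dim), seq_len divisible by block_size and
    topk <= seq_len // block_size. Cost drops from O(n^2) to roughly
    O(n * topk * block_size).
    """
    b, n, d = q.shape
    nb = n // block_size
    qb = q.view(b, nb, block_size, d)
    kb = k.view(b, nb, block_size, d)
    vb = v.view(b, nb, block_size, d)

    # Coarse relevance scores between block summaries (mean-pooled tokens);
    # because the pooling and scoring are differentiable, the sparsity
    # pattern can be learned during training.
    q_sum = qb.mean(dim=2)                      # (b, nb, d)
    k_sum = kb.mean(dim=2)                      # (b, nb, d)
    scores = q_sum @ k_sum.transpose(-1, -2)    # (b, nb, nb)
    idx = scores.topk(topk, dim=-1).indices     # (b, nb, topk)

    out = torch.empty_like(qb)
    batch = torch.arange(b, device=q.device).unsqueeze(-1)
    for i in range(nb):
        # Gather only the selected key/value blocks for query block i.
        sel = idx[:, i, :]                                   # (b, topk)
        ks = kb[batch, sel].reshape(b, topk * block_size, d)
        vs = vb[batch, sel].reshape(b, topk * block_size, d)
        out[:, i] = F.scaled_dot_product_attention(qb[:, i], ks, vs)
    return out.view(b, n, d)
```

Because the block scores are differentiable, the model can learn where to spend attention during training, which is what makes the sparsity ‘trainable’ rather than fixed by hand.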
FastVideo is paired with K2 Think, MBZUAI’s reasoning language model, which acts as an intelligent director during generation, providing real-time reasoning and control rather than simply executing a static prompt. The combination of fast generation and live reasoning is described by the team as a new class of capability.
Dreamverse, the creative interface built on FastVideo, introduces vibe directing: a workflow in which users steer video content through rapid, iterative natural language instructions rather than writing a single complex prompt. Users can change camera angles, continue scenes, adjust motion or swap backgrounds across a chain of five-second clips in real time.
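As a rough illustration of how such a workflow could be wired up, the Python sketch below chains short clips, conditioning each new segment on the last frame of the previous one. The generate_clip function, the Clip type and their parameters are hypothetical; Dreamverse’s actual interface has not been published in this form.

```python
# Hypothetical sketch of a 'vibe directing' loop: each short instruction
# refines or extends the chain of clips instead of rewriting one big prompt.
# generate_clip and Clip are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Clip:
    frames: list          # decoded frames of one five-second segment

def vibe_direct(generate_clip, instructions, clip_seconds=5):
    """Chain short clips, conditioning each on the previous clip's last frame."""
    clips, last_frame = [], None
    for note in instructions:
        clip = generate_clip(prompt=note, init_frame=last_frame,
                             seconds=clip_seconds)
        clips.append(clip)
        last_frame = clip.frames[-1]   # keeps visual continuity between segments
    return clips

# A session is a list of quick directions rather than one exhaustive prompt:
storyboard = [
    "a drone shot over a desert city at dawn",
    "push in toward the tallest tower",
    "swap the background to a sandstorm rolling in",
]
```

The design only works because each instruction is cheap to act on: with near-instant generation, the feedback loop stays conversational.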
FastVideo is an open framework designed to be modular and extensible, supporting sparse distillation, full and LoRA fine-tuning, and scalable training across up to 64 GPUs with near-linear performance scaling. NVIDIA’s Dynamo inference platform has already added FastVideo as a supported backend.
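Of the training options listed, LoRA fine-tuning is the most widely used technique. The generic PyTorch sketch below shows the underlying idea and is not FastVideo’s API: the pretrained weight is frozen and only a small low-rank update is learned, which is what makes adapting a large video model tractable on modest hardware.

```python
# Generic LoRA adapter sketch (not FastVideo's API): freeze the base weight
# and learn a low-rank update, so fine-tuning trains far fewer parameters.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False      # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank        # B starts at zero, so the adapter
                                         # initially leaves the model unchanged

    def forward(self, x):
        # y = Wx + (alpha/rank) * B A x; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```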
The IFM team describes the breakthrough as directly relevant to world model research: AI systems that model and interact with physical reality rather than simply predicting text or images. Real-time generative capability removes a key practical barrier to making generalised world models computationally viable.
Dreamverse is powered by K2-V2, a fully open foundation model combining general-purpose language capabilities with long-context understanding and tool-augmented workflows, developed at MBZUAI with support from NVIDIA.
ZOOM OUT – Most AI systems today predict the next word or the next pixel. The PAN World Model (PAN stands for Physical, Agentic and Nested) is built to do something fundamentally different: predict the next state of the world. Rather than generating content, PAN simulates reality. It integrates language, video, spatial data and physical actions into a unified internal model of how the world works. This enables AI systems to reason about cause and effect, test decisions in simulation before acting, and generate rare or high-stakes scenarios that would be impossible or dangerous to recreate in the real world. FastVideo provides a glimpse of what next-generation world models could offer: the possibility of simulating reality by generating video in near real time.
[Written and edited with the assistance of AI]
LINKS
Dreamverse Demo (FastVideo)
FastVideo framework (GitHub)
Into the Dreamverse: Vibe Directing in FastVideo (Hao AI Lab)
Development Roadmap 2025/2026 (GitHub)
Read more about MBZUAI K2 Think & PAN research:
MBZUAI officially launches K2 Think V2 with mobile apps (Middle East AI News)
K2 Think 32B rivals reasoning models 20 times its size (Middle East AI News)
UAE President backs UAE-made AI reasoning model (Middle East AI News)
MBZUAI opens Silicon Valley AI lab (and intros PAN) (Middle East AI News)
Powerful open-source K2-65B LLM costs 35% less to train (Middle East AI News)


