TL;DR: Phenaki is a text-to-video AI model developed by a team of researchers at Google, including Ruben Villegas, Mohammad Babaeizadeh, and Dumitru Erhan. It was motivated by the need for a tool that can generate realistic videos from textual descriptions, and it addresses the challenge of producing high-quality, variable-length videos efficiently.
Google has since introduced Veo, a newer video generation model that builds on the technological foundations established by Phenaki and other predecessors such as Imagen Video and DVD-GAN.
Veo extends these capabilities by understanding cinematic terms and producing high-quality 1080p videos that can exceed one minute in length. At the time of writing, it is available to select creators in a private preview, with the aim of pushing video synthesis and creative control further.
Key Features of Phenaki
- Variable-Length Videos: Generates videos of arbitrary length from textual prompts.
- Advanced Tokenizer: Compresses each video into a compact sequence of discrete tokens for efficient processing.
- Bidirectional Masked Transformer: Generates video tokens conditioned on text tokens.
- Dynamic Prompts: Handles a sequence of prompts that changes over time, so a single video can follow a story.
- Joint Training: Trains on a large corpus of image-text pairs together with a smaller set of video-text examples to improve generalization.
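To make the features above more concrete, here is a minimal Python sketch of how masked token generation and changing prompts could fit together. It is a toy illustration, not Phenaki's actual implementation. The names `toy_predict`, `masked_decode`, and `generate_story` are hypothetical. The predictor returns random tokens in place of a trained transformer and ignores both the prompt and the context. The cosine unmasking schedule is an assumption borrowed from common masked-decoding setups.

```python
import math
import random

MASK = -1  # sentinel marking a token position that has not been decoded yet


def toy_predict(tokens, prompt, vocab_size, rng):
    """Hypothetical stand-in for the bidirectional transformer.

    A real model would look at every position (masked or not) plus the text
    tokens. This toy version ignores both and returns a random
    (token, confidence) pair for each position.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_decode(tokens, prompt, rng, vocab_size=16, steps=4):
    """Fill every MASK position over a few parallel refinement steps."""
    tokens = list(tokens)
    total_masked = tokens.count(MASK)
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predict(tokens, prompt, vocab_size, rng)
        # Assumed cosine schedule: how many positions stay masked after this step.
        keep_masked = int(total_masked * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident predictions; the rest stay masked for now.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens


def generate_story(prompts, segment_len=8, context_len=3, seed=0):
    """Generate one token segment per prompt, carrying over recent tokens as context."""
    rng = random.Random(seed)
    video = []
    for prompt in prompts:
        context = video[-context_len:]  # frozen tokens from the previous segment
        segment = masked_decode(context + [MASK] * segment_len, prompt, rng)
        video.extend(segment[len(context):])  # keep only the newly decoded tokens
    return video


story = generate_story(["a teddy bear swimming", "the bear walks onto the beach"])
print(len(story), "video tokens generated")
```

In a real system the token sequence would be passed back through the tokenizer's decoder to produce frames. Here the output is just a list of integers whose length grows with the number of prompts, which is the sense in which the video length is variable.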