TL;DR: Phenaki, a text-to-video AI model, was developed by a team of researchers at Google Research. Its development was motivated by the need for a tool that can generate realistic videos from textual descriptions, addressing the challenge of producing high-quality, variable-length videos efficiently.

Google’s recent advancements in AI video generation have led to the introduction of a new model called Veo, which builds on the technological foundations established by Phenaki and other predecessors such as Imagen Video and DVD-GAN.

Veo extends these capabilities by understanding cinematic terminology and producing high-quality 1080p videos that can exceed one minute in length. The model is currently available to select creators in a private preview, aiming to push the boundaries of video synthesis and creative control further.

Key Features of Phenaki

  • Variable Length Videos: Generates videos of any length based on textual prompts.
  • Advanced Tokenizer: Compresses videos into a compact sequence of discrete tokens for efficient processing.
  • Bidirectional Masked Transformer: Generates video tokens conditioned on text tokens.
  • Dynamic Prompts: Handles changing textual prompts over time for storytelling.
  • Joint Training: Trains jointly on a large corpus of image-text pairs and a smaller set of video-text examples to improve generalization.
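The bidirectional masked transformer above does not emit video tokens one at a time; it starts from a fully masked sequence and, over a few refinement steps, commits its most confident predictions in parallel while re-predicting the rest (MaskGIT-style decoding). The sketch below illustrates only that decoding loop, not Phenaki's actual code: `toy_model`, `MASK`, and `VOCAB` are illustrative stand-ins, and the real transformer would condition on text tokens and attend bidirectionally to already-committed video tokens.

```python
import numpy as np

MASK = -1   # sentinel for a still-masked video token (illustrative)
VOCAB = 8   # toy codebook size; the real tokenizer's codebook is far larger

def toy_model(tokens, text_embedding, rng):
    """Stand-in for the bidirectional transformer: returns random
    per-position scores over the codebook. A real model would attend
    to the text tokens and to all unmasked video tokens."""
    return rng.random((len(tokens), VOCAB))

def masked_decode(num_tokens, text_embedding, steps=4, seed=0):
    """Iterative parallel decoding: begin fully masked, then at each
    step keep the most confident predictions and re-mask the rest."""
    rng = np.random.default_rng(seed)
    tokens = np.full(num_tokens, MASK)
    for step in range(steps):
        scores = toy_model(tokens, text_embedding, rng)
        probs = scores / scores.sum(axis=1, keepdims=True)
        preds = probs.argmax(axis=1)          # most likely token per slot
        conf = probs.max(axis=1)              # its confidence
        conf[tokens != MASK] = np.inf         # committed tokens never change
        # schedule: commit progressively more tokens each step
        keep = int(np.ceil(num_tokens * (step + 1) / steps))
        commit = np.argsort(-conf)[:keep]
        tokens[commit] = np.where(tokens[commit] == MASK,
                                  preds[commit], tokens[commit])
    return tokens

video_tokens = masked_decode(num_tokens=16, text_embedding=None)
```

Because whole blocks of tokens are filled in per step rather than one token per forward pass, generation cost grows with the number of refinement steps instead of the sequence length, which is what makes long, variable-length videos tractable.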