FP Staff | Oct 10, 2022 17:49:10 IST
Just days after Meta announced its text-to-video generator, Google has revealed that it is working on its own AI-powered text-to-video generator, which it is calling Imagen Video.
The generator is still in its development phase, but by the time it reaches a publicly releasable state, it will be capable of producing 1280×768 videos at 24 frames per second from a basic written prompt.
According to Google’s research paper, Imagen Video will have stylistic abilities, such as generating videos in the style of famous artists like Vincent van Gogh. It will also be able to generate rotating 3D objects while preserving their structure, and to render text in various animation styles.
Google’s new Imagen Video Al turns text descriptions into high resolutions 5.3 second long videos🤩🤩🤩 pic.twitter.com/KhvsvGqLFh
— Tansu YEĞEN (@TansuYegen) October 8, 2022
Google says that Imagen Video has been trained on 14 million video-text pairs and 60 million image-text pairs, as well as the LAION image-text dataset, which was also used to train Stable Diffusion.
Google hopes that its AI-video model can “significantly decrease the difficulty of high-quality content generation.” Imagen Video builds on Google’s Imagen, a text-to-image program similar to OpenAI’s DALL-E.
As described by Google’s research team, Imagen Video takes a text description and generates a 16-frame, three-frames-per-second video at 24×48 pixel resolution. The system then upscales and “predicts” additional frames, producing a final 128-frame, 24-frames-per-second video at 1280×768 resolution.
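The figures quoted above can be checked with a little arithmetic. The Python sketch below is only an illustration, not anything taken from Google's paper: the variable and function names are made up for this example, and it assumes that 24×48 and 1280×768 each describe a single frame (which number is the width and which is the height does not affect the pixel counts).

```python
from fractions import Fraction


def clip_duration(frames: int, fps: int) -> Fraction:
    """Length of a clip in seconds, kept as an exact fraction."""
    return Fraction(frames, fps)


# Figures as reported in the article (dictionary names are illustrative only).
base = {"frames": 16, "fps": 3, "pixels": 24 * 48}
final = {"frames": 128, "fps": 24, "pixels": 1280 * 768}

base_seconds = clip_duration(base["frames"], base["fps"])
final_seconds = clip_duration(final["frames"], final["fps"])

# How many times more frames, and pixels per frame, the final video has.
frame_factor = final["frames"] // base["frames"]
pixel_factor = Fraction(final["pixels"], base["pixels"])

print(f"base clip:  {float(base_seconds):.2f} s")
print(f"final clip: {float(final_seconds):.2f} s")
print(f"frames: x{frame_factor}, pixels per frame: x{float(pixel_factor):.1f}")
```

Both clips work out to the same length, a little over five seconds, which matches the 5.3-second figure in the tweet above: the upscaling stages add frames and pixels, not running time.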
— Simon Geisker (@simonfilm_nyc) October 6, 2022
It is worth noting that all the results from Imagen Video shown so far were selected by Google itself, and no independent testers have yet tried the program.
That said, the research paper claims that Imagen Video can render text properly, something that DALL-E and Stable Diffusion both struggle with. The text that those programs generate is barely readable.
It also claims that Imagen Video has demonstrated an understanding of depth and three-dimensionality, allowing drone flythrough videos to be created that rotate around and capture objects from different angles without distortion.
Google has voiced its concerns over “problematic data” used to train its AI image generator programs. The company has attempted to filter out sexually explicit or violent content, as well as social stereotypes and cultural biases. It is concerned that the tool may be used “to generate fake, hateful, explicit, or harmful content.”
“We have decided not to release the Imagen Video model or its source code until these concerns are mitigated,” adds Google.