Google Launched Phenaki Text-to-Video AI Generator

Every technology is slowly moving toward AI. AI is no longer limited to software; even hardware such as cars that carry live passengers is beginning to work better with it, moving tasks from basic methods to advanced ones. Yet even though AI is impressive in most ways, it still has some distance to go before reaching a near-human level. Meta (Facebook) has already introduced a similar text-to-video system on its platform. Compared with Meta's tool, Google's approach could be especially helpful for creators working in Google Docs and presentations, and for educational content, where short video samples make material easier to understand. For these purposes, Google launched the Phenaki text-to-video AI generator.
AI Imagination Elements


In its recent paper, "Imagen Video: High Definition Video Generation with Diffusion Models," Google claims that Imagen Video can generate high-fidelity videos with a high degree of controllability and world knowledge. Its debut comes only five months after Imagen demonstrated how quickly image-synthesis models are advancing. The generative model can produce diverse videos and text animations in various artistic styles, and it shows an understanding of 3D structure, text rendering, and animation. The model is still in the research phase.

How does it work?

Imagen Video consists of a text encoder (a frozen T5-XXL), a base video diffusion model, and interleaved spatial and temporal super-resolution models. To build this architecture, Google says it transferred findings from its earlier work on diffusion-based image generation to the video setting. The research team also applied progressive distillation with classifier-free guidance to the models, enabling fast, high-quality sampling.
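The cascade described above can be sketched in a few lines. This is a toy illustration only, with made-up shapes, stage counts, and stand-in functions (Google has not released the actual implementation); the point is the structure: a frozen text encoder feeds a low-resolution base model, whose output is refined by alternating temporal and spatial super-resolution stages.

```python
# Illustrative sketch of a cascaded text-to-video pipeline in the style
# described above. All names, shapes, and stage counts are assumptions.

def encode_text(prompt: str) -> list[float]:
    """Stand-in for the frozen T5-XXL text encoder (returns an embedding)."""
    # The real system uses a large frozen language model; this is a toy vector.
    return [float(ord(c) % 7) for c in prompt[:8]]

def base_video_model(embedding: list[float]) -> dict:
    """Stand-in for the base diffusion model: a low-res, low-frame-count video."""
    return {"frames": 16, "height": 24, "width": 48}

def spatial_sr(video: dict, factor: int = 2) -> dict:
    """Spatial super-resolution: upscale every frame."""
    video = dict(video)
    video["height"] *= factor
    video["width"] *= factor
    return video

def temporal_sr(video: dict, factor: int = 2) -> dict:
    """Temporal super-resolution: interpolate additional frames."""
    video = dict(video)
    video["frames"] *= factor
    return video

def generate(prompt: str) -> dict:
    emb = encode_text(prompt)
    video = base_video_model(emb)
    # Interleave temporal and spatial upsampling stages, as the paper describes.
    for stage in (temporal_sr, spatial_sr, temporal_sr, spatial_sr):
        video = stage(video)
    return video

print(generate("a drone flythrough of a castle"))
# → {'frames': 64, 'height': 96, 'width': 192}
```

Each stage only needs to solve a local problem (more frames, or more pixels), which is what makes the cascaded design tractable compared with generating high-resolution video in one shot.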


The video generation framework comprises a cascade of seven video diffusion models, which perform text-conditional video generation, spatial super-resolution, and temporal super-resolution. Through this pipeline, Imagen Video generates 1280×768 high-definition video at 24 frames per second, 128 frames in total, or roughly 126 million pixels. With the aid of progressive distillation, Imagen Video can produce high-quality videos using only eight diffusion steps per sub-model, speeding up video generation by approximately 18 times.
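The numbers above are easy to sanity-check. The snippet below does the pixel arithmetic and illustrates the idea behind progressive distillation, where each distillation round trains a student to match two teacher sampling steps in one, halving the step count; the starting step count of 256 is an illustrative assumption, not a figure from the paper.

```python
# Sanity-check the quoted output size (pure arithmetic).
frames, height, width = 128, 768, 1280
total_pixels = frames * height * width
print(total_pixels)  # 125829120 — roughly 126 million pixels per video

# Progressive distillation repeatedly halves the number of diffusion
# sampling steps. Starting count of 256 is an illustrative assumption.
steps = 256
rounds = 0
while steps > 8:  # 8 steps per sub-model is the figure quoted above
    steps //= 2
    rounds += 1
print(steps, rounds)  # 8 steps reached after 5 halving rounds
```

Fewer sampling steps means fewer forward passes through the network per video, which is where the large wall-clock speedup comes from.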


Another benefit of the cascaded design noted by Google's researchers is that each model in the pipeline can be trained independently, allowing all seven models to be trained in parallel.


Google says Imagen Video was trained on 14 million video-text pairs and 60 million image-text pairs, and the publicly accessible LAION-400M image-text dataset allowed it to cover a wide range of aesthetics. (Not so coincidentally, a portion of LAION was used to train Stable Diffusion.) In experiments, the team found that Imagen Video could create videos in the style of Van Gogh's paintings as well as watercolor art. Perhaps more impressively, they report that Imagen Video demonstrated an understanding of depth and three-dimensionality, letting it produce drone-flythrough-style videos that rotate around objects and capture them from different angles without distortion.


The Phenaki Team on AI

The Phenaki team states that Phenaki can draw on a massive text-image database to create videos. Users can narrate the video through a sequence of prompts and change scenes dynamically.

The generative AI trend began with text-to-image and has now shifted to text-to-video. It also appears to be evolving toward text-to-3D, with models like CLIP-Forge, a text-to-shape generation model that can produce 3D objects using zero-shot learning. Google's text-to-3D AI, "DreamFusion," launched last week, is a prime example of generative AI pushing further into 3D synthesis. DreamFusion uses Imagen to optimize the 3D scene.


Wrap Up

The purpose of AI is always to reduce a person's workload. Compared with text-related work, video rendering always takes time. With the help of this new AI, we can generate short videos that support our content and draw more attention to the topic. As mentioned earlier, it will be especially beneficial for educational work. What are your thoughts about the Phenaki text-to-video AI generator? Share them below.

