Only 11 months ago, a 19-second video of Will Smith devouring spaghetti broke the internet. Now, a text-to-video generator can churn out one-minute, hyper-realistic videos. Meet Sora, OpenAI’s newest model, completing its AI trifecta alongside ChatGPT and DALL-E.
Sora, which means “sky” in Japanese, was announced just last Friday, a day after Google’s Gemini 1.5 Pro was made public. In the surprise announcement, OpenAI shared the model’s capabilities and potential.
In case you missed it, the video above was generated with the prompt: “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.”
Eleven months after the creation of the now-iconic Will Smith AI video, who would’ve thought that AI models would be capable of rendering something as spectacular as this:
But before Sora, what made AI Will Smith gobble his bowl of spaghetti? What was the technology like 11 months ago? And most importantly, after this monumental achievement, where are we going?
Let’s dive into it!
- ModelScope to Sora: Text-to-Video, the New Frontier
- Instant Video Creation for the Masses
- What’s in it for u+
ModelScope to Sora: Text-to-Video, the New Frontier
So, what technology made Will Smith devour his bowl of spaghetti?
Well, this video was made possible using an AI tool called ModelScope. It was released a year ago by the DAMO Vision Intelligence Lab, a research division of Alibaba.
ModelScope is a “text2video” diffusion model trained to create new videos from prompts by analyzing millions of images and thousands of videos scraped into the LAION-5B, ImageNet, and WebVid datasets. Interestingly, that training data also included images scraped from the well-known stock image site Shutterstock, which is why you can see Shutterstock’s watermark hovering over Will Smith.
The now-iconic AI-generated video was posted by Reddit user u/chaindrop on r/StableDiffusion, a forum dedicated to text-to-image and text-to-video conversations. According to u/chaindrop, the workflow for creating the video was fairly simple:
- Prompt ModelScope with “Will Smith eating spaghetti.”
- Bump the frame rate from 24 to 48 frames per second to make the result smoother (a rough code sketch of this workflow follows below).
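If you’re curious what that two-step recipe looks like in practice, here is a minimal sketch using the openly released ModelScope weights. It assumes the Hugging Face diffusers library and the “damo-vilab/text-to-video-ms-1.7b” checkpoint, neither of which is named in the original post, so treat it as an illustration rather than the exact workflow u/chaindrop followed:

```python
# Minimal sketch: ModelScope-style text-to-video with Hugging Face diffusers.
# Assumes the openly hosted "damo-vilab/text-to-video-ms-1.7b" checkpoint;
# this is not the exact script behind the original Reddit video.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # helps the model fit on a consumer GPU

# Step 1: prompt the model, just like the original post.
result = pipe("Will Smith eating spaghetti", num_inference_steps=25)
frames = result.frames  # newer diffusers releases return a batch; use result.frames[0] there

# Step 2: write the frames out as a short .mp4 clip. The original poster
# then doubled the frame rate from 24 to 48 fps; that smoothing is usually
# done afterwards with a frame-interpolation tool (e.g. ffmpeg or Flowframes).
video_path = export_to_video(frames)
print(video_path)
```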
Since its inception, this AI Will Smith video has become the poster child for text-to-video content, with Reddit users even bookmarking the original post with the RemindMeBot.
Not even a year later, text-to-video technology has become more than just Will Smith scarfing down a bowl of spaghetti. With pioneering work like Google’s Vision Transformer (ViT) and the influence of early tools such as Runway and Pika, video content creation is taking center stage in media generation.
Users on X are even rallying to have Sam Altman, OpenAI CEO, recreate this video using Sora. Trust us, we are waiting for this recreation as well.
In the meantime, to quench our thirst, Will Smith took it upon himself to recreate the video.
Amidst laughs and giggles, users on both Reddit and X are sparking conversations about how it’s becoming more and more difficult to distinguish what’s real from what isn’t. This is one of the many challenges the public is pressing OpenAI to address.
Instant Video Creation for the Masses
So, after Sora was announced, it immediately got people talking. Some are super excited about the new discoveries it could bring, while others are hesitant about diving into this new tech.
With Sora, instant video creation will now be possible for everyday users with little to no video production experience. This means that content creators can create videos for marketing campaigns quickly and efficiently.
Here are some examples of videos generated by Sora. Sam Altman himself generated these based on the prompts submitted on X.
This technology can help social media influencers produce high-quality content without expensive equipment or outsourcing. Plus, with the rise of platforms like TikTok and Instagram Reels, short-form video content is more popular than ever. It’s quick, cheap, and scalable. However, what will happen to content creators who rely on traditional video production? Let’s save that conversation for another time.
Currently, Sora is available exclusively to a select group of users testing its functionality. Throughout this testing phase, OpenAI will assess its limitations. OpenAI has been open about Sora not perfectly nailing down the physics of complex scenes and cause-and-effect sequences: you might see a cookie with no bite mark even after someone has taken a bite.
If you are interested in experimenting with this technology, here are some readily available tools you can try out now:
- Runway: With this tool, you can create various projects such as art, music, and animations using machine learning algorithms. RunwayML also allows you to train your own models or use pre-trained models to enhance your creative projects.
- Pika: This tool bills itself as an “idea-to-video” platform that allows you to enhance and modify videos creatively. With Pika, you can change the content within the video frame or easily alter the style of the video.
- Invideo: This tool helps users generate scripts, create scenes, add voiceovers, and customize videos based on specific text prompts.
With Sora and other readily available text-to-video models, we find ourselves at a pivotal moment in history. Not since the internet’s introduction into our homes have we witnessed such technological advancements. Yet it’s hard not to ponder the implications of this technology’s accessibility, particularly the potential for creating convincing deepfakes and spreading misinformation in an election year.
Nonetheless, the possibilities are vast, and I’m excited for what’s to come.
What’s in it for u+
Imagine, imagine, imagine.
That’s what these text-to-video models are here for. They’re here to help us imagine, and then imagine some more.
Whether you’re a business owner, content creator, or just someone who wants to showcase your creativity, this new technology opens up a whole new world of possibilities.
But it doesn’t just stop at video creation. As AI continues to develop, we can expect even more advancements in text-to-video and other forms of media generation. This could change how we consume entertainment and information, making it more accessible and personalized for everyone.
So the next time you see Will Smith gorging on spaghetti, remember that it’s more than a funny video generated by AI. It’s a sign of progress and proof of AI’s potential in media creation. We’re excited to see what comes next.
After all, who knows what other iconic moments will be brought to life through text-to-video technology? The sky’s the limit with generative AI.
Will Smith’s bowl of spaghetti is just the beginning. As Sora enters its testing phase and tools like Runway and Pika keep shipping new features, the future holds endless possibilities, perhaps arriving even sooner than another 11 months.
While some ponder the future, I believe we are already living in it.
Do you want to stay current with the latest AI news?
At a.i. + u, we deliver fresh, engaging, and digestible AI updates.
Stay tuned for more exciting developments!
Let’s see what stories we can bring to life next.
See you next edition!