OpenAI's Sora, AI Sound Effects, and Gemini 1.5 – Live and Learn #36
Welcome to this edition of Live and Learn. This time, it's about OpenAI and ElevenLabs changing the world of movie creation, Google opening access to some of its LLMs, and its Gemini 1.5 announcement. As always, I hope you enjoy this edition of Live and Learn.
✨ Quote ✨
The programming language of the future is called human.
– Jensen Huang (source)
Links
Sora by OpenAI. Finally, OpenAI has announced its take on text-to-video generation. And it seems to be the best out there: a new milestone, far ahead of anything before it, like RunwayML, Pika, or Stable Video. The main thing that distinguishes OpenAI's effort is that its model can generate videos up to a minute long while keeping the context "alive". OpenAI's model is also likely using a novel architecture, something much closer to a world model or simulation engine. If text-to-video models develop as fast as image generation did, I think that by the end of this year we will have photorealistic half-hour-long videos that can be generated entirely on home laptops within mere seconds... I am excited but also terrified of this future. Entirely AI-generated movies will become a thing, and humans will "just" have to do the high-level direction, which will cut production times down to a fraction of what they are now. I imagine a world where, if you have an idea for a movie, you don't go out and cast actors and think about which camera to shoot on, but instead use an AI to create the entire thing.
AI Sound Effects by ElevenLabs (coming soon). The Sora video generation model can be paired with technology from ElevenLabs to generate the music and sound effects on top of the video. Their demo shows what this looks and feels like right now, and it is only a small glimpse of what's to come. Maybe by the end of this year, AI will handle not just the visual side of Hollywood… To me, the craziest thing about all of this is how empowering this technology feels. It means that everybody can create amazing videos without needing a big budget for a RED camera, advanced CGI, VFX, or anything like that. Simply let your creativity run wild and create the things you envision with the help of AI.
Stable Diffusion 3 by Stability AI. Stability AI has announced its newest Stable Diffusion model. It is still in an early preview phase right now, but it will be rolled out and openly available soon. The most notable improvement is that the model is much better at rendering text accurately inside generated images. Stability AI also recently released Stable Cascade, another image generation model that is built on an entirely different architecture and can run on less demanding hardware.
Gemma Open Source Models by Google DeepMind. Google is making part of its LLM work more accessible for others to use. It released a family of open models called "Gemma", in 2B and 7B parameter sizes, built from the same research and technology as its big Gemini models. It makes me happy to see a big company like Google hop more onto the open-source train, especially in the world of machine learning.
Gemini 1.5 by Google. Google announced Gemini 1.5, starting with Gemini 1.5 Pro, which is available in a limited preview for developers. To me, the craziest aspect is the length of its context window. In Google's own words:
"This new generation also delivers a breakthrough in long-context understanding. We've been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet."
1 million tokens?! According to Google, that is over 700,000 words, enough to feed it several full novels at once, like a large part of the Harry Potter series, and have it understand the whole thing. You can ask it for summaries, quotes, and advice, or have it emulate characters from the books. This means we might soon have something akin to a "write a sequel to this novel" button.
🌌 Traveling 🌌
I spent the last two weeks on a boat, sailing across the Atlantic. It was one of the most fun but also most challenging experiences of my life, and I now have more pictures of sunrises and sunsets than I could ever use. Seeing land again after 18 days at sea felt incredible, and the water here in the Caribbean is simply beautiful.
🎶 Song 🎶
Little Blue (Mahogany Session) by Jacob Collier
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico