Nvidia Keynote, MusicLM, and Apple's Vision Pro – Live and Learn #19
Welcome to this edition of Live and Learn. This time with some mind-blowing product demos from Nvidia and Apple, more multimodal models, and an AI that learns to play Minecraft on its own.
✨ Quote ✨
My definition of magic: competence so much more advanced than yours, with such alien mental models that you cannot predict the outcomes of the model at all.
– Autotranslucence (source)
Links
Voyager by MineDojo. Voyager is an AI agent that has learned to play Minecraft on its own. It is powered by a large language model (GPT-4), and the implications of the paper are huge. The agent is open-ended: it explores the game by itself, proposes its own tasks, and builds up a library of reusable skills, learning how to play without any human guidance (a minimal sketch of this loop is shown below). This looks like a toy right now, but it might end up being a critical component of building true AGI. This kind of open-ended agency is also what might make AGI so dangerous: an open-ended agent, "exploring" its environment, might accidentally find strategies and tasks that harm humans on a massive scale.
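To make the idea concrete, here is a minimal, hypothetical sketch of such an open-ended agent loop, loosely inspired by Voyager's automatic curriculum and skill library. All the names here (SkillLibrary, open_ended_loop, the llm and env interfaces) are illustrative assumptions of mine, not the actual Voyager code.

```python
# Hypothetical sketch of an open-ended, LLM-driven agent loop,
# in the spirit of Voyager's automatic curriculum + skill library.
# The llm and env objects are abstract interfaces, not real APIs.

class SkillLibrary:
    """Stores code snippets for skills the agent has already mastered."""

    def __init__(self):
        self.skills = {}

    def add(self, name, code):
        self.skills[name] = code

    def retrieve(self, task):
        # Voyager retrieves relevant skills via embedding similarity;
        # this sketch simply returns everything as context.
        return list(self.skills.values())


def open_ended_loop(llm, env, max_iterations=100):
    library = SkillLibrary()
    for _ in range(max_iterations):
        # 1. Automatic curriculum: ask the LLM for the next task,
        #    given the agent's current state and known skills.
        task = llm.complete(
            f"Given state {env.state()} and skills {list(library.skills)}, "
            f"propose the next task to learn."
        )

        # 2. Ask the LLM to write executable code for the task,
        #    reusing previously learned skills as context.
        code = llm.complete(
            f"Write code for: {task}\nReusable skills:\n{library.retrieve(task)}"
        )

        # 3. Execute in the environment; on failure, the feedback could
        #    be fed back to the LLM for iterative refinement.
        success, feedback = env.run(code)
        if success:
            # 4. Store the new skill so future tasks can build on it.
            library.add(task, code)
```

The key point is that nothing outside the loop tells the agent what to do: the curriculum, the code, and the growing skill set all come from the model itself.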
COMPUTEX Keynote by Nvidia. "Buy more, save more"... Jensen Huang, the CEO of Nvidia, demonstrates that Nvidia will dominate the market for AI computing for some time to come. Nvidia is not just busy selling graphics cards to gamers; they are reinventing the future of computing and building out the infrastructure needed for the continued development and deployment of ever more powerful AI, built right into machines and data centers. The vision they sell is one of ubiquitous machine learning, powered by Nvidia hardware and embedded into everything, translating the effects of more advanced AI algorithms into the real world, from manufacturing to telephone calls. The density of mind-blowing announcements packed into this two-hour keynote is just wow.
Vision Pro Announcement by Apple. Apple released a product trailer for their new VR headset, the Vision Pro. They are going to sell it for an insane $3,499, and yet people are going to buy it. The product announcement video is worth watching because it feels like we are living inside a Black Mirror episode. Apple executed the idea of AR/VR in typical Apple fashion: superlatively. There is also a great analysis by Professor Galloway of Apple's move and why he thinks the product is going to fail nonetheless, which I highly recommend reading.
MusicLM by Google Research. Text-to-music generation is a real thing now. Just listen to some of these demos! Eventually, music-generating AI like this will be good enough to auto-generate entire soundtracks for YouTube videos or games. We could even generate music on the fly, seamlessly adapting it to the mood and the narrative.
CoDi by Microsoft Azure Research. This model can "translate" between different modalities. Given a description and an image, it can generate sound; given sound, it can generate images; given text, it can generate sound; given images and sound, it can generate video; and so on. It's crazy how quickly multimodal models are becoming more popular and powerful, and I wonder where this will be in a few years.
Bark by Suno AI. An open-source speech synthesis model, with the code available on GitHub (a minimal usage sketch is shown below). The demo on "code-switching" at the bottom of their notes completely blew my mind. Bark can generate speech from text even if the text switches between two languages mid-sentence. The audio is seamless, and all the characteristics of the speaker, like accent and voice, stay the same, even though the language and its "melody" change.
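For reference, here is a minimal usage sketch based on the examples in the Bark README. The exact speaker preset name ("v2/en_speaker_1") is an assumption on my part; check the repository for the actual list of presets.

```python
# Minimal Bark usage sketch, based on the examples in the Bark README.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads and loads all Bark models on first call

# A code-switching prompt: the text moves from English to German mid-sentence.
text_prompt = (
    "And then I said to him: Das kannst du doch nicht machen! "
    "But he just laughed and walked away."
)

# history_prompt selects a speaker preset; the preset name used here
# is an assumption, see the Bark repo for the presets actually shipped.
audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
write_wav("bark_code_switching.wav", SAMPLE_RATE, audio_array)
```

Because the speaker preset is fixed while the text changes language, the generated voice keeps its accent and timbre across the switch, which is exactly the effect the code-switching demo shows off.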
🌌 Midjourney 🌌
🎶 Song 🎶
My Favorite Things by John Coltrane
I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? Please let me know.
Cheers,
– Rico