FunSearch, DeWave, and Music FX – Live and Learn #32
Welcome to this edition of Live and Learn. This time with a new approach for adding mathematical reasoning to LLMs, a way of reading people's brain waves using only EEG data, and announcements of multiple models for generating audio and 3D content. As always, I hope you enjoy.
✨ Quote ✨
A poet once said, "The whole universe is in a glass of wine." We will probably never know in what sense he meant that, for poets do not write to be understood. But it is true that if we look at a glass of wine closely enough we see the entire universe.
– Richard Feynman (source)
The Feynman lectures are a wonderful series of books, completely available online for free. I highly recommend reading them. Feynman has a way of explaining the world that is just enjoyable. To get an introduction you can watch his Fun to Imagine documentary on YouTube. There is so much wonder in the way that he looks at even the simplest things – it is infectious.
Links
FunSearch by DeepMind. Google's DeepMind has created an approach that lets LLMs reason about mathematics from first principles. The LLMs discover code snippets that solve mathematical problems and improve their solutions in a self-learning/self-play kind of manner. Even though the results don't look "too impressive" at first glance, this is a huge step toward AGI, because Google found a way to give LLMs reasoning capabilities: the ability to use language to "think" through novel problems they have never seen before. And this is just nuts.
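To make that loop concrete, here is a minimal sketch of how a FunSearch-style system could be wired up (my own illustration, not DeepMind's actual code; the `solve` entry point and the LLM call are placeholder assumptions): an automated evaluator scores candidate programs, and the best scorers are fed back to the LLM as examples to improve on.

```python
def evaluate(program: str) -> float:
    # Toy scorer for illustration only: a real system runs the candidate
    # in a sandbox and measures how well it solves the target problem
    # (e.g. the size of the mathematical structure it constructs).
    try:
        namespace: dict = {}
        exec(program, namespace)            # run the candidate code
        return float(namespace["solve"]())  # assumed entry point "solve"
    except Exception:
        return float("-inf")                # broken programs score worst

def llm_propose(parents: list[str]) -> str:
    # Placeholder for the LLM call: given the best programs found so far
    # as few-shot examples, ask the model to write an improved version.
    raise NotImplementedError("plug in an LLM API of your choice here")

def funsearch(seed: str, rounds: int = 1000, keep: int = 8) -> str:
    pool = [(evaluate(seed), seed)]
    for _ in range(rounds):
        # Selection pressure comes from always prompting with the
        # current top scorers, never from fine-tuning the model itself.
        parents = [prog for _, prog in sorted(pool, reverse=True)[:keep]]
        child = llm_propose(parents)
        pool.append((evaluate(child), child))
    return max(pool)[1]  # best program discovered
```

The striking part is that the model's weights never change; all of the "learning" lives in the growing pool of scored programs.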
General World Models by Runway. Runway thinks their video generation models can get better results if they have an underlying understanding of the world they are trying to generate videos of. That's why they started a new research team focused on building models that understand the world much more like we do. The video in their announcement is delightful to watch, too.
Optimus Update by Tesla. Tesla keeps increasing the capabilities of their humanoid robot, and every time they share an update it's crazy to see how fast they are moving. Now it can grasp delicate objects, and the robot's hand movements look eerily human. Still, there's a long way to go: the walking looks weird, unstable, and slow, especially compared to the crazy running and backflipping robots of Boston Dynamics. But still, I can see how we move closer and closer to a crazy sci-fi future with every passing month the team at Tesla works on this.
AudioBox by Meta. AudioBox is a tool developed by Meta AI that can generate sound environments for all kinds of things. Describe the soundscape you want to produce, then stitch it together in the AudioBox editor with voices, sound effects, and more. You can play around with it for free, and while it still has a "toy" feel to it now, it will eventually become a very powerful tool for people who need soundscapes for specific environments, like game developers or video editors.
Music FX by Google. Music FX is a tool developed by Google that generates music from text input. You can choose the genre, the mood, and the instruments via prompt, and it generates short musical clips for you. It's not perfect yet, but it's an awesome tool to play around with, and I think it's a great example of how AI can aid creative pursuits. Again, game designers and video editors can benefit a lot from this, as they can now add unique music to their projects without having to know how to compose or hire a composer.
3D Generation by Stability AI. The next iteration of 3D generation models by Stability AI has much better output quality. To me, it's crazy how every month there is more and more progress in the field of AI-generated 3D content. I have said so before, but I think that by the end of 2024 we will have models that are as good at generating 3D assets as Midjourney is at generating images now.
DeWave – BrainWave Reading from EEG by HAI Centre. There have been brain-wave-reading papers in the past, but most of them only worked on fMRI data. This paper is different in that it works with an EEG cap instead. What they have at this point is, essentially, a brain-reading hat, and it will only get better from here on out. Still, the accuracy is nowhere near 100% and many of the decoded words and sentences are garbled, but the key concepts are reproduced accurately. For people who can't speak anymore, this kind of tech will be life-altering, especially when paired with an AI voice generated from recordings of their own voice before they lost it. They have a GitHub repo here and a video demo that is worth watching as well.
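As far as I understand the paper, the core trick is turning continuous EEG signals into discrete tokens that a language model can then decode into text. Here is a rough sketch of that quantization step (the shapes and the random codebook are made up for illustration; the real model learns these representations end to end):

```python
import numpy as np

# Hypothetical sizes: 512 discrete codes, 64-dimensional EEG features.
codebook = np.random.randn(512, 64)

def quantize(features: np.ndarray) -> np.ndarray:
    # Map each EEG feature vector to the index of its nearest codebook
    # entry, producing a sequence of discrete tokens for a text decoder.
    # (T, 1, 64) vs (1, 512, 64) -> squared distances of shape (T, 512)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one token id per time step

eeg_features = np.random.randn(100, 64)  # 100 time steps of encoded EEG
tokens = quantize(eeg_features)
print(tokens.shape)                      # (100,) discrete ids for the decoder
```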
LLMs beyond Attention by Interconnects. This post dives into a new architecture called state-space LLMs that might replace the attention-based transformers mainly used today (ChatGPT and Gemini are both transformers). It is quite technical and quite long, but if you are interested in this sort of thing, it is worth the read.
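If "state-space" sounds abstract: instead of attending over every previous token, these models push a fixed-size hidden state through a linear recurrence, so compute grows linearly with sequence length instead of quadratically. Here is a toy numpy sketch of that recurrence (random matrices standing in for learned ones, leaving out everything that makes real state-space models work well):

```python
import numpy as np

d_state, d_model = 16, 8
A = np.random.randn(d_state, d_state) * 0.1  # state transition
B = np.random.randn(d_state, d_model)        # input projection
C = np.random.randn(d_model, d_state)        # output readout

def ssm_scan(embeddings: np.ndarray) -> np.ndarray:
    # Carry a fixed-size state h through the sequence: every step costs
    # the same, and generation needs no growing key/value cache the way
    # attention does.
    h = np.zeros(d_state)
    outputs = []
    for x_t in embeddings:       # one token embedding per step
        h = A @ h + B @ x_t      # update the hidden state
        outputs.append(C @ h)    # readout for this step
    return np.stack(outputs)

seq = np.random.randn(32, d_model)  # a toy "sequence" of 32 embeddings
print(ssm_scan(seq).shape)          # (32, 8)
```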
🌍 Traveling 🌍
I am still traveling around and wanted to share some actual pictures of Spain and Morocco this time, instead of the AI-generated ones.
🎶 Song 🎶
I'm confident that I am insecure (acoustic-ish) by Lawrence
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always please let me know.
Cheers,
– Rico