Nvidia Edify 3D, Flux Tools and Fugatto – Live and Learn #56
Welcome to this edition of Live and Learn. This time with booknotes for the book Innate, better AI image editing tools by Black Forest Labs, an insanely good AI 3D generation model by Nvidia, and an article on the limitations of reasoning models like o1. As always, I hope you enjoy this edition of Live and Learn.
✨ Quote ✨
A world divided into writes and write-nots is more dangerous than it sounds. It will be a world of thinks and think-nots. I know which half I want to be in, and I bet you do too.
– Paul Graham - (source)
📖 Booknotes 📖
Innate: How the Wiring of Our Brains Shapes Who We Are by Kevin J. Mitchell. Excellent read on the nature of our brains and what inheritance means. This book dives into how much and yet how little we know about the workings of our brains. It tries to shine a light on the nature vs. nurture debate: how much of who we are is innate, and how much isn't. We know that many traits (like intelligence or personality) have genetic components and are therefore somewhat inherited. However, we often still don't understand which genes lead to certain outcomes, or how. This book tells the story of the things that we do know and how we came to know them.
Links
Fugatto - Foundational Generative Audio Transformer Opus by Nvidia. This is a model that can generate music and all kinds of sounds. What makes it special, however, is that it can generalize quite well to novel sounds and create emergent, never-before-heard effects. The authors designed it as a tool to aid musical creation. In their own words: "We envision Fugatto as a tool for creatives, empowering them to quickly bring their sonic fantasies and unheard sounds to life." You can watch a demo showcasing the model's capabilities on YouTube. Jacob Collier would probably like this tool ^^
US government commission pushes Manhattan Project-style AI initiative by Reuters. This article caught my attention because it tells the story of how the AGI race is intensifying. It seems like the US government (and therefore the military) is AGI-pilled. The last thing the world needs is an AGI arms race, but it seems like this is exactly where we are heading. However, I also found this commentary on LessWrong that heavily criticizes the article in question. It seems like there is more nuance to the topic because, at least officially, there is no sign of China racing towards AGI (yet). I like this quote from the LessWrong article: "Only one superpower has a government commission publicly calling for a militarized race to build superintelligent AI (with no plan for how to control it), and it’s not China." Either way, things are bound to get "interesting" now that governments are rushing into AGI development programs as a means of gaining a decisive military advantage.
Flux 1 Tools - AI Image Editing on Steroids by Black Forest Labs. So far, this has been the best image editing model that I have seen. Seriously. Just look at the demos on their website! There are no artifacts, the model can replace text while keeping the style intact, it can outpaint and inpaint with masks, and it can change textures. It's like automatic Photoshop powered by text and a paintbrush. And the crazy thing is that they released a slightly less powerful DEV version of this as an open weights model, freely available on Hugging Face, with example inference code on GitHub too. You can try out the more powerful but proprietary PRO version as well, via an API at one of their partners. Replicate, for example, costs only a few cents per edit/generation and is quite fast and easy to use.
The Reasoners Problem by Aidan McLaughlin. This article argues that reasoning models like o1 have a fundamental problem, namely that RL-based learning isn't good at producing behavior where reward is sparse. And this matters because most problems we care about sadly belong to this category. So even if we get super advanced AI, if it is based on o1-style RL, it won't be as beautiful or helpful as we would hope. The author argues that it wouldn't be able to vastly outperform humans in many of the things we care about, like soothing a hurt friend, writing brilliant poetry, giving advice on corporate strategy, or even offering meaningful feedback on an essay. This would be sad. In the words of the author: "If the entire AI industry moves toward reasoners, our future might be more boring than I thought."
Edify 3D Model by Nvidia. Nvidia, in partnership with Shutterstock, has created a custom fine-tuned model that can generate amazingly high-quality 3D assets from text or image input. It's really damn good. They put out a live demo (capped at 50 results) and a model card too. There's also a video showcasing the model's outputs that's worth watching. To me, this is incredible because it seems like 3D generation with AI is basically a solved problem now.
🌌 Travel 🌌
The last two weeks have been filled with traveling. I am now sitting in a beautiful flat in Bogotá, couchsurfing with some lovely people and enjoying my time tremendously. Before that, I spent some days in Madrid and Barcelona in Spain, which is where all the pictures below are from. I am still planning my cycling trip across South America, but things are coming together. The next week will be spent organizing all the things necessary for this adventure. Stay tuned, because I am planning to write a little more about it in the traveling section of this website soon. 🤗
🎶 Song 🎶
Só Danço Samba by Joan Chamorro and Rita Payés
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico