GNoME, Quantum Computing, and Google's Gemini – Live and Learn #31
Welcome to this edition of Live and Learn. Once again, the last two weeks have been packed with announcements and awesome things that I have found. This edition is a wild mix: robotic arms mimicking those of octopuses, an essay about what it means to be human in the age of AI, crazy progress toward automated AI scientists, the interviews from the New York Times DealBook Summit, the Gemini announcement, and more...
✨ Quote ✨
The hardest stone, in the light of what we have learned from chemistry, from physics, from mineralogy, from geology, from psychology, is in reality a complex vibration of quantum fields, a momentary interaction of forces, a process that for a brief moment manages to keep its shape, to hold itself in equilibrium before disintegrating again into dust, a brief chapter in the history of interactions between the elements of the planet, a trace of Neolithic humanity, a weapon used by a gang of kids, an example in a book about time, a metaphor for an ontology, a part of a segmentation of the world that depends more on how our bodies are structured to perceive than on the object of perception – and, gradually, an intricate knot in that cosmic game of mirrors that constitutes reality.
– Carlo Rovelli, The Order of Time
Links
Millions of new materials discovered with deep learning by DeepMind. This article and the accompanying research paper dive into how AI can help discover new materials with useful properties. The GNoME project predicts which material compounds are stable using a graph network and validates the results with Density Functional Theory. If paired with automated AI labs that can synthesize and test these novel crystal structures, we move one step closer to the idea of an AI scientist. All of the compounds found this way are documented and accessible via the Materials Project website. Progress in areas like this is awesome to see, because it moves the positive benefits of AI from the world of bits into the world of atoms, where it truly matters.
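For intuition, here's a tiny Python sketch of that screen-then-validate loop. It is my own toy illustration, not DeepMind's code: the formulas, the threshold, and the stand-in prediction function are all made up.

```python
# A minimal sketch (my own toy code, not DeepMind's) of the screen-then-validate
# loop described in the article: a cheap learned model predicts stability for many
# candidate compositions, and only the promising ones go on to expensive DFT.
from dataclasses import dataclass


@dataclass
class Candidate:
    formula: str
    predicted_e_above_hull: float  # eV/atom, predicted by the (hypothetical) graph network


def predict_e_above_hull(formula: str) -> float:
    """Stand-in for GNoME's graph network; returns a placeholder number."""
    return (hash(formula) % 100) / 1000  # illustration only, not a real prediction


def screen(candidates: list[Candidate], threshold: float = 0.05) -> list[Candidate]:
    """Keep candidates predicted to be (near) stable, i.e. close to the convex hull."""
    return [c for c in candidates if c.predicted_e_above_hull <= threshold]


if __name__ == "__main__":
    formulas = ["Li7La3Zr2O12", "Na3PS4", "Mg2SiO4"]  # example compositions
    candidates = [Candidate(f, predict_e_above_hull(f)) for f in formulas]
    shortlist = screen(candidates)
    print("Send to DFT validation:", [c.formula for c in shortlist])
```

If I understand the paper correctly, the real pipeline does this at the scale of millions of candidates and feeds the DFT results back in as new training data.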
The Inside Story of Microsoft's Partnership with OpenAI by Charles Duhigg. A beautiful article about how the partnership between OpenAI and Microsoft has played out, in light of the drama around the firing of Sam Altman, and what all of this means for the safe development of AI. The article is quite long but worth the read.
Animate Anyone. Stable Diffusion combined with OpenPose/ControlNet-based animations and an initial image = Animate Anyone. This paper lays out an approach for using Stable Diffusion models to generate controlled video animation sequences of characters from a reference photo. The results look incredible. Most of the artifacts from previous attempts at these techniques are gone, and the videos look almost photorealistic (though the method also works on anime characters). The Arxiv paper can be found here.
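This is not Animate Anyone's own pipeline (the paper, as I read it, injects the reference image through a dedicated ReferenceNet and adds temporal layers), but if you want to play with the pose-conditioning building block it extends, a rough sketch with the diffusers library looks something like this; the model IDs, prompt, and file paths are my assumptions:

```python
# Rough sketch of pose-conditioned image generation with ControlNet + Stable Diffusion,
# i.e. the building block Animate Anyone extends (NOT the paper's actual pipeline).
# Model IDs, the pose image path, and the prompt are assumptions for illustration.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# An OpenPose skeleton image that the generated frame should follow (hypothetical file).
pose_image = load_image("pose_skeleton.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The text prompt stands in for the reference appearance here; Animate Anyone instead
# conditions on an actual reference photo and generates temporally consistent video.
frame = pipe(
    "a dancer in a red dress, photorealistic",
    image=pose_image,
    num_inference_steps=30,
).images[0]
frame.save("frame.png")
```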
Animatable Gaussians by Meta. Animatable Gaussians are a way to construct lifelike 3D animations of human characters by scanning humans from RGB video alone. This differs from the Animate Anyone paper in that it is 3D and doesn't aim to generate video "from scratch"; instead it combines lifelike animated motion with a virtually scanned avatar of a real person. Things like this will eventually become big for video games and things like the metaverse. The paper, again, can be found here.
Robotic Octopus Tentacle by Nature. Building soft robots that can move in weird ways, like the tentacles of an octopus, is an active research area, because the flexibility of such movements is hard to replicate. But the demos look impressive (even if sped up by a factor of 2 in the article's video). There's still a long way to go, but it gives me Matrix sentinel vibes already.
Seamless Communication by Meta. I have written about the first version of Seamless here before, and Meta has released another version that improves on the first in three critical ways. They brought latency down to only 2 seconds, made the translations more faithful to what was actually said, and, lastly, the translated content now adheres to the "style" of the speaker, incorporating pauses, stress, volume, and other "non-word" cues into the translation. The demos and videos on their Seamless page are pretty impressive. And the models are open-source and accessible on GitHub too. They believe that what they are working on can become the "Babelfish" from The Hitchhiker's Guide to the Galaxy. To quote from their announcement: "The breakthroughs we've achieved with Seamless show that the dream of a universal, real-time translator isn't science fiction—it's becoming a technical reality." And that, to me, is so awesome.
Ego Exo 4D by Meta. This dataset is another thing Meta announced recently. The idea is simple: we need an annotated dataset of human skills to efficiently train AI to emulate these behaviors. To build it, they filmed people from multiple outside angles while performing their craft, while also recording their first-person view. Hence the name Ego (for the first-person view) and Exo (for the outside angles). I think this dataset can serve as a cornerstone in advancing robotics, but at the same time, I find the idea that people in their "favorite" jobs might be replaced by AI someday very scary. Thinking of chefs in a restaurant who really enjoy their work vs. robots preparing the same food instead gives me the creeps somehow. The same is true for robots playing basketball or going climbing, to name two other examples from the dataset's activities.
AI and Trust by Bruce Schneier. Bruce Schneier is somewhat of a guru among cybersecurity people, with websites dedicated to Chuck Norris-style jokes about him. This article is a fresh take on the whole debate about AI. In it, he talks about how trust is fundamental to society and baked into its rules and regulations, and how we make a category error when we trust corporations the same way we trust people. Corporations are only as trustworthy as the systems they have to follow, and trust in systems is fundamentally different from trust in people. With the rise of human-like AI, corporations will try to exploit this error in human thinking: because AI systems use natural language, we too readily trust them like people. Corporations will want us to think of AI systems as our "friends" or even "partners" so that we trust them with everything, because once corporations know everything about us, they can use this knowledge to manipulate us completely. To quote Bruce Schneier: "The friend/service confusion will help mask this power differential. We will forget how powerful the corporation behind the AI is because we will be fixated on the person we think the AI is." The article provides lots of food for thought.
Is my Toddler a Stochastic Parrot? by Angie Wang. This digital sketchbook is a heartrendingly beautiful way of thinking about our humanity in the age of AI. Asking whether we are nothing but stochastic parrots like GPT, it goes on an exploration of what it means to be human and why AIs, with all their promise of "solving the world's problems", still can't replace the beauty of human connection. It is written from the perspective of a mother watching her son grow up, comparing the value of her barely talking son to that of an incredibly clever AI like ChatGPT. It touched my heart and reminded me of another webcomic, by Kingshukdas, that I featured in this newsletter a while back.
1000 Qubit Chip by IBM. IBM is pivoting from making their quantum computing chips bigger, with more qubits, to making them more error-resistant instead, a similar idea to what DARPA's research is pursuing. I find things like this fascinating, since quantum computing is one of those technologies people have been discussing for years, yet it always seems far away, much like fusion energy. But people continue to make progress on these fronts, stretching what is possible every year, until eventually quantum computers might become as ubiquitous in our daily lives as "normal" computers today. I have also been reading this digital book on quantum computing and can only recommend checking it out if you are into that sort of ultra-nerdy stuff.
Elon Musk Interview at the DealBook Summit by The New York Times. Think what you may of Elon Musk, his tweets, and his actions at Twitter, this interview is worth listening to. The journalist, while being a friend of Elon's, asks him really hard questions, trying to make him see a different point of view, all while having a professional and interesting conversation. It's fascinating to watch. The other interview I listened to from the summit is that of Jensen Huang, the founder of Nvidia, which was also quite interesting.
Gemini Announcement by Google. Google's Gemini announcement was supposed to show the world that Google is still in the game when it comes to the development of AI and can "take on" OpenAI's ChatGPT and Microsoft. While impressive, their demo was a bit of a fiasco, mired in controversy around how their model was presented and how they effectively tried to lie about its actual capabilities, making Gemini look like something it isn't. Their announcement numbers were also doctored to make Gemini's output look better than it really is compared to GPT-4. Yannic Kilcher has an awesome video about the whole thing. I think it's still remarkable what they produced, but at the end of the day this kind of "marketing" makes me somewhat sad about Google's approach. Instead of truly beating OpenAI by building better products, they are trying to prop up their model with questionable numbers, trying to look good without actually having significant improvements to share.
🌌 Traveling as Imagined by Midjourney 🌌
Thought this was fun. I am currently in Barcelona and wanted to let Midjourney imagine the things I have experienced on my trip. Here are some of the results:
🎶 Song 🎶
Dead Inside Shuffle by Louis Cole (watch the music video ^^)
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always please let me know.
Cheers,
– Rico