Google's Genie, LLMs at home, and OS-Copilot - Live and Learn #37
Welcome to this edition of Live and Learn. This time with foundation models that can generate 2D video games, better humanoid robots, and a chatbot that does the work of 700 call center agents. Plus, a new article by NotBoring and a new framework for LLMs that can interact with and control the operating system of a computer.
✨ Quote ✨
The only moments we end up fully enjoying are the ones which are so obviously Good that the evaluation hum shuts off. We get wrenched out of our assessment and forced into presence.
- Nat Eliason - (source)
Links
Genie Foundation Model by Google. This technical report blew my mind. It's a new way of training foundation models to understand "action spaces". What this means is that the model can understand and predict the world in ways that earlier models could not. It can essentially generate playable 2D video games from text descriptions. But the really crazy part is that the same approach can be used to map out and explore the action spaces of robots from video input alone. Essentially, you can teach a machine all the actions it can perform by showing it videos of a human teleoperating that machine. This is a huge step toward creating robots that can do all the things humans can do.
Autonomous Robots Update by Figure 1. Humanoid robots are getting better by the day too. Figure 1 is one of many startups working on building "humanoid" bodies for AI agents. They are this close to creating a robot that can do all the things a human can do. Let's see who will win the distribution wars around these robots in the end. It's one thing to build a prototype; it's another to produce lots of them at scale and distribute them to all the companies that want to automate their factory-style jobs with them to reduce operating costs.
Klarna's Customer Chatbot. Klarna created a chatbot and deployed it to all their customers, and it's been impressive beyond expectations: 2.3 million conversations, doing work that formerly needed 700 call center agents. Just like that. And the weirdest thing is that the bot is doing a better job at customer service than humans did in the past. This shows that the era of "stupid" bots, where you want to punch your screen out of frustration because the bot keeps sending you irrelevant links to documentation you have already read and doesn't solve your problem at all, can be over. I hope that more companies implement LLM-based chatbots in the style that Klarna did.
Ambition by NotBoring. This article by NotBoring is about doing ambitious projects. Building companies that do things that have never been done before. Things that matter and are hard to do. Companies like SpaceX, or Nvidia. For me, reading articles like this is tough. I resonate with the idea of ambition so much, yet there is also this other idea of wanting to travel, have adventures, and "not just work". Living out both of these ideas at once is something that I struggle with a lot. Still, this article tickled my mind.
Train 70B parameter LLMs at Home by Answer.AI. It's interesting to see how far people have optimized training processes, to the point that you can train very large models at home. Admittedly, the setup with two RTX 4090s is quite expensive (around 3000€ worth of graphics cards), but it's still incredible that something like this is possible at all. I think in the future there will be setups where you can effortlessly train what is now state-of-the-art on your laptop. And I am excited for this future.
Chat With RTX by Nvidia. In the same vein of "doing things on your machine", Nvidia has released Chat with RTX. You can download it and connect it to the files on your local machine. It then essentially gives you a drop-in solution to start deriving knowledge from personal notes and documents. And because it all runs on your device, it works offline and is 100% private. Many people and organizations have been building their own solutions for similar systems, so it's nice to see that a big player created something that just works. However, the program (with weights and all) is still quite large: around 40GB. And again, you still need an expensive graphics card to run it.
OS Copilot. In this paper, the authors created a framework for LLMs that can interact with and control the operating system of a computer to do all sorts of tasks. The idea is that eventually you can control the computer just by describing, in high-level terms, what you want to get done, and the computer goes ahead and magically does it. I think this sort of thing will lead to a paradigm shift in how we use our devices, like the first Macs with icons or the first smartphones with touchscreens. Many people know what they would like to do on a computer but don't know how to go about doing it. This technology will change that, essentially equalizing access to technology.
Traveling
I am currently in Guadeloupe, a French overseas department in the Caribbean. It's a beautiful place with a lot of nature and a lot of hiking trails in the jungle. The shades of green and the biodiversity here are insane, and I've spent most of the last two weeks hiking muddy trails in the middle of nowhere. This, to me, is as much a part of life as the ambition to create things that have never been done before. Sometimes it feels more important than the ambition part. Traveling makes me feel alive and I cherish every minute of it.
🎶 Song 🎶
100,000 Voices by Jacob Collier
The whole album Djesse 4, which this song is from, is nothing short of crazy. I've been listening to it up and down, nonstop for the last two weeks. And… It's a masterpiece.
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
- Rico