Microsoft Build, AWS Summit and the Mind of LLMs – Live and Learn #43
Welcome to this edition of Live and Learn. This time with the Microsoft Build Keynote, updates from the AWS Summit, and an Anthropic paper investigating the internal workings of their Claude Sonnet LLM. As always, I hope you enjoy this edition of Live and Learn!
✨ Quote ✨
We no longer have to settle for log cabins when we can build skyscrapers.
– Packy McCormick (source)
Links
Mapping the Mind of LLMs by Anthropic. This is an expansion of Anthropic's work on LLM interpretability. Normally we treat machine learning models as inscrutable black boxes of matrix multiplication. However, this paper shows that it is possible to understand how the neural network represents concepts that make sense to humans, at the level of individual features (directions formed by combinations of neurons). They have done work like this before, but this time they scaled it up to a big, state-of-the-art model. They can map out features in the network that attend to different ideas and high-level concepts. For many more details, you can read their Scaling Monosemanticity paper. Clamping some of these features to a high value also changes the model's behavior: Claude starts obsessing over the clamped feature, bringing up that idea in every answer even when it doesn't make much sense. The discovered features therefore causally shape the model's behavior.
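To make the clamping idea more concrete, here is a toy sketch of how one might steer a model by pushing a single sparse-autoencoder feature to a high value. This is not Anthropic's actual method or code; the layer choice, SAE weights, and feature index are all hypothetical, and adding the feature's decoder direction to the residual stream is only an approximation of clamping the feature itself.

```python
# Toy sketch of "feature clamping": steer a transformer by adding a sparse
# autoencoder (SAE) feature's decoder direction, scaled to a high value,
# into the residual stream. All weights and indices here are hypothetical.
import torch

def clamp_feature_hook(sae_decoder_weight: torch.Tensor, feature_idx: int, clamp_value: float):
    """Return a forward hook that pushes one SAE feature to a high value.

    sae_decoder_weight: (num_features, d_model) decoder matrix of a trained SAE
    feature_idx: index of the feature to clamp (e.g. a "Golden Gate Bridge" feature)
    clamp_value: how strongly to activate the feature
    """
    direction = sae_decoder_weight[feature_idx]  # (d_model,)

    def hook(module, inputs, output):
        # Decoder layers usually return a tuple whose first element is the
        # residual-stream activations of shape (batch, seq, d_model).
        resid = output[0] if isinstance(output, tuple) else output
        steered = resid + clamp_value * direction.to(resid.device, resid.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    return hook

# Hypothetical usage with a HuggingFace-style model:
# layer = model.model.layers[20]
# handle = layer.register_forward_hook(clamp_feature_hook(sae_W_dec, 31164, 10.0))
# ...generate text, watch the model obsess over the feature, then handle.remove()
```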
Microsoft Build Keynote. Microsoft had its Build event and announced how it is infusing Windows with AI capabilities. They introduced a new "type" of PC, which they call Copilot+ PCs. These combine CPU, GPU, and NPU processors (the NPU is a chip designed to run ML inference), so these machines can run powerful machine learning models on device. The result: new, powerful Copilot features, like Recall, where you can use natural language to find files, browser tabs, or emails you have opened in the past by association. These computers are also built on new ARM-based chips, which makes them much less power-hungry than previous generations of PCs. There is also some great commentary on the Microsoft Build conference from the Stratechery blog that I recommend reading.
Stripe Session with Jensen Huang. Patrick Collison interviewing Jensen Huang on the future of AI is an absolutely beautiful interview to listen to. Listening to two people who run serious companies, with lots and lots of intelligence, argue about what the future is going to bring and how they position their companies to profit from it, is amazing.
AWS Summit by Amazon. Amazon is not that active in ML research, or at least not as visibly as Google, OpenAI, or Microsoft. But don't be fooled: AWS is building the infrastructure to deploy, build, and train models at massive scale, all within the AWS cloud ecosystem. They want to be the cloud service with the widest choice of specific LLM models, serving new models as fast as the companies creating them release them. AWS is also partnering deeply with Nvidia, and they even build their own AI accelerator hardware for supercomputers and data centers. Finally, they announced Q, their own copilot, which is tailored to write code that can deploy and manage infrastructure (including their new Bedrock AI services) on AWS automatically and easily.
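As a small taste of the "many models behind one API" idea, here is a minimal sketch of calling a hosted model on Amazon Bedrock with boto3. The region, model ID, and request body format are assumptions and depend on which models are enabled in your AWS account.

```python
# Minimal sketch: invoke a hosted LLM on Amazon Bedrock via boto3.
# Model ID and request format are assumptions; check your account's
# enabled models and the provider's documented body schema.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "What is Amazon Bedrock?"}]}
    ],
}

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
    body=json.dumps(body),
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```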
🌍 Traveling 🌍
I spent the last two weeks in a lot of different places and have been on the move quite a bit. Right now I am at a burn in Portugal called Somewhere, and before that I spent some time in Martinique, where these pictures were taken. Life is good and beautiful, and I am grateful for all the wonderful people I am meeting.
🎶 Song 🎶
Jazz is for Ordinary People by Berlioz
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico