Nvidia TTT, Google Cloud Next and AI Scientist-v2.0 – Live and Learn #66

Welcome to this edition of Live and Learn. This time with a video essay on things AI will never understand, OpenAI announcing o3 and o4-mini, and a paper on increasing temporal consistency in AI video generation. As always, I hope you enjoy this edition of Live and Learn!
✨ Quote ✨
I'm extremely hopeful for the future. I think we may be able to cure basically every known disease. We may dramatically extend the human lifespan. This future could be very positive for humanity. I'm also extremely worried. I think the short-term impact on hundreds of millions of people is going to be very profound, and I don't think many people are prepared.
– Tom Blomfield (source)
Links
Things AI Will Never Understand by Fractal Philosophy. I really liked this video because it goes deep into the philosophical constraints around AI and the problems that AI will always face. The video starts out "harmless" with examples of bilingual jokes and context-dependent translations between Japanese and English, but then quickly veers off into more abstract theories of simulations and simulacra, while adding book after book to a solid reading list. It ends up questioning the nature of the information we (and AIs) consume. Each added layer (original book into translation into YouTube video into Twitter summary into AI training data) removes us further from the original idea, altering and manipulating the information in the process. Humans can peel back these layers, reverse engineer where things came from, and place them in their appropriate context. AIs, at least for now, can't. They confuse simulacra and simulations, leading to all kinds of downstream problems.
OpenAI o3 and o4-mini. OpenAI announced their newest reasoning models and showed off some benchmarks and demos around them. Their naming is as unfortunate as ever, because they also released GPT-4.1 and GPT-4o, which are not the same as o4... Aaand GPT-4.1 will replace GPT-4.5 because it is cheaper and sort of better? To me, it looks a bit like they have lost the plot, putting out new things to keep up the hype and growth without any real rhyme or reason to it. The demos from their videos are pretty nice, though. o4-mini seems to be a better reasoning model than o3-mini overall and has vision capabilities too, but I wonder which version they have internally by now... o5, o6? If you want to poke at the new model yourself, there is a quick sketch below.
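A minimal sketch using the official OpenAI Python SDK. I'm assuming the model is exposed under the id "o4-mini" and that it accepts the reasoning_effort parameter like earlier o-series models; check the official docs for the current identifiers.

```python
# Minimal sketch: calling o4-mini through the OpenAI Python SDK.
# Assumes the model id "o4-mini" and the o-series reasoning_effort
# parameter; verify both against the current API documentation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",  # o-series models accept low/medium/high
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(response.choices[0].message.content)
```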
AI Scientist First Peer-Reviewed Paper by Sakana AI. I covered the Sakana AI Scientist in a newsletter a while back, but they have now dramatically improved their system. One of the papers generated by the Sakana AI Scientist-v2 even passed peer review at an ICLR 2025 workshop. This is a strong proof of concept that AI can carry out novel AI research entirely on its own: proposing new ideas, running experiments, and summarizing the results in papers that humans can understand and follow. To me, this is absolutely nuts, and I wonder what comes next. The predictions of the AI-2027 essay don't seem too far off anymore when looking at developments like this.
Google Cloud Next 2025 by Google. There were a lot of announcements about how Google Cloud is adapting its infrastructure to the ongoing AI race. But most of them are "just" quality-of-life iterations, going from 1 to n instead of 0-to-1 breakthroughs. To me, the two biggest announcements were their new Ironwood TPU chips, designed to handle giant AI inference workloads at very low cost and latency, and their Agent2Agent (A2A) protocol, which will help AI agents cooperate with one another independent of the vendor. So a Claude-powered agent should in the future be able to cooperate with a Google-powered agent. In theory, this sounds really good and promising, but it remains to be seen how widely it gets adopted; a toy sketch of the underlying idea follows below.
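To make the idea concrete, here is a toy illustration of what vendor-independent agent messaging boils down to: both sides agree on a shared task envelope, so neither needs to know which model powers the other. This is emphatically not the actual A2A spec; the URL, endpoint, and field names are all hypothetical.

```python
# Toy sketch of vendor-independent agent-to-agent messaging.
# NOT the real A2A protocol: envelope fields and URL are made up.
import json
import urllib.request

def send_task(agent_url: str, task: str) -> dict:
    """POST a task envelope to another agent and return its reply."""
    envelope = json.dumps({"type": "task.create", "input": task}).encode()
    req = urllib.request.Request(
        agent_url,
        data=envelope,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The caller never needs to know which vendor's model powers the
# remote agent; it only relies on the shared message format.
reply = send_task(
    "https://agents.example.com/research-agent",  # hypothetical endpoint
    "Summarize this week's TPU announcements.",
)
print(reply)
```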
One-Minute Video Generation with Test-Time Training by Karan Dalal, Daniel Koceja et al. This new approach helps AI generate longer videos with much more temporal coherence, which means AI videos can now tell more complex stories from a single prompt. To demonstrate this, the authors curated a dataset of Tom and Jerry cartoons, and the results the AI produces almost look like real, short Tom and Jerry episodes. Their approach, however, is general and can be extended to any sort of video, not just Tom and Jerry cartoons. To me, the results seem very promising... of course, there are still artifacts from the base video generation models, but the overall temporal consistency, especially when the video switches between scenes, is miles ahead of anything else out there. The core trick is sketched below. Soon we will have infinite amounts of new Tom and Jerry episodes ^^
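The core trick is worth sketching: instead of relying on attention alone, a TTT layer's hidden state is itself a small set of weights, updated by gradient descent on a self-supervised loss while the sequence is being processed, so the state can keep absorbing new context over very long videos. Below is a heavily simplified toy version of that inner loop in PyTorch; the linear reconstruction loss, the single gradient step per token, and the layer names are illustrative, and the paper's actual TTT layers are more elaborate.

```python
import torch

class ToyTTTLayer(torch.nn.Module):
    """Toy test-time-training layer: the hidden state is a weight
    matrix W that takes one gradient step on a self-supervised
    reconstruction loss for every incoming token."""

    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        self.key = torch.nn.Linear(dim, dim, bias=False)    # corrupted view
        self.value = torch.nn.Linear(dim, dim, bias=False)  # reconstruction target
        self.query = torch.nn.Linear(dim, dim, bias=False)  # read-out view
        self.dim, self.inner_lr = dim, inner_lr

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (seq_len, dim). W is the fast, test-time-trained state.
        W = torch.zeros(self.dim, self.dim, requires_grad=True)
        outputs = []
        for x in tokens:
            k, v, q = self.key(x), self.value(x), self.query(x)
            loss = ((k @ W) - v).pow(2).sum()       # self-supervised loss
            (grad,) = torch.autograd.grad(loss, W)  # inner-loop gradient
            W = (W - self.inner_lr * grad).detach().requires_grad_(True)
            outputs.append(q @ W)                   # read out the updated state
        return torch.stack(outputs)

layer = ToyTTTLayer(dim=16)
print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```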
🌌 Travel 🌌
The last two weeks have been a blast. I hiked to some of the most beautiful (and remote) places I have ever been to: Sajama National Park, the Valle de las Ánimas, and the Salar de Uyuni, sometimes alone, sometimes with people I met on the road, and sometimes with my good friend Alejandro.
My mind is still boggled by the sheer scale of the Bolivian Altiplano, and I'm still processing how beautiful these places were... but remembering the moments of these last two weeks and looking at these pictures fills me with joy. There are too many to put them all here, so I just selected a few of my absolute favorites.
🎶 Song 🎶
Ruins by Toby Fox
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico
Subscribe to Live and Learn 🌱
Join the Live and Learn Newsletter to receive updates on what happens in the world of AI and technology every two weeks on Sunday!