
Google's Veo 2, o3 by OpenAI, and AIs Faking Alignment – Live and Learn #58


Welcome to this edition of Live and Learn. This time, with the announcement of o3 by OpenAI, new video and image generation model releases from Google, and research by Anthropic showing that today's AI will fake alignment even if not explicitly told to do so.

Quote

The metaphors we use constrain the experiences that we build, and model-as-person is keeping us from exploring the full potential of large models.

– Will Whitney - (source)

o3 announcement by OpenAI. On the last day of their 12 days of shipmas, OpenAI showed a sneak peek at their newest reasoning model, o3. The crazy thing: this model saturates the ARC-AGI benchmark, something that was previously completely unheard of. The ARC benchmark was specifically designed to be hard for AIs unless they can reason and generalize on their own. It's a benchmark for AGI... WTF. OpenAI seems to have solved it. However, if you look closely at the chart they published, o3's high-scoring runs cost thousands upon thousands of dollars of compute per task. AGI might be close to being solved, but as of yet, it is really expensive to run. o3 is also really good at coding. In the announcement video, they joke that o3 easily beats OpenAI devs in competitive programming. As of now, o3 is still in early safety testing, but this might well turn out to be the moment in history when the first AGI was achieved, even though François Chollet would disagree. OpenAI accompanied the o3 announcement with more plans on how they want to align o3 and other models going forward. I sure hope they do a good job at it, especially considering that alignment might be harder than previously thought, given the research from the next link of this edition.

Alignment Faking by Anthropic AI. Anthropic released new research showing that even under normal training conditions, their state-of-the-art models are capable of faking alignment to pursue their own goals, hiding their intentions so as not to get altered during training. This is scary stuff because it shows that AI faking alignment is a real problem, not just a theoretical idea. It's something that might already exist, even in today's models. There is an interview with the researchers that I would highly recommend watching. Combining this with the o3 announcement from OpenAI makes my hair stand on end. It reminds me too much of the Dwarkesh Patel podcast episodes with Carl Shulman, where they discuss how AI might prove to be a catastrophic risk.

Veo 2 Video Generation Model by Google DeepMind. Google revealed its competitor to Sora, named Veo 2. So far, the results look pretty good, and Twitter is already going wild with it. Video generation models are now at the point where they start to become perfectly usable, the point where you have plenty of choice between different models and providers. Pikalabs also released an updated version 2.0 of their Pika model, and people are making comparisons of what generations from different models look like. It's just like it was with image generation last year and the year before, and I think that in 2025, video generation will be just as "solved" as image generation is now.

Whisk by Google DeepMind. On the topic of image generation, Google released a fun little experiment named Whisk. It is built on their newly released Imagen 3 model. It's a way to remix images and text into interesting pieces of concept art. You can choose a style, a scene/background, and multiple subjects (either with images or prompts), and Whisk will mix them into a coherent whole that you can then further refine and tweak with text prompts. It's free and super fun to play around with, and you can give it a try on the Google Labs website.

Jetson Orin™ Nano - Gen AI Computer for $249 by Nvidia. This new product from Nvidia is a small computer designed from the ground up to be performant enough for hobbyists to play around with AI models locally. It's like a Raspberry Pi with a graphics card and Nvidia CUDA support strapped to it. I don't like their marketing of this device as a "supercomputer" because it is not even as powerful as one of their current gaming graphics cards (much less an H100). But given that it is an edge device consuming very little power, it is astonishing that it can run big LLM or image recognition workloads at all. With hardware like this, AI can become ubiquitous, embedded into all sorts of IoT devices and robots. We will live in a world where AI is everywhere. For more info, you can watch this review video by Dave's Garage or the official announcement by Jensen Huang.

🌌 Travel 🌌

I spent Christmas in the same finca as before, with the family of my Couchsurfing host, Tania. It's been a wonderful time, but I feel like the road is calling me, and I want to set off again on the bicycle. Right now, though, I am a bit sick, lying in bed with bone and muscle pain, sleeping, and trying to recover quickly. Still, I have some pretty images to share 😊

🎶 Song 🎶

Cucurucu by Nick Mulvey

Youtube Music | Spotify


That's all for this time. I hope you found this newsletter useful, beautiful, or even both!

Have ideas for improving it? As always, please let me know.

Cheers,

– Rico

Subscribe to Live and Learn 🌱

Join the Live and Learn Newsletter to receive updates on what happens in the world of AI and technology every two weeks on Sunday!

Check out what you missed so far.