AI Video Generation, ChipNeMo, and Real-Time Image Generation – Live and Learn #30
Welcome to this edition of Live and Learn. This time with several updates on AI Video Generation, a piece on how companies should think about AI adoption, Real-Time Image Generation by StabilityAI, a summary of the ousting of Sam Altman from OpenAI, and much more… I hope you enjoy.
✨ Quote ✨
Purpose turns impediments to happiness into sources of happiness — because meaning is our greatest driver of long-term joy.
– Nik Göke (source)
Links
Stable Video Diffusion by StabilityAI. StabilityAI, the company behind the open-source Stable Diffusion model, has released a new AI model that can generate short video clips (the released checkpoints animate a still image into a video). The results are quite impressive, and the model is open-source as well. The main problem with AI video generation right now is that the clips are still really short: currently only about 4 seconds each. But soon we will see longer and longer videos, with even better quality.
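If you want to poke at the open weights yourself, here is a minimal sketch of what that might look like, assuming you use Hugging Face's diffusers library, the stabilityai/stable-video-diffusion-img2vid-xt checkpoint, and a GPU with enough memory; the input image path and output filename are placeholders:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the open Stable Video Diffusion checkpoint (image-to-video)
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Start from a single conditioning image (placeholder path)
image = load_image("starting_frame.png").resize((1024, 576))

# Generate a few seconds' worth of frames and write them out as an mp4
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```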
Emu Text-to-Video AI by Meta. Stability AI is not the only one working on generative models that can output video; Meta has been busy getting this to work too. I think that in 2024 we will see jumps in the quality of these video generation models similar to what we saw over the last year with Midjourney and DALL-E 3 in image generation. People are already using Runway's tools to generate entire videos, and with more players out there, adoption will only grow, which in turn improves the existing models by giving them more human feedback on their generations, feedback that can be incorporated and learned from. Soon, AI-generated videos will be as commonplace as AI-generated images are now. And it is crazy to think that this whole "trend" has only taken a bit over a year to play out.
Rebuilding Organizational Structures to Deal with the Rise of AI. With all of the changes going on around AI, it matters a great deal how companies think about and incorporate these new technologies. This post deals with exactly that and gives some food for thought on how to improve company processes and truly embrace AI. Most important of all: the time to adopt AI is now, because the people who successfully use these new tools and capabilities will outcompete those who don't. On the same note, while researching this newsletter I found a product called Olympia Chat that shows the shape of things to come: AI deeply integrated into team structures, handling specific tasks and automating key steps along the way.
Emu Edit AI by Meta. Another big release from Meta happened in the last two weeks: a model that lets you edit images with text prompts. Their demos are mind-blowing, and the model is, in what amounts to Meta fashion these days, completely open-source. A context-aware, magical editing tool that just does what you tell it to is now a reality.
Brain Implant Human Clinical Trials by Neuralink. In a short video clip, Neuralink recently announced that they have received approval for human clinical trials of their brain implant. This is a huge step forward for them because, if the trials are successful, their technology could make life much, much better for many people with severe disabilities.
Using LLMs to design better Chips by Nvidia. In a masterclass example of how to embrace AI to re-imagine workflows and make them more efficient, Nvidia is using its deep AI expertise to make its own work of designing chips easier. They built a custom LLM-based system that acts a bit like Copilot, but for designing computing hardware instead of writing code. The whole thing is "only" a fine-tuned, specialized version of LLaMa-2, but it already helps their engineers a lot. Nvidia has made its learnings accessible to others via their paper and the NeMo framework for fine-tuning and deploying AIs for custom purposes at scale.
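Nvidia's actual pipeline is built on NeMo, but the general idea of domain-adaptive fine-tuning of LLaMa-2 can be sketched with more common open-source tooling. The snippet below is a hypothetical illustration using Hugging Face transformers and PEFT (LoRA), not Nvidia's setup; the corpus file and hyperparameters are made up:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # gated model, requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Attach small LoRA adapters instead of updating all of the model's weights
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Hypothetical domain corpus: plain-text snippets of internal docs, scripts, etc.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-domain", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```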
Real-Time Image Generation by StabilityAI. Image generation with Midjourney and tools like it, while impressive, takes time: you type a prompt and then wait for the images to appear. With this new model from StabilityAI you can generate images in real time, which is a huge step forward for the technology. It feels surreal that it can generate what you are typing faster than you are typing it, and the amount of creative freedom and direction this offers to the generative process is remarkable. You can try it out on Stability AI's Clipdrop site. I don't find it quite as fast as they promised, but it's still insanely quick compared to Midjourney, and the output quality is quite impressive too.
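If you prefer to run it locally, here is a minimal sketch, assuming the model behind this is the openly released stabilityai/sdxl-turbo checkpoint and that you use Hugging Face's diffusers library; the prompt and output filename are just examples:

```python
import torch
from diffusers import AutoPipelineForText2Image

# SDXL Turbo is distilled so that a single denoising step yields a usable image,
# which is what makes near-real-time generation possible
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=1,  # one step instead of the usual 25-50
    guidance_scale=0.0,     # Turbo checkpoints are trained without classifier-free guidance
).images[0]
image.save("lighthouse.png")
```

Re-running a call like this on every keystroke is presumably roughly what the type-as-you-generate demos do under the hood.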
Sam Altman ousted from OpenAI. This Wikipedia summary is quite accurate and has links to sources and lots and lots of further reading. The Verge's reporting on the story was also superb. Even though I followed the whole thing on X as it happened, I enjoyed reading the Wikipedia article later, as it serves as a good, unspeculative summary of what happened. A lot of people lost sleep over this while it was happening, and still nobody really knows in full detail what happened and why Sam Altman was fired after all. Rumors include: Q*, an internal research project that might be AGI; Sam Altman using the OpenAI name to raise capital for building a competitor to Nvidia; Microsoft planning everything to "buy" OpenAI; OpenAI being a threat to Quora's Poe, which D'Angelo, one of the board members, didn't like, so he staged a "coup"; and many, many more. At the end of the day, we simply don't know (yet).
Finally, there were big updates to the foundation models of two of OpenAI's competitors: Inflection-2 by Inflection AI and Claude 2.1 by Anthropic (with a whopping 200k-token context window!) are now both available. But GPT-4 remains the best foundation model out there, at least for now.
🌌 Midjourney 🌌
🎶 Song 🎶
Dog Days Are Over by Florence and the Machine
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico