From Deep Learning Foundations to Stable Diffusion

We’ve released our new course with over 30 hours of video content.
Author: Jeremy Howard

Published: April 4, 2023

Today we’re releasing our new course, From Deep Learning Foundations to Stable Diffusion, which is part 2 of Practical Deep Learning for Coders.

Get started

In this course, containing over 30 hours of video content, we implement the astounding Stable Diffusion algorithm from scratch! That’s the killer app that made the internet freak out, and caused the media to say “you may never believe what you see online again”.

We’ve worked closely with experts from Stability.ai and Hugging Face (creators of the Diffusers library) to ensure we have rigorous coverage of the latest techniques. The course includes coverage of papers that were released after Stable Diffusion came out, so it actually goes well beyond even what Stable Diffusion includes! We also explain how to read research papers, and practice this skill by studying and implementing many papers throughout the course.

Thank you to all the amazing people who helped put this course together. I’d particularly like to thank Tanishq Mathew Abraham (Stability.ai) and Jonathan Whitaker (co-author of the upcoming O’Reilly Diffusion book) for helping me present a number of the lessons, and Pedro Cuenca (Hugging Face) for his great behind-the-scenes contributions. Thanks also to Kat Crowson for her k-diffusion library, which we use heavily throughout the course, and for answering all our questions, and to Francisco Mussari for creating transcripts for most of the lessons.

Stable Diffusion, and diffusion methods in general, are a great learning goal for many reasons. For one thing, of course, you can create amazing stuff with these algorithms! But to take the technique to the next level, and create things that no one has seen before, you need to deeply understand what’s happening under the hood. With this understanding, you can craft your own loss functions, initialization methods, multi-model mixups, and more, to build totally new applications. Just as important: it’s a great learning goal because nearly every key technique in modern deep learning comes together in these methods. Contrastive learning, transformer models, auto-encoders, CLIP embeddings, latent variables, U-Nets, ResNets, and much more are involved in creating a single image.

To get the most out of this course, you should be a reasonably confident deep learning practitioner. If you’ve finished fast.ai’s Practical Deep Learning course then you’ll be ready! If you haven’t done that course, but are comfortable building an SGD training loop from scratch in Python, competing in Kaggle competitions, using modern NLP and computer vision algorithms for practical problems, and working with PyTorch and fastai, then you’ll be ready to start the course. (If you’re not sure, we strongly recommend getting started with Practical Deep Learning.)
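If you’re unsure about the training loop prerequisite, here’s the kind of thing we mean: a minimal from-scratch SGD loop in plain PyTorch on synthetic data (all the names and numbers here are purely illustrative):

```python
import torch

# Synthetic linear-regression data: y = x @ true_w + true_b
x = torch.randn(100, 3)
true_w, true_b = torch.tensor([2., -1., 0.5]), 0.3
y = x @ true_w + true_b

# Parameters we'll learn, starting from zero
w = torch.zeros(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for epoch in range(100):
    loss = ((x @ w + b - y) ** 2).mean()  # mean squared error
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                  # plain gradient descent step
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```

If that loop feels comfortable, you’re in good shape for the course.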

Get started now!

Content summary

In this course we’ll explore diffusion methods such as Denoising Diffusion Probabilistic Models (DDPM) and Denoising Diffusion Implicit Models (DDIM). We’ll get our hands dirty implementing unconditional and conditional diffusion models from scratch, building and experimenting with different samplers, and diving into recent tricks like textual inversion and Dreambooth. We will also study and implement the 2022 paper by Karras et al., Elucidating the Design Space of Diffusion-Based Generative Models, which uses pre-conditioning to ensure that the inputs and targets of the model are scaled to unit variance. The Karras model predicts an interpolation between the clean image and the noise, with the mix depending on how much noise is present in the input.
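To give a flavour of the Karras et al. approach, here’s a minimal sketch of their pre-conditioning scalings in PyTorch. The formulas (c_skip, c_out, c_in, c_noise) are from the paper; the network is just a placeholder here, not the model we build in the course:

```python
import torch

def edm_scalings(sigma, sigma_data=0.5):
    # Pre-conditioning scalings from Karras et al. 2022. They keep the
    # network's inputs and training targets at unit variance for any
    # noise level sigma (sigma_data is the std of the clean data).
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1 / (sigma**2 + sigma_data**2).sqrt()
    c_noise = sigma.log() / 4
    return c_skip, c_out, c_in, c_noise

def denoise(net, noisy_x, sigma):
    # The denoiser blends the (scaled) input with the network output, so
    # the network effectively predicts mostly-noise at low sigma and
    # mostly-image at high sigma.
    c_skip, c_out, c_in, c_noise = edm_scalings(sigma)
    return c_skip * noisy_x + c_out * net(c_in * noisy_x, c_noise)

# Shape check with a stand-in "network":
net = lambda x, c: x
print(denoise(net, torch.randn(4, 3, 32, 32), torch.tensor(1.2)).shape)
```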

Along the way, we’ll cover essential deep learning topics including a variety of neural network architectures, data augmentation approaches (including the amazingly effective and criminally under-appreciated TrivialAugment strategy), and various loss functions, including perceptual loss and style loss. We’ll build our own models from scratch, such as Multi-Layer Perceptrons (MLPs), ResNets, and U-Nets, while experimenting with generative architectures like autoencoders and transformers.
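If you want to try TrivialAugment right away, torchvision ships an implementation (TrivialAugmentWide, available since torchvision 0.11); here’s a minimal sketch of dropping it into an image pipeline:

```python
from torchvision import transforms

# TrivialAugment applies exactly one randomly chosen op at a random
# strength per image -- no policy search, no tuning, which is much of
# its appeal.
train_tfms = transforms.Compose([
    transforms.TrivialAugmentWide(),
    transforms.ToTensor(),
])
# e.g. torchvision.datasets.CIFAR10("data", train=True, transform=train_tfms)
```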

Throughout the course, we’ll use PyTorch to implement our models (but only after we’ve implemented everything needed in pure Python first!), and will create our own deep learning framework called miniai. We’ll master Python concepts like iterators, generators, and decorators to keep our code clean and efficient. We’ll also explore deep learning optimizers like AdamW and RMSProp, learning rate annealing, and how to experiment with the impact of different initializers, batch sizes, and learning rates. And of course, we’ll make use of handy tools like the Python debugger (pdb) and nbdev for building Python modules from Jupyter notebooks.
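As a taste of the optimizer material, here’s a minimal sketch in plain PyTorch (not the miniai API we build in the course) of AdamW combined with one-cycle learning rate annealing, using a made-up model and random data:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 2))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-2, total_steps=1000)

for step in range(1000):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # stand-in batch
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()       # parameter update
    sched.step()     # anneal the learning rate every step
    opt.zero_grad()
```

Changing max_lr, the batch size, or the initialization and watching what happens to the loss curve is exactly the kind of experiment the course encourages.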

Lastly, we’ll touch on fundamental concepts like tensors, calculus, and pseudo-random number generation to provide a solid foundation for our exploration. We’ll apply these concepts to machine learning techniques like mean shift clustering and convolutional neural networks (CNNs), and will see how to track experiments with Weights & Biases (W&B).
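Mean shift is a nice example of how far a few lines of tensor code can go; here’s a minimal sketch on synthetic data (the bandwidth and cluster centres are arbitrary choices for illustration):

```python
import torch

def mean_shift_step(X, bandwidth=2.5):
    # Move every point toward the Gaussian-weighted mean of its
    # neighbours; iterating converges points onto cluster modes.
    dist = torch.cdist(X, X)                            # pairwise distances
    weight = torch.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel
    return (weight @ X) / weight.sum(dim=1, keepdim=True)

# Two synthetic 2-D clusters, around (0, 0) and (5, 5)
X = torch.randn(500, 2) + 5 * torch.randint(0, 2, (500, 1))
for _ in range(5):
    X = mean_shift_step(X)
```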

We’ll also tackle mixed precision training using both NVIDIA’s apex library and the Accelerate library from Hugging Face. We’ll investigate various types of normalization, like Layer Normalization and Batch Normalization. By the end of the course, you’ll have a deep understanding of diffusion models and the skills to implement cutting-edge deep learning techniques.
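As a quick illustration, here’s a minimal sketch of the Accelerate route (the apex route is analogous); note that "fp16" assumes a CUDA GPU, and the model and data are toy stand-ins:

```python
import torch
from torch import nn
from accelerate import Accelerator

# Accelerate handles autocasting and gradient scaling for us;
# use mixed_precision="bf16" instead on hardware that supports it.
accelerator = Accelerator(mixed_precision="fp16")
model = nn.Linear(10, 2)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, opt = accelerator.prepare(model, opt)

x = torch.randn(32, 10, device=accelerator.device)
y = torch.randint(0, 2, (32,), device=accelerator.device)

loss = nn.functional.cross_entropy(model(x), y)
accelerator.backward(loss)  # scales the loss under fp16 before backprop
opt.step()
opt.zero_grad()
```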

Get started now!

Tanishq’s thoughts

Here’s what Tanishq Mathew Abraham of Stability.ai, who helped teach a number of the lessons, thinks of the course:

The fast.ai Part 2 course is one of a kind. I think this course is unique in that it teaches you how to build deep learning models from scratch while also exploring cutting-edge research in diffusion models. No other course guides you through state-of-the-art papers in the diffusion space (sometimes mere weeks after they first appear) while building clear, accessible implementations. We’ve even explored some new research directions in the course, and hopefully it enables others to explore their own ideas further.

If you’re interested in a more advanced course on building state-of-the-art deep learning models from scratch, or in how state-of-the-art diffusion models work and how to build them, this is the course for you! Even as someone who helped develop this course, I found it to be an amazing learning experience, and I hope it is for you too!