Special note: we’re teaching a fully updated part 1, in person, for seven weeks from Oct 30, 2017, at the USF Data Institute. See the course page for details and application form.
When we launched course.fast.ai we said that we wanted to provide a good education in deep learning. Part 1 of the course has now been viewed by tens of thousands of students, introducing them to nearly all of today’s best practices in deep learning, and providing many hours of hands-on practical coding exercises. We have collected some stories from graduates of part 1 on our testimonials page.
Today, we are launching Part 2: Cutting Edge Deep Learning for Coders. These 15 hours of lessons take you from part 1’s best practices, all the way to cutting edge research. You’ll learn how to:
Read and implement the latest research papers (even if you don’t have a math background)
Build a state of the art neural translation system
Create generative models for art, super resolution, segmentation, and more (including generative adversarial networks)
Apply deep learning to structured data and time series (such as for logistics, marketing, predictive maintenance, and fraud detection)
Karthik Kannan, founder of letsenvision.com, who told us “Today I’ve picked up steam enough to confidently work on my own CV startup and the seed for it was sowed by fast.ai with Pt1. and Pt.2”
Matthew Kleinsmith and Brendon Fortuner, who in 24 hours built a system to add filters to the background and foreground of videos, giving them victory in the 2017 Deep Learning Hackathon.
The prerequisites are that you’ve either completed part 1 of the course, or that you are already a confident deep learning practitioner who is comfortable implementing and using:
CNNs (including resnets)
RNNs (including LSTM and GRU)
Keras and numpy
What we cover
The course covers a lot of territory - here’s a brief summary of what you’ll learn in each lesson:
Lesson 8: Artistic Style
We begin with a discussion of a big change compared to part 1: from Theano to Tensorflow. You’ll learn about some of the exciting new developments in Tensorflow that have led us to the decision to make this change. We’ll also talk about a major project we highly recommend: build your own deep learning box!
We’ll also talk about how to approach one of the biggest challenges in this part of the course: reading academic papers. Don’t worry, it’s not as terrifying as it first sounds, especially once you know some of our little tricks.
Then, we start our deep dive into creative and generative applications, with artistic style transfer. You’ll be able to create beautiful and interesting images even if your artistic skills are as limited as Jeremy’s… :)
Lesson 9: Generative Models
We’ll learn about the extraordinarily powerful and widely useful technique of generative models. These are models that don’t just spit out a classification, but create a whole new image, sound, etc. They can be used, for example, with images, to:
We’ll try using this approach for super resolution (i.e. increasing the resolution of an image), and then you’ll get to try building your own system for rapidly adding the style of any artist to your photos. Have a look at the image on the right - the top very low resolution image has been input to the algorithm (see for instance the very pixelated fingers), and the bottom image has been created automatically from that!
Lesson 10: Multi-modal & GANs
A surprising result in deep learning is that models created from totally different types of data, such as text and images, can learn to share a consistent feature space. This means that we can create multi-modal models; that is, models which can combine multiple types of data. We will show how to combine text and images in a single model using a technique called DeVISE, and will use it to create a variety of search algorithms:
Text to image (which will also handle multi-word text descriptions)
Image to text (including handling types of image we didn’t train with)
And even image to image!
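To make the idea of a shared feature space more concrete, here is a minimal sketch of the search step only. It assumes that words and images have already been mapped into the same vector space. The vectors are made-up toy values rather than the output of a trained model, and `normalize` and `nearest` are illustrative helper names, not part of the course code.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nearest(query, candidates, k=1):
    """Return the indices of the k candidates most similar to the query."""
    sims = normalize(candidates) @ normalize(query)
    return np.argsort(-sims)[:k]

# Toy 3-d "shared space" (hypothetical values, not learned)
words = ["cat", "car", "tree"]
word_vecs = np.array([[1.0, 0.1, 0.0],
                      [0.0, 1.0, 0.1],
                      [0.1, 0.0, 1.0]])
image_vecs = np.array([[0.9, 0.2, 0.1],   # imagine: a photo of a cat
                       [0.1, 0.1, 0.8],   # imagine: a photo of a tree
                       [0.2, 0.9, 0.0]])  # imagine: a photo of a car

# Text to image: which image is closest to the word "car"?
text_to_image = nearest(word_vecs[words.index("car")], image_vecs)[0]

# Image to text: which word is closest to image 1?
image_to_text = words[nearest(image_vecs[1], word_vecs)[0]]

# Image to image: closest other image to image 0 (position 0 is the image itself)
image_to_image = nearest(image_vecs[0], image_vecs, k=2)[1]
```

Because everything lives in one space, all three kinds of search reduce to the same nearest-neighbour lookup. Only the choice of query and candidate set changes.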
Doing this will require training a model using the whole imagenet competition dataset, which is a bigger dataset than we’ve used before. So we’re going to look at some techniques that make this faster and easier than you might expect.
We’re going to close our studies into generative models by looking at generative adversarial networks (GANs), a tool which has been rapidly gaining in popularity in recent months, and which may have the potential to create entirely new application areas for deep learning. We will be using them to create entirely new images from scratch.
Lesson 11: Memory Networks
We’ve covered a lot of different architectures, training algorithms, and all kinds of other CNN tricks during this course—so you might be wondering: what should I be using, when? The good news is that other folks have wondered that too, and have provided some great analyses of the pros and cons of various techniques in practice. We’ll be taking a look at a few highlights of these papers today.
Then we’re going to learn to GPU accelerate algorithms by taking advantage of Pytorch, which provides an interface that’s so similar to numpy that often you can move your algorithm onto the GPU in just an hour or two. In particular, we’re going to try to create the first (that we know of) GPU implementation of mean-shift clustering, a really useful algorithm that deserves to be more widely known.
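As a rough illustration of the algorithm itself, here is a small numpy sketch of mean-shift with a Gaussian kernel. It is not the lesson's implementation: the function names, bandwidth, and toy data are assumptions chosen for the example, and it runs on the CPU. Each point is moved repeatedly to the kernel-weighted mean of the original data until it settles on a nearby density peak.

```python
import numpy as np

def gaussian(dist, bandwidth):
    """Gaussian kernel weight for a given distance."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def mean_shift(points, bandwidth=1.0, n_iter=20):
    """Shift a copy of each point toward the nearest density peak of `points`."""
    x = points.copy()
    for _ in range(n_iter):
        # Distances from every current position to every original point: shape (n, n)
        dist = np.linalg.norm(x[:, None, :] - points[None, :, :], axis=-1)
        weights = gaussian(dist, bandwidth)
        # New position = weighted mean of the original points
        x = (weights @ points) / weights.sum(axis=1, keepdims=True)
    return x

# Two small, well separated toy clusters (made-up data)
points = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                   [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
shifted = mean_shift(points)
```

After the loop, points from the same cluster sit at (almost) the same location, so clusters can be read off by grouping nearby rows of `shifted`. Every step is an array operation, which is why an algorithm like this is a natural candidate for a GPU array library.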
To close out the lesson we will implement the heavily publicized “Memory Networks” algorithm, and will answer the question: does it live up to the hype?
Lesson 12: Attentional Models
It turns out that Memory Networks provide much of the key foundations we need to understand something which has become one of the most important advances in the last year or two: Attentional Models. These models allow us to build systems that focus on the most important part of the input for the current task, and are critical, for instance, in creating translation systems (which we’ll cover in the next lesson).
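The core idea of "focusing on the most important part of the input" can be sketched in a few lines of numpy. This sketch assumes dot-product scoring, which is one common choice and not necessarily the variant used in the lesson. The names `attend`, `keys`, and `values` and the toy numbers are illustrative.

```python
import numpy as np

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

def attend(query, keys, values):
    """Weight each input position by its relevance to the query, then average."""
    scores = keys @ query        # one relevance score per input position
    weights = softmax(scores)    # attention weights
    context = weights @ values   # weighted average of the values
    return weights, context

# Three input positions with 2-d keys and values (made-up numbers)
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
query = np.array([0.0, 3.0])     # most similar to the second key

weights, context = attend(query, keys, values)
```

Here most of the weight lands on the second position, so `context` is dominated by the second row of `values`. In a trained model the keys, values, and query are learned representations rather than hand-picked vectors.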
Lesson 13: Neural Translation
One application of deep learning that has progressed perhaps more than any other in the last couple of years is Neural Machine Translation. In late 2016 it was implemented by Google in what the New York Times called The Great A.I. Awakening. There are a lot of tricks needed to reach Google’s level of translation capability, so we’ll be doing a deep dive in this lesson to learn nearly all the tricks used by state of the art systems.
Next up, we’ll learn about Densenets, which in July 2017 were awarded the CVPR Best Paper award, and have been shown to provide state of the art results in computer vision, particularly with small datasets. They are very similar to resnets, but with one key difference: the branches in each section are combined through concatenation, rather than addition. This apparently minor change makes a big difference in how they learn. We’ll also be using this technique in the next lesson to create a state of the art system for image segmentation.
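The difference between addition and concatenation is easiest to see in the shapes. The following numpy sketch uses a random linear map plus ReLU as a stand-in for a convolutional layer. The `layer` helper and the sizes are assumptions for illustration, not the architecture from the paper.

```python
import numpy as np

def layer(x, out_features, seed):
    """Stand-in for a conv layer: a fixed random linear map followed by ReLU."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_features))
    return np.maximum(x @ w, 0.0)

x = np.ones((4, 8))  # a batch of 4 items with 8 features each

# Resnet-style block: the layer output is ADDED to the input,
# so the number of features stays at 8.
res_out = x + layer(x, 8, seed=0)

# Densenet-style block: the layer output is CONCATENATED to the input,
# so the features grow from 8 to 8 + 3 and the input is kept unchanged.
dense_out = np.concatenate([x, layer(x, 3, seed=1)], axis=-1)
```

With concatenation, later layers see the original features untouched alongside every intermediate result, whereas with addition the input and the layer output are merged into a single sum.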
Lesson 14: Time Series & Segmentation
Deep learning has generally been associated with unstructured data such as images, language, and audio. However it turns out that the structured data found in the columns of a database table or spreadsheet, where the columns can each represent different types of information in different ways (e.g. sales in dollars, area as zip code, product id, etc), can also be used very effectively by a neural network. This is equally true if the data can be represented as a time series (i.e. the rows represent different times or time periods).
In particular, what we learnt in part 1 about embeddings can be used not just for collaborative filtering and word encodings, but also for arbitrary categorical variables representing products, places, channels, and so forth. This has been highlighted by the results of two Kaggle competitions that were won by teams using this approach. We will study both of these datasets and competition winning strategies in this lesson.
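As a sketch of what an embedding for a categorical variable looks like in practice: each category gets a row in a small matrix, and looking up a category simply selects its row. The store names and numbers below are hypothetical, and the embedding values are random here, whereas in a real model they would be learned during training.

```python
import numpy as np

# A categorical column (hypothetical store names) mapped to integer ids
stores = ["berlin", "munich", "hamburg"]
store_to_idx = {name: i for i, name in enumerate(stores)}

# One 2-d vector per category; random here, learned in a real model
rng = np.random.default_rng(0)
embedding_dim = 2
store_embeddings = rng.standard_normal((len(stores), embedding_dim))

def embed(names):
    """Look up the embedding row for each category value."""
    idx = np.array([store_to_idx[n] for n in names])
    return store_embeddings[idx]

# Three table rows: a categorical column plus one continuous column
rows = ["munich", "berlin", "munich"]
sales_last_week = np.array([[120.0], [95.0], [130.0]])

# Network input: embedded categories concatenated with continuous features
features = np.concatenate([embed(rows), sales_last_week], axis=1)
```

Rows that share a category share the same embedding vector, which lets the network learn something about "munich" once and reuse it everywhere that value appears.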
Finally, we’ll look at how the Densenet architecture we studied in the last lesson can be used for image segmentation - that is, exactly specifying the location of every object in an image. This is another type of generative model, as we learnt in lesson 9, so many of the basic ideas from there will be equally applicable here.
I am thrilled to release fast.ai’s newest free course, Computational Linear Algebra, including an online textbook and a series of videos, and covering applications (using Python) such as how to identify the foreground in a surveillance video, how to categorize documents, the algorithm powering Google’s search, how to reconstruct an image from a CT scan, and more.
Jeremy and I developed this material for a numerical linear algebra course we taught in the University of San Francisco’s Masters of Analytics program, and it is the first ever numerical linear algebra course, to our knowledge, to be completely centered around practical applications and to use cutting edge algorithms and tools, including PyTorch, Numba, and randomized SVD. It also covers foundational numerical linear algebra concepts such as floating point arithmetic, machine epsilon, singular value decomposition, eigen decomposition, and QR decomposition.
What is numerical linear algebra?
“What exactly is numerical linear algebra?” you may be wondering. It is all about getting computers to do matrix math with speed and with acceptable accuracy, and way more awesome than the somewhat dull name suggests. Who cares about how computers do matrix math? All of you, probably. Data science is largely about manipulating matrices since almost all data can be represented as a matrix: time-series, structured data, anything that fits in a spreadsheet or SQL database, images, and language (often represented as word embeddings).
A typical first linear algebra course focuses on how to solve matrix problems by hand, for instance, spending time using Gaussian Elimination with pencil and paper to solve a small system of equations manually. However, it turns out that the methods and concerns for solving larger matrix problems via a computer are often drastically different:
Speed: when you have big matrices, matrix computations can get very slow. There are different ways of addressing this:
Different algorithms: there may be a less intuitive way to solve the problem that still gets the right answer, and does it faster.
Vectorizing or parallelizing your code.
Locality: traditional runtime computations focus on Big O, the number of operations computed. However, for modern computing, moving data around in memory can be very time-consuming, and you need ways to minimize how much memory movement you do.
Accuracy: Computers represent numbers (which are continuous and infinite) in a discrete and finite way, which means that they have limited accuracy. Rounding errors can add up, particularly if you are iterating! Furthermore, some math problems are not very stable, meaning that if you vary the input a little, you get a vastly different output. This isn’t a problem with rounding or with the computer, but it can still have a big impact on your results.
Memory use: for matrices that have lots of zero entries, or that have a particular structure, there are more efficient ways to store these in memory.
Scalability: in many cases you are interested in working with more data than you have space in memory for.
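The accuracy concerns above are easy to reproduce. The sketch below uses only the Python standard library. The 2x2 system and the `solve_2x2` helper are made-up illustrations: the first half shows rounding error, and the second shows an unstable (ill-conditioned) problem where a tiny change in the input produces a very different answer.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next representable float
eps = sys.float_info.epsilon
half_eps_is_lost = (1.0 + eps / 2 == 1.0)  # True: too small to register

# Rounding: 0.1 has no exact binary representation
sum_is_exact = (0.1 + 0.2 == 0.3)          # False

# Rounding errors add up when iterating
total = 0.0
for _ in range(10):
    total += 0.1
total_is_one = (total == 1.0)              # False

def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

# An unstable problem: the two equations are nearly identical,
# so a change of 0.0001 in one right-hand side moves the solution a lot.
x1, y1 = solve_2x2(1, 1, 1, 1.0001, 2, 2.0001)  # close to (1, 1)
x2, y2 = solve_2x2(1, 1, 1, 1.0001, 2, 2.0002)  # close to (0, 2)
```

The jump from roughly (1, 1) to roughly (0, 2) comes from the problem itself rather than from rounding, which is the distinction drawn in the accuracy point above.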
This course also includes some very modern approaches that we don’t know of any other numerical linear algebra courses covering (and we have done a lot of researching of numerical linear algebra courses and syllabi), such as:
This approach is very different from how most math courses operate: typically, math courses first introduce all the separate components you will be using, and then you gradually build them into more complex structures. The problems with this are that students often lose motivation, don’t have a sense of the “big picture”, and don’t know which pieces they’ll even end up needing. We have been inspired by Harvard professor David Perkins’s baseball analogy. We don’t require kids to memorize all the rules of baseball and understand all the technical details before we let them have fun and play the game. Rather, they start playing with just a general sense of it, and then gradually learn more rules/details as time goes on. All that to say, don’t worry if you don’t understand everything at first! You’re not supposed to. We will start using some “black boxes” or matrix decompositions that haven’t yet been explained, and then we’ll dig into the lower level details later.
I love math (I even have a math PhD!), but when I’m trying to solve a practical problem, code is more useful than theory. Also, one of the tests of whether you truly understand something is if you can code it, so this course is much more code-centered than a typical numerical linear algebra course.
The primary resource for this course is the free online textbook of Jupyter Notebooks, available on Github. They are full of explanations, code samples, pictures, interesting links, and exercises for you to try. Anyone can view the notebooks online by clicking on the links in the readme Table of Contents. However, to really learn the material, you need to interactively run the code, which requires installing Anaconda on your computer (or an equivalent set up of the Python scientific libraries) and you will need to be able to clone or download the git repo.
Accompanying the notebooks is a playlist of lecture videos, available on YouTube. If you are ever confused by a lecture or it goes too quickly, check out the beginning of the next video, where I review concepts from the previous lecture, often explaining things from a new perspective or with different illustrations.
The algorithm behind Google’s PageRank, used to rank the relative importance of different web pages
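For readers who have not seen it, the textbook form of PageRank can be sketched as a power iteration on a link matrix. This is a simplified illustration under common assumptions (a damping factor of 0.85, and pages with no outgoing links treated as linking everywhere). It is not a description of Google's production system or of the course notebook's code, and the four-page link graph is made up.

```python
import numpy as np

def pagerank(links, damping=0.85, n_iter=100):
    """links[i] lists the pages that page i links to. Returns one score per page."""
    n = len(links)
    # Column-stochastic matrix: M[dst, src] = probability of following src -> dst
    M = np.zeros((n, n))
    for src, targets in enumerate(links):
        if targets:
            M[targets, src] = 1.0 / len(targets)
        else:
            M[:, src] = 1.0 / n  # dangling page: jump to any page
    rank = np.full(n, 1.0 / n)   # start from a uniform distribution
    for _ in range(n_iter):
        # With probability `damping` follow a link, otherwise jump to a random page
        rank = (1 - damping) / n + damping * (M @ rank)
    return rank

# Page 0 links to 1 and 2, page 1 to 2, page 2 to 0, page 3 to 2
links = [[1, 2], [2], [0], [2]]
ranks = pagerank(links)
```

Page 2 ends up with the highest score because three pages link to it, while page 3, which nothing links to, ends up with the lowest. The whole computation is repeated matrix-vector multiplication, which is why it belongs in a numerical linear algebra course.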
Can’t I just use scikit-learn?
Many (although certainly not all) of the algorithms covered in this course are already implemented in scientific Python libraries such as NumPy, SciPy, and scikit-learn, so you may be wondering why it’s necessary to learn what’s going on underneath the hood. Knowing how these algorithms are implemented will allow you to better combine and utilize them, and will make it possible for you to customize them if needed. In at least one case, we show how to get a sizable speedup over scikit-learn’s implementation of a method. Several of the topics we cover are areas of active research, and there is recent research that has not yet been added to existing libraries.
Q: My daughter loves math and art. She’s currently an 8th grader. My husband and I are not STEAM (Science, Technology, Engineering, Art, Math) people. I’d love to expose her to possible career options but am limited by my ignorance and perhaps my location. Do you have any suggestions for an intelligent, young person who is about to start her high school journey?
A: First, I am so glad you are encouraging your daughter’s interests! I have several recommendations and resources. This is a fantastic time in history to be a kid with an internet connection interested in math and art.
1. She should learn to code. In STEM, code is the language of creativity, and without knowing how to code, you are reliant on tools created by others. A good place to start is with Blockly Games, which teaches programming concepts (such as loops, variables, and logic) through a variety of mazes and puzzles. The Blockly library was developed by the Google for Education team.
A note for parents of younger children: you might want to check out scratch (language for children developed by MIT Media Lab), snap (drag-and-drop programming language), or snap circuits (electronics kits).
3. A ton of exciting advances are happening in the maker space– people creating clothing that lights up, machines that 3d print pancakes, robots to move your Klein bottle collection around– and there are lots of resources available for all ages. Maker spaces are being added in libraries across the country, and can include anything from 3D Printers, littleBits, LEGO Robotics, Arduinos, Snap Circuits, design software, woodworking tools, jewelry making tools, paper crafting equipment, microscopes and other science gadgets, sewing machines, and more, and many offer workshops or classes. You can also see if there is a regional Maker Faire in your area.
One of the students from our fast.ai course bought several tons of legos on ebay and constructed a machine to automatically sort the legos (old bulk lego is sold more cheaply, but the resale value for sorted Lego is much higher and can be quite lucrative for certain pieces). I want children to know that adults do things like create interactive colorful light-up clothing for the keynote speech at a professional conference, or construct machines to sort Legos in their free time. Both of these examples are by experts, but you do not need to be an expert to work with hardware or program an arduino.
4. Encourage her to start a blog about what she is learning, creating, and exploring. I recently wrote a post (inspired by a question from a college student) encouraging everyone to blog, and I think the advice certainly holds for high schoolers. Many schools relegate writing to the humanities and social sciences, and don’t give students the practice of writing about math and technology. Being able to write and communicate technical ideas clearly is a super important and useful skill in today’s world (art can help with this too!). As I said previously, a blog is like a resume, only better. This holds true for high school students as well, and could be useful in landing internships. Check out this post for tips on how to get started.
You can check out the zines by Amy W (an MIT computer science grad who hacks knitting machines) or Julia Evans (an infrastructure engineer at credit processing startup Stripe) for great examples of how cartoons and sketches can illuminate technical concepts. They are also two women I deeply admire!
6. Miscellaneous Groups and Resources. Although these are location specific, note that groups exist in a wide variety of places, not just in major tech hubs like San Francisco or New York City:
Iridescent Technovation: Through Technovation, teams of teenage girls around the world (from 78 different countries!) build mobile apps to solve problems in their communities, create business plans, and launch their solutions.
Black Girls Code: Introduces Black girls to coding and game design. They’ve reached over 3,000 students in cities such as Atlanta, Miami, LA, Dallas, Memphis, and others, and have plans to expand.
Blue 1647 offers a variety of programs including teaching youth to create web and mobile apps, Latina Girls Code, MineCraft Development bootcamps, programs for individuals with intellectual disabilities, and more. It has locations in Chicago, St. Louis, Compton, Indiana, Haiti, and LA.
7. There is a lovely essay called A Mathematician’s Lament written by Paul Lockhart, a former Brown University math professor who quit to teach K-12. He describes a nightmare world in which children are not allowed to sing songs or play instruments until they have spent over a decade studying music notation, transcribing sheet music by hand in different keys, and memorizing their circle of fifths. That sounds horrifying! Yet it is how math is taught in most schools– the focus is on dry notation, formal rules, memorization, and disconnected components, with the fun and creative parts saved until long after most students have dropped out.
I hope you can encourage your child to keep a sense of creativity, beauty, pattern, and play when approaching math. I know it can be difficult for children to maintain their curiosity and passion for subjects when adults or peers don’t understand their interests.
My daughter is still a toddler, so I haven’t gotten to experience this firsthand yet and I would love to hear from those of you who have! Also, a huge thanks to everyone who gave me suggestions for this article on Twitter.
This week’s Ask-A-Data-Scientist column has a question from a college freshman at my alma mater, Swarthmore. Please email your data science related quandaries to email@example.com. Note that questions are edited for clarity and brevity. Previous posts include:
Q: I’m currently a freshman at Swarthmore College and I’m really interested in machine learning and deep learning. I wanted to take Artificial Intelligence this semester; unfortunately, no freshmen got into the class as it has been difficult for the CS department to keep up with the huge spike in interest.
I’m currently taking Andrew Ng’s Coursera Course on Machine Learning and will finish it in ~2-3 weeks. Next, I was planning on taking your fast.ai MOOC, which I saw on hacker news.
I know you may be too busy, but can I ask you questions I have about ML and my proposed plan? How can I continue to learn machine learning after Ng’s Coursera course and fast.ai? It seems like the only two options are 1.) research and 2.) graduate level courses at UPenn (which seem to be quite difficult to get into from Swarthmore (especially as a first-year student)). Any advice would be appreciated.
A: In general, I am happy to answer questions, although it may take me some time (my inbox, oh my inbox). For technical questions, it’s best to first ask on our fast.ai forums. There are tons of interesting discussions on our forums, even if you are not taking our course. For career-related or general questions, I often answer them in my ask-a-data-scientist column.
Even without Swarthmore or UPenn’s AI classes, you will never run out of things to do with deep learning or ways to learn more. Our MOOC takes 70 hours of study to complete, and if you get interested in any of the Kaggle competitions we have you start, you could spend much longer. We will be releasing Part 2 in a few months, which will be a similar time commitment, only with even more side avenues for further study, recommended papers to read, and ways to extend the work.
Take the official classes when/if you are able, but you don’t need the credentials or resources from official classes (to anyone out there not in university or at a university that doesn’t offer an AI class, don’t worry: you don’t need them!). One of our students, who was an econ major with no graduate degree, was just accepted to the prestigious Google Brain residency program! Another student developed a new fraud detection technique based on material from our course and has received a bonus at his job. Several others have received internship and job offers, or switched teams in their current workplaces to more exciting machine learning projects.
Credentials can sometimes be useful to get your foot in the door, particularly if you are an underrepresented minority in tech (and thus facing greater scrutiny).
However, there are lots of even more effective ways to get your name and work out there:
Write a popular blog post (more on this below).
Create an interesting app and put it online.
Write helpful answers to others’ questions on the learn machine learning subreddit or on the fast.ai forums. Altruism is important to me, but that’s not why I recommend helping others. Explaining something you’ve learned to someone else is a key part of solidifying your own understanding.
Do your own experiments, and share the results via a blogpost or github. One of our students, Slav Ivanov, asked about using different optimizers for style transfer. Jeremy suggested he try it out, and Slav wrote an excellent blog post on what he found. This post was very popular on reddit and made Slav’s work more widely known.
Contribute to open source. Here, one of our students shares about his positive experience contributing to TensorFlow. With 3 lines of code, he reduced the binary size of TensorFlow on Android to less than 10MB!
In general, I recommend that you start a side project of something that interests you (that uses deep learning) so you will have that to work on.
Why you (yes, you) should blog
The top advice I would give my younger self would be to start blogging sooner. Here are some reasons to blog:
It’s like a resume, only better. I know of a few people who have had blog posts lead to job offers!
Helps you learn. Organizing knowledge always helps me synthesize my own ideas. One of the tests of whether you understand something is whether you can explain it to someone else. A blog post is a great way to do that.
I’ve gotten invitations to conferences and invitations to speak from my blog posts. I was invited to the TensorFlow Dev Summit (which was awesome!) for writing a blog post about how I don’t like TensorFlow.
Meet new people. I’ve met several people who have responded to blog posts I wrote.
Saves time. Any time you answer a question multiple times through email, you should turn it into a blog post, which makes it easier for you to share the next time someone asks.
To inspire you, here are some sample blog posts from students in part 2 of our course:
I enjoyed all of the above blog posts and also, I don’t think any of them are too intimidating. They’re meant to be accessible.
Tips for getting started blogging
Jeremy had been suggesting for years that I should start blogging, and I’d respond “I don’t have anything to say.” This wasn’t true. What I meant was that I didn’t feel confident, and I felt like the things I could write had already been written about by people with more expertise or better writing skills than me.
It turns out that is fine! Your posts don’t have to be earth-shattering or even novel to be read and shared. My writing skills were rather weak when I started (part of the reason I chose to study math and CS in college was because those courses required the least amount of writing and also no labs), but my skills are improving with time.
Here are some more tips to help you start your first post:
Make a list of links to other blog posts, articles, or studies that you like, and write brief summaries or highlight what you particularly like about them. Part of my first blog post came from my making just such a list, because I couldn’t believe more people hadn’t read the posts and articles that I thought were awesome.
Summarize what you learned at a conference you attended, or in a class you are taking.
Any email you’ve written twice should be a blog post. Now, if I’m asked a question that I think someone else would also be interested in, I try to write it up.
Don’t be a perfectionist. I spent 9 months on my first blog post, it went viral, and I have repeatedly hit new lows in readership ever since then. One of my personal goals for 2017 is to post my writing quicker and not to obsess so much before I post, because it just builds up pressure and I end up writing less.
You are best positioned to help people one step behind you. The material is still fresh in your mind. Many experts have forgotten what it was like to be a beginner (or an intermediate) and have forgotten why the topic is hard to understand when you first hear it. The context of your particular background, your particular style, and your knowledge level will give a different twist to what you’re writing about.
If you are a woman in NYC, Chicago, or San Francisco, I recommend joining your local chapter of Write/Speak/Code, a group that encourages women software developers to write blog posts, speak at conferences, and contribute to open source.
Get angry. The catalyst that finally got me to start writing was when someone famous said something that made me angry. So angry that I had to explain all the ways his thinking was wrong.
If you’re wondering about the actual logistics, Medium makes it super simple to get started. Another option is to use Jekyll and Github pages. I can personally recommend both, as I have 2 blogs and use one for each.
You are on the right path by taking MOOCs, and by adding in a side project, involvement in online communities, and blogging you will have even more opportunities to learn and meet others!
This week’s Ask-A-Data-Scientist column answers two short questions from students. Please email your data science related quandaries to firstname.lastname@example.org. Note that questions are edited for clarity and brevity. Previous posts include:
Q1: I have a BS and MS in aerospace engineering and have been accepted to a data science bootcamp for this summer. I have been spending 15 hours/week on MIT’s 6.041 edx.org probability course, which is the hardest math course I’ve ever taken. I feel like my time could be better spent elsewhere. What about teaching myself the concepts as needed on the job? Or maybe you could recommend certain areas of probability to focus on? I’d like to tackle a personal project (either dealing with fitness tracker data or bitcoin) and maybe put probability on the backburner for a bit.
A: It sounds like you already know the answer to this one: yes! your time could be better spent elsewhere.
Let your coding projects motivate what you do, and learn math on an as needed basis. There are 3 reasons this is a good approach:
For most people, the best motivation will be letting the problems you’re working on motivate your learning.
The real test of whether you understand something is whether you can use it and build with it. So the projects you’re working on are needed to cement your understanding.
By learning on an as-needed basis, you study what you actually need, and don’t waste time on topics that may end up being irrelevant.
The only exceptions: if you want to be a math professor or work at a think tank (for most of my math PhD, my goal was to become a math professor, so I see the appeal, but I was also totally unaware at the time of the breadth of awesome and exciting jobs that use math). And sometimes you need to brush up on math for white-boarding interviews.
Q2: I am currently pursuing a Master’s degree in Data Science. I am not that advanced in programming and new to most of the concepts of machine learning & statistics. Data science is such a vast field so most of my friends advise me to concentrate on a specific branch. Right now I am trying everything and becoming a jack in all and ace at none. How can I approach this to find a specialty?
A: There is nothing wrong with being a jack of all trades in data science; in some ways, that is what it means to be a data scientist. As long as you are spending the vast majority of your time writing code for practical projects, you are on the right track.
My top priorities of things to focus on for aspiring data scientists:
Focus on Python (including Numpy, Pandas, and Jupyter notebooks).
Try to focus on 1 main project. Extend something that you did in class. It can be difficult if you are mostly doing scattered problem sets in a variety of areas. For self-learners, one of the risks is jumping around too much and starting scattered tutorials across a range of sites, but never going deep enough with any one thing. Pick 1 Kaggle competition, personal project, or extension of a school project and stick with it. I can think of a few times I continued extending a class project for months after the class ended, because I was so absorbed in it. This is a great way to learn.
Start with decision tree ensembles (random forests and gradient boosting machines) on structured data sets. I have deeply conflicted feelings on this topic. While it’s possible to do these in Python using sklearn, I think R still handles structured datasets and categorical variables better. However, if you are only going to master one language, I think Python is the clear choice, and most people can’t focus on learning 2 new languages at the same time.
Then move on to deep learning using the Python library Keras. To quote Andrew Ng, deep learning is “the new electricity” and a very exciting, high impact area to be working in.
In terms of tips, there are a few things you can skip since they aren’t widely used in practice, such as support vector machines/kernel methods, Bayesian methods, and theoretical math (unless it’s explicitly necessary for a practical project you are working on).
Note that this answer is geared towards data scientists and not data engineers. Data engineers put algorithms into production and have a different set of skills, such as Spark and HDFS.