Deep learning has great potential, but currently the people using this technology are overwhelmingly white and male. We’re already seeing society’s racial and gender biases being encoded into software that uses AI when built by such a homogeneous group. Additionally, people can’t address problems that they’re not aware of, and with more diverse practitioners, a wider variety of important societal problems will be tackled.
We want to get deep learning into the hands of as many people as possible, from as many diverse backgrounds as possible. People with different backgrounds have different problems they’re interested in solving. The traditional approach is to start with an AI expert and then give them a problem to work on; at fast.ai we want people who are knowledgeable and passionate about the problems they are working on, and we’ll teach them the deep learning needed to address them.
Deep Learning can be misused
Deep learning isn’t “more biased” than simpler models such as regression; however, the amazing effectiveness of deep learning suggests that it will be used in far more applications. As a society, we risk encoding our existing gender and racial biases into algorithms that determine medical care, employment decisions, criminal justice decisions, and more. This is already happening with simple models, but the widespread adoption of deep learning will rapidly accelerate this trend. The next 5 to 10 years are a particularly crucial time. We must get more women and people of Color building this technology in order to recognize, prevent, or address these biases.
Earlier this year, Taser (now rebranded Axon), the maker of the electronic stun guns, acquired two AI companies. Taser/Axon owns 80% of the police body camera market in the US, keeps this footage from police body cams in private databases, and is now advertising that they are developing technology for “predictive policing”. As a private company they are not subject to the same public records laws or oversight that police departments are. Given that racial bias in policing has been well-documented and shown to create negative feedback loops, this is terrifying. What kind of biases may be in their datasets or algorithms?
Google’s popular Word2Vec language library (covered in Lesson 5 of our course and in a workshop I gave this summer) has learned meaningful analogies, such as man is to king as woman is to queen. However, it also creates sexist analogies such as man is to computer programmer as woman is to homemaker. This is concerning as Word2Vec has become a commonly used building block in a wide variety of applications. This is not the first (or even second) time Google’s use of deep learning has shown troubling biases. In 2015, Google Photos labeled Black people as “gorillas” while automatically labeling photos. Google Translate continues to provide sexist translations such as translating “O bir doktor. O bir hemşire” to “He is a doctor. She is a nurse” even though the original Turkish did not specify gender.
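To make the analogy mechanism concrete, here is a minimal sketch of the vector arithmetic behind "man is to king as woman is to queen". The two-dimensional vectors are hand-made toys (not real Word2Vec output), and the helper names `cosine` and `analogy` are hypothetical, chosen only for illustration.

```python
import math

# Hand-made toy vectors, NOT real Word2Vec embeddings.
# Axis 0 loosely stands for "royalty", axis 1 for "gender".
toy_vectors = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, vectors):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # The analogy is the offset b - a applied to c.
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, then pick the closest remaining word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("man", "king", "woman", toy_vectors)
print(result)
```

Because the offsets are learned from text written by people, the same arithmetic that recovers "queen" can just as easily surface the biased associations described above.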
The state of diversity in AI
A year after prominent Google AI leader Jeff Dean said he is deeply worried about the lack of diversity in AI, guess what the diversity stats of the Google Brain team are? It is ~94% male, with 44 men and just 3 women, and over 70% White. OpenAI’s openness does not extend to sharing diversity stats or who works there, and from photos, the OpenAI team looks extremely homogeneous. I’d guess that it’s even less diverse than Google Brain. Earlier this year Vanity Fair ran an article about AI that featured 60 men, without quoting a single woman who works in AI.
Google Brain, OpenAI, and the media can’t solely blame the pipeline for this lack of diversity, given that there are over 1,000 women active in machine learning. Furthermore, Google has a training program to bring engineers in other areas up to speed on AI, which could be a great way to increase diversity. However, this program is only available to Google engineers, and just 3% of Google’s technical employees are Black or Latino (despite the fact that 90,000 Black and Latino students have graduated with computer science majors in the US in the last decade); thus, this training program is not going to have much impact on diversity.
At fast.ai, we want to do our part to increase diversity in deep learning and to lower the unnecessary barriers to entry for everyone. Therefore, we are providing diversity scholarships for our updated in-person Practical Deep Learning for Coders course presented in conjunction with the University of San Francisco Data Institute, to be offered on Monday evenings in downtown San Francisco and beginning on Oct 30. Wondering if you’re qualified? The only requirements are:
At least 1 year of coding experience
At least 8 hours a week to commit to the course (includes time for homework)
Curiosity and a willingness to work hard
Identify as a woman, person of Color, LGBTQ person, person with a disability, and/or veteran
Be available to attend in-person 6:30-9pm, Monday evenings, in downtown San Francisco (SOMA)
Last year we attempted an experiment: to see if we could teach deep learning to coders, with no math pre-requisites beyond high school math, and get them to state-of-the-art results in just 7 weeks. This was very different from other deep learning materials, many of which assume a graduate level math background, focus on theory, only work on toy problems, and don’t even include the practical tips. We didn’t even know if what we were attempting was possible, but the fast.ai course was a huge success!
The traditional approach to teaching math or deep learning requires that all the underlying components and theory be taught before learners can start creating and using models on their own. This approach to teaching is similar to not allowing children to play baseball until they have memorized all the formal rules and are able to commit to a full 9 innings with a full team, or to not allowing children to sing until they have extensive practice transcribing sheet music by hand in different keys. We want to get people “playing ball” (that is, applying deep learning to the problems they care about and getting great results) as quickly as possible, and we drill into the details later, as time goes on.
How to Apply
Women, people of Color, LGBTQ people, people with disabilities, and veterans in the Bay Area, if you have at least one year of coding experience and can commit 8 hours a week to working on the course, we encourage you to apply for a diversity scholarship. The number of scholarships we are able to offer depends on how much funding we receive. To apply, email firstname.lastname@example.org:
title your email “Diversity Fellowship Application”
include your resume
1 paragraph describing one or more problems you’d like to apply deep learning to
confirm that you are available to attend the course on Monday evenings in SOMA (for 7 weeks, beginning Oct 30), and that you can commit 8 hours a week to working on the course
which under-indexed group(s) you are a part of (gender, race, sexual identity, veteran)
The deadline to apply is Sept 15, 2017.
To those outside the Bay Area
We will again have a remote/international fellows program, which is separate from our diversity scholarships. Details on how to apply will be announced in a separate blog post in the next few weeks, so stay tuned.
“So how is fast.ai different from OpenAI?” I’ve been asked this question numerous times, and on the surface, there are several similarities: both are non-profits, both value openness, and both have been characterized as democratizing AI. One significant difference is that fast.ai has not been funded with $1 billion from Elon Musk to create an elite team of researchers with PhDs from the most impressive schools who publish in the most elite journals. OpenAI, however, has.
It turns out we’re different in pretty much every other way as well: in our goals, values, motivations, and target audiences.
Artificial General Intelligence
There is a lot of confusion about the term AI. To some people, AI means building a super-intelligent computer that is similar to a human brain (this is often referred to as Artificial General Intelligence or AGI). These people are often interested in philosophical questions such as What does it mean to be human? or What is the nature of intelligence? They may have fears such as Could super-intelligent machines destroy the human race? A New Yorker article describes OpenAI’s goal as making sure that AI doesn’t wipe out humanity, and its webpage says that the mission is to create artificial general intelligence. Elon Musk has expressed fear about DeepMind (acquired by Google for $500 million) and the development of evil AI, saying “If the A.I. that they develop goes awry, we risk having an immortal and superpowerful dictator forever. Murdering all competing A.I. researchers as its first move strikes me as a bit of a character flaw.”
Cracking AGI is a very long-term goal. The most relevant field of research is considered by many to be reinforcement learning: the study of teaching computers how to beat Atari. Formally, reinforcement learning is the study of problems that require sequences of actions that result in a reward/loss, and not knowing how much each action contributes to the outcome. Hundreds of the world’s brightest minds, with the most elite credentials, are working on this Atari problem. Their research has applications to robotics, although a lot of it is fairly theoretical. OpenAI is largely motivated by publishing in top academic journals in the field, and you would need to have a similarly elite background to understand their papers.
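The "not knowing how much each action contributes" part is often called the credit assignment problem. One common way to spread a delayed reward back over earlier actions is the discounted return. The sketch below assumes a discount factor of 0.9 and a made-up four-step episode; neither comes from OpenAI's work or from any particular paper.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return, for each time step, the discounted sum of rewards from then on.

    gamma is an assumed discount factor: rewards that arrive later
    count for less when credited to an earlier action.
    """
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# A made-up episode: three actions with no feedback, then a win.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards)
print(returns)
```

Every action in the episode gets some credit for the final reward, with actions closer to the reward getting more.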
Practical Artificial Intelligence
To other people, AI refers to algorithms that display some level of intelligence in doing application-focused tasks, such as algorithms that can:
The capabilities listed above are transformative: improvements to medicine will help fill a global shortage of doctors, improve outcomes even in countries that have enough doctors, and save millions of lives. Self-driving cars are safer and will drastically reduce traffic fatalities, congestion, and pollution.
At fast.ai, we are not working on AGI, but are instead focused on how to make it possible for more people from a wide variety of backgrounds to create and use neural net algorithms. We have students working to stop illegal deforestation, create more resources for the Pakistani language Urdu, help farmers in India prove how much land they own so they qualify for crop insurance, build wearable devices to monitor Parkinson’s disease, and more. These are issues that Jeremy and I knew very little (if anything) about, and they illustrate the importance of making this technology as accessible as possible. People with different backgrounds, in different locations, with different passions, are going to be aware of whole new sets of problems that they want to solve.
It is hard for me to empathize with Musk’s fixation on evil super-intelligent AGI killer robots in a very distant future. I support research funding at all levels, and have nothing against mostly theoretical research, but is it really the best use of resources to throw $1 billion at reinforcement learning without any similar investments into addressing mass unemployment and wealth inequality (both of which are well-documented to cause political instability), how existing gender and racial biases are being encoded in our algorithms, and on how to best get this technology into the hands of people working on high impact areas like medicine and agriculture around the world?
Special note: we’re teaching a fully updated part 1, in person, for seven weeks from Oct 30, 2017, at the USF Data Institute. See the course page for details and application form.
When we launched course.fast.ai we said that we wanted to provide a good education in deep learning. Part 1 of the course has now been viewed by tens of thousands of students, introducing them to nearly all of today’s best practices in deep learning, and providing many hours of hands-on practical coding exercises. We have collected some stories from graduates of part 1 on our testimonials page.
Today, we are launching Part 2: Cutting Edge Deep Learning for Coders. These 15 hours of lessons take you from part 1’s best practices, all the way to cutting edge research. You’ll learn how to:
Read and implement the latest research papers (even if you don’t have a math background)
Build a state of the art neural translation system
Create generative models for art, super resolution, segmentation, and more (including generative adversarial networks)
Apply deep learning to structured data and time series (such as for logistics, marketing, predictive maintenance, and fraud detection)
Karthik Kannan, founder of letsenvision.com, who told us “Today I’ve picked up steam enough to confidently work on my own CV startup and the seed for it was sowed by fast.ai with Pt1. and Pt.2”
Matthew Kleinsmith and Brendon Fortuner, who in 24 hours built a system to add filters to the background and foreground of videos, giving them victory in the 2017 Deep Learning Hackathon.
The prerequisites are that you’ve either completed part 1 of the course, or that you are already a confident deep learning practitioner who is comfortable implementing and using:
CNNs (including resnets)
RNNs (including LSTM and GRU)
Keras and numpy
What we cover
The course covers a lot of territory - here’s a brief summary of what you’ll learn in each lesson:
Lesson 8: Artistic Style
We begin with a discussion of a big change compared to part 1: from Theano to TensorFlow. You’ll learn about some of the exciting new developments in TensorFlow that have led us to the decision to make this change. We’ll also talk about a major project we highly recommend: build your own deep learning box!
We’ll also talk about how to approach one of the biggest challenges in this part of the course: reading academic papers. Don’t worry, it’s not as terrifying as it first sounds, especially once you know some of our little tricks.
Then, we start our deep dive into creative and generative applications, with artistic style transfer. You’ll be able to create beautiful and interesting images even if your artistic skills are as limited as Jeremy’s… :)
Lesson 9: Generative Models
We’ll learn about the extraordinarily powerful and widely useful technique of generative models. These are models that don’t just spit out a classification, but create a whole new image, sound, etc. They can be used, for example, with images, to:
We’ll try using this approach for super resolution (i.e. increasing the resolution of an image), and then you’ll get to try building your own system for rapidly adding the style of any artist to your photos. Have a look at the image on the right - the top very low resolution image has been input to the algorithm (see for instance the very pixelated fingers), and the bottom image has been created automatically from that!
Lesson 10: Multi-modal & GANs
A surprising result in deep learning is that models created from totally different types of data, such as text and images, can learn to share a consistent feature space. This means that we can create multi-modal models; that is, models which can combine multiple types of data. We will show how to combine text and images in a single model using a technique called DeVISE, and will use it to create a variety of search algorithms:
Text to image (which will also handle multi-word text descriptions)
Image to text (including handling types of image we didn’t train with)
And even image to image!
Doing this will require training a model using the whole imagenet competition dataset, which is a bigger dataset than we’ve used before. So we’re going to look at some techniques that make this faster and easier than you might expect.
We’re going to close our studies into generative models by looking at generative adversarial networks (GANs), a tool which has been rapidly gaining in popularity in recent months, and which may have the potential to create entirely new application areas for deep learning. We will be using them to create entirely new images from scratch.
Lesson 11: Memory Networks
We’ve covered a lot of different architectures, training algorithms, and all kinds of other CNN tricks during this course—so you might be wondering: what should I be using, when? The good news is that other folks have wondered that too, and have provided some great analyses of the pros and cons of various techniques in practice. We’ll be taking a look at a few highlights of these papers today.
Then we’re going to learn to GPU accelerate algorithms by taking advantage of PyTorch, which provides an interface that’s so similar to numpy that often you can move your algorithm onto the GPU in just an hour or two. In particular, we’re going to try to create the first (that we know of) GPU implementation of mean-shift clustering, a really useful algorithm that deserves to be more widely known.
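For readers who haven't met mean-shift before, here is a minimal CPU sketch in plain Python. It is not the GPU implementation from the lesson: the Gaussian kernel, the bandwidth of 1.0, the fixed iteration count, and the sample points are all assumptions made for illustration.

```python
import math


def gaussian(distance, bandwidth):
    """Unnormalized Gaussian kernel weight for a given distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def mean_shift(points, bandwidth=1.0, iterations=20):
    """Shift a copy of each point towards the nearest density peak.

    Each iteration moves every shifted point to the kernel-weighted
    mean of the ORIGINAL data points around its current position.
    """
    shifted = [list(p) for p in points]
    for _ in range(iterations):
        new_shifted = []
        for p in shifted:
            weights = [gaussian(math.dist(p, q), bandwidth) for q in points]
            total = sum(weights)
            new_shifted.append(
                [
                    sum(w * q[d] for w, q in zip(weights, points)) / total
                    for d in range(len(p))
                ]
            )
        shifted = new_shifted
    return shifted


# Two made-up blobs: one near (0, 0) and one near (5, 5).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
modes = mean_shift(points)
for original, mode in zip(points, modes):
    print(original, "->", [round(x, 3) for x in mode])
```

Points that end up at (almost) the same location belong to the same cluster, so unlike k-means you don't have to choose the number of clusters up front. The inner loop is a weighted sum over all points, which is the kind of array operation that maps well onto numpy or a GPU.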
To close out the lesson we will implement the heavily publicized “Memory Networks” algorithm, and will answer the question: does it live up to the hype?
Lesson 12: Attentional Models
It turns out that Memory Networks provide much of the key foundations we need to understand something which has become one of the most important advances in the last year or two: Attentional Models. These models allow us to build systems that focus on the most important part of the input for the current task, and are critical, for instance, in creating translation systems (which we’ll cover in the next lesson).
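The core of "focusing on the most important part of the input" can be sketched in a few lines: score each input against a query, turn the scores into weights with a softmax, and take the weighted average. The dot-product scoring and the toy numbers below are assumptions; the models in the lesson may score inputs differently.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product attention over a list of (key, value) pairs."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the values.
    context = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, context


# Toy inputs: the query lines up with the first and third keys.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]

weights, context = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The second input gets almost no weight because its key doesn't match the query, so it contributes very little to the context vector.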
Lesson 13: Neural Translation
One application of deep learning that has progressed perhaps more than any other in the last couple of years is Neural Machine Translation. In late 2016 it was implemented by Google in what the New York Times called The Great A.I. Awakening. There are a lot of tricks needed to reach Google’s level of translation capability, so we’ll be doing a deep dive in this lesson to learn nearly all the tricks used by state of the art systems.
Next up, we’ll learn about Densenets, which in July 2017 were awarded the CVPR Best Paper award, and have been shown to provide state of the art results in computer vision, particularly with small datasets. They are very similar to resnets, but with one key difference: the branches in each section are combined through concatenation, rather than addition. This apparently minor change makes a big difference in how they learn. We’ll also be using this technique in the next lesson to create a state of the art system for image segmentation.
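The difference between addition and concatenation is easy to see on a toy feature vector. This sketch uses plain Python lists in place of real feature maps, and the function names are hypothetical.

```python
def residual_combine(block_input, block_output):
    """ResNet-style: element-wise addition, so the feature count is unchanged."""
    return [a + b for a, b in zip(block_input, block_output)]


def dense_combine(block_input, block_output):
    """DenseNet-style: concatenation, so earlier features are kept as they are."""
    return block_input + block_output


features = [1.0, 2.0, 3.0]      # what came into the block
new_features = [0.5, 0.5, 0.5]  # what the block computed

added = residual_combine(features, new_features)
concatenated = dense_combine(features, new_features)

print(added)
print(concatenated)
```

With addition the original features are blended into the new ones. With concatenation later layers can still see the original features untouched, at the cost of the feature count growing with every block.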
Lesson 14: Time Series & Segmentation
Deep learning has generally been associated with unstructured data such as images, language, and audio. However it turns out that the structured data found in the columns of a database table or spreadsheet, where the columns can each represent different types of information in different ways (e.g. sales in dollars, area as zip code, product id, etc), can also be used very effectively by a neural network. This is equally true if the data can be represented as a time series (i.e. the rows represent different times or time periods).
In particular, what we learnt in part 1 about embeddings can be used not just for collaborative filtering and word encodings, but also for arbitrary categorical variables representing products, places, channels, and so forth. This has been highlighted by the results of two Kaggle competitions that were won by teams using this approach. We will study both of these datasets and competition winning strategies in this lesson.
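As a rough sketch of the idea, an embedding for a categorical column is just a lookup table from each category to a small vector, and a row of a table becomes the concatenation of those vectors with the continuous columns. The column names, categories, and embedding sizes below are made up, and the vectors are randomly initialized; in a real network they would be learned during training.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_embedding(categories, dim):
    """Create a lookup table mapping each category to a small random vector."""
    return {c: [random.uniform(-0.05, 0.05) for _ in range(dim)] for c in categories}


# Hypothetical categorical columns with hypothetical embedding sizes.
store_embedding = make_embedding(["store_a", "store_b", "store_c"], dim=2)
weekday_embedding = make_embedding(
    ["mon", "tue", "wed", "thu", "fri", "sat", "sun"], dim=3
)


def encode_row(row):
    """Turn one table row into the flat numeric vector a network would see."""
    return (
        store_embedding[row["store"]]
        + weekday_embedding[row["weekday"]]
        + [row["sales_dollars"]]  # continuous columns pass through as numbers
    )


row = {"store": "store_b", "weekday": "tue", "sales_dollars": 120.0}
encoded = encode_row(row)
print(len(encoded), encoded)
```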
Finally, we’ll look at how the Densenet architecture we studied in the last lesson can be used for image segmentation - that is, exactly specifying the location of every object in an image. This is another type of generative model, as we learnt in lesson 9, so many of the basic ideas from there will be equally applicable here.
I am thrilled to release fast.ai’s newest free course, Computational Linear Algebra, including an online textbook and a series of videos, and covering applications (using Python) such as how to identify the foreground in a surveillance video, how to categorize documents, the algorithm powering Google’s search, how to reconstruct an image from a CT scan, and more.
Jeremy and I developed this material for a numerical linear algebra course we taught in the University of San Francisco’s Masters of Analytics program, and it is the first ever numerical linear algebra course, to our knowledge, to be completely centered around practical applications and to use cutting edge algorithms and tools, including PyTorch, Numba, and randomized SVD. It also covers foundational numerical linear algebra concepts such as floating point arithmetic, machine epsilon, singular value decomposition, eigen decomposition, and QR decomposition.
What is numerical linear algebra?
“What exactly is numerical linear algebra?” you may be wondering. It is all about getting computers to do matrix math with speed and with acceptable accuracy, and way more awesome than the somewhat dull name suggests. Who cares about how computers do matrix math? All of you, probably. Data science is largely about manipulating matrices since almost all data can be represented as a matrix: time-series, structured data, anything that fits in a spreadsheet or SQL database, images, and language (often represented as word embeddings).
A typical first linear algebra course focuses on how to solve matrix problems by hand, for instance, spending time using Gaussian Elimination with pencil and paper to solve a small system of equations manually. However, it turns out that the methods and concerns for solving larger matrix problems via a computer are often drastically different:
Speed: when you have big matrices, matrix computations can get very slow. There are different ways of addressing this:
Different algorithms: there may be a less intuitive way to solve the problem that still gets the right answer, and does it faster.
Vectorizing or parallelizing your code.
Locality: traditional runtime computations focus on Big O, the number of operations computed. However, for modern computing, moving data around in memory can be very time-consuming, and you need ways to minimize how much memory movement you do.
Accuracy: Computers represent numbers (which are continuous and infinite) in a discrete and finite way, which means that they have limited accuracy. Rounding errors can add up, particularly if you are iterating! Furthermore, some math problems are not very stable, meaning that if you vary the input a little, you get a vastly different output. This isn’t a problem with rounding or with the computer, but it can still have a big impact on your results.
Memory use: for matrices that have lots of zero entries, or that have a particular structure, there are more efficient ways to store these in memory.
Scalability: in many cases you are interested in working with more data than you have space in memory for.
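The accuracy point is easy to see for yourself. This short sketch uses only the Python standard library to show machine epsilon for 64-bit floats, rounding error building up over repeated additions, and one way to avoid it.

```python
import math
import sys

# Machine epsilon: the gap between 1.0 and the next representable float64.
machine_epsilon = sys.float_info.epsilon

# Anything much smaller than epsilon is lost when added to 1.0.
absorbed = (1.0 + machine_epsilon / 2) == 1.0

# 0.1 has no exact binary representation, so repeated addition drifts.
total = 0.0
for _ in range(10):
    total += 0.1
drifted = total != 1.0

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
accurate_total = math.fsum([0.1] * 10)

print(machine_epsilon)
print(absorbed, drifted)
print(repr(total), repr(accurate_total))
```

Ten additions are off by about one part in 10^16, which is harmless on its own; the concern is iterative algorithms that repeat such operations many times.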
This course also includes some very modern approaches that we don’t know of any other numerical linear algebra courses covering (and we have done a lot of researching of numerical linear algebra courses and syllabi), such as:
This approach is very different from how most math courses operate: typically, math courses first introduce all the separate components you will be using, and then you gradually build them into more complex structures. The problems with this are that students often lose motivation, don’t have a sense of the “big picture”, and don’t know which pieces they’ll even end up needing. We have been inspired by Harvard professor David Perkins’s baseball analogy. We don’t require kids to memorize all the rules of baseball and understand all the technical details before we let them have fun and play the game. Rather, they start playing with just a general sense of it, and then gradually learn more rules/details as time goes on. All that to say, don’t worry if you don’t understand everything at first! You’re not supposed to. We will start using some “black boxes” or matrix decompositions that haven’t yet been explained, and then we’ll dig into the lower level details later.
I love math (I even have a math PhD!), but when I’m trying to solve a practical problem, code is more useful than theory. Also, one of the tests of whether you truly understand something is if you can code it, so this course is much more code-centered than a typical numerical linear algebra course.
The primary resource for this course is the free online textbook of Jupyter Notebooks, available on Github. They are full of explanations, code samples, pictures, interesting links, and exercises for you to try. Anyone can view the notebooks online by clicking on the links in the readme Table of Contents. However, to really learn the material, you need to interactively run the code, which requires installing Anaconda on your computer (or an equivalent set up of the Python scientific libraries) and you will need to be able to clone or download the git repo.
Accompanying the notebooks is a playlist of lecture videos, available on YouTube. If you are ever confused by a lecture or it goes too quickly, check out the beginning of the next video, where I review concepts from the previous lecture, often explaining things from a new perspective or with different illustrations.
The algorithm behind Google’s PageRank, used to rank the relative importance of different web pages
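As a taste of that last application, here is a minimal sketch of PageRank computed by power iteration on a tiny made-up link graph. The damping factor of 0.85 and the fixed number of iterations are assumptions, and this is not necessarily how the course notebooks implement it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeatedly redistributing rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# A made-up web of four pages: three of them link to "c".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Each pass of the loop is really a matrix-vector multiplication with the link matrix, which is why this belongs in a linear algebra course: the ranks converge towards the dominant eigenvector of that matrix.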
Can’t I just use scikit-learn?
Many (although certainly not all) of the algorithms covered in this course are already implemented in scientific Python libraries such as NumPy, SciPy, and scikit-learn, so you may be wondering why it’s necessary to learn what’s going on underneath the hood. Knowing how these algorithms are implemented will allow you to better combine and utilize them, and will make it possible for you to customize them if needed. In at least one case, we show how to get a sizable speedup over scikit-learn’s implementation of a method. Several of the topics we cover are areas of active research, and there is recent research that has not yet been added to existing libraries.