The new fast.ai research datasets collection, on AWS Open Data

In machine learning and deep learning we can’t do anything without data. So the people who create datasets for us to train our models are the (often under-appreciated) heroes. Some of the most useful and important datasets are those that become important “academic baselines”; that is, datasets that are widely studied by researchers and used to compare algorithmic changes. Some of these become household names (at least, among households that train models!), such as MNIST, CIFAR-10, and ImageNet.

We all owe a debt of gratitude to those kind folks who have made datasets available for the research community. So fast.ai and the AWS Public Dataset Program have teamed up to try to give back a little: we’ve made some of the most important of these datasets available in a single place, using standard formats, on reliable and fast infrastructure. For a full list and links see the fast.ai datasets page.

fast.ai uses these datasets in the Deep Learning for Coders courses, because they provide great examples of the kind of data that students are likely to encounter, and the academic literature has many examples of model results using these datasets which students can compare their work to. If you use any of these datasets in your research, please show your gratitude by citing the original paper (we’ve provided the appropriate citation link below for each), and if you use them as part of a commercial or educational project, consider adding a note of thanks and a link to the dataset.

Dataset example: the French/English parallel corpus

One of the lessons that gets the most “wow” feedback from fast.ai students is when we study neural machine translation. It seems like magic when we can teach a model to translate from French to English, even if we can’t speak both languages ourselves!

But it’s not magic; the key is the wonderful dataset that we leverage in this lesson: the French/English parallel text corpus prepared back in 2009 by Professor Chris Callison-Burch of the University of Pennsylvania. This dataset contains over 20 million sentence pairs in French and English. He built the dataset in a really clever way: by crawling millions of Canadian web pages (which are often multi-lingual) and then using a set of simple heuristics to map French URLs onto English URLs. The dataset is particularly important for researchers since it is used in the most important annual competition for benchmarking machine translation models.
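To give a feel for what a URL heuristic of this kind might look like, here is a minimal sketch. The rewrite rules (`/fr/` to `/en/`, `_f.html` to `_e.html`, `lang=fr` to `lang=en`) and the function name `french_to_english_url` are our own assumptions for illustration; they are not the rules used to build the corpus.

```python
import re

# Hypothetical rewrite rules: (pattern found in a French URL, English replacement).
# These are illustrative guesses, not the heuristics used for the actual corpus.
RULES = [
    (re.compile(r"/fr/"), "/en/"),
    (re.compile(r"_f\.(html?)$"), r"_e.\1"),
    (re.compile(r"([?&]lang=)fr\b"), r"\g<1>en"),
]

def french_to_english_url(url):
    """Return a candidate English URL for a French URL, or None if no rule applies."""
    for pattern, replacement in RULES:
        candidate, n_changes = pattern.subn(replacement, url)
        if n_changes:
            return candidate
    return None

print(french_to_english_url("http://example.ca/fr/astronomie/index.html"))
```

A crawler could then keep a French page only when its candidate English URL also exists, which gives a pair of documents to align sentence by sentence.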

Here are some examples of the sentence pairs that our translation models can learn from:

| English | French |
|---|---|
| Often considered the oldest science, it was born of our amazement at the sky and our need to question Astronomy is the science of space beyond Earth’s atmosphere. | Souvent considérée comme la plus ancienne des sciences, elle découle de notre étonnement et de nos questionnements envers le ciel L’astronomie est la science qui étudie l’Univers au-delà de l’atmosphère terrestre. |
| The name is derived from the Greek root astron for star, and nomos for arrangement or law. | Son nom vient du grec astron, qui veut dire étoile et nomos, qui veut dire loi. |
| Astronomy is concerned with celestial objects and phenomena – like stars, planets, comets and galaxies – as well as the large-scale properties of the Universe, also known as “The Big Picture”. | Elle s’intéresse à des objets et des phénomènes tels que les étoiles, les planètes, les comètes, les galaxies et les propriétés de l’Univers à grande échelle. |

So what’s Professor Callison-Burch doing now? When we reached out to him to check some details for his dataset, he told us he’s now preparing the University of Pennsylvania’s new AI course; and part of his preparation: watching the videos at course.fast.ai! It’s a small world indeed…

The dataset collection

The following categories are currently included in the collection:

The datasets are all stored in the same tgz format, and (where appropriate) the contents have been converted into standard formats, suitable for import into most machine learning and deep learning software. For examples of using the datasets to build practical deep learning models, keep an eye on the fast.ai blog where many tutorials will be posted soon.
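Since every dataset ships as a tgz archive, unpacking one only needs Python's standard library. Here is a minimal sketch; the function name `extract_tgz` is hypothetical, and the check for entries that would escape the destination folder is our own precaution rather than anything the collection requires.

```python
import tarfile
from pathlib import Path

def extract_tgz(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return the sorted member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if dest != target and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
        return sorted(m.name for m in members)
```

Calling it with the path of a downloaded archive and a target folder leaves the dataset's files ready to load with whichever library you prefer.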

fastai v1 for PyTorch: Fast and accurate neural nets using modern best practices

Note from Jeremy: Want to learn more? Listen to me discuss fastai with Sam Charrington in this in-depth interview.

Summary

Today fast.ai is releasing v1 of a new free open source library for deep learning, called fastai. The library sits on top of PyTorch v1 (released today in preview), and provides a single consistent API to the most important deep learning applications and data types. fast.ai’s recent research breakthroughs are embedded in the software, resulting in significantly improved accuracy and speed over other deep learning libraries, whilst requiring dramatically less code. You can download it today from conda, pip, or GitHub or use it on Google Cloud Platform. AWS support is coming soon.

About fast.ai

fast.ai’s mission is to make the power of state of the art deep learning available to anyone. In order to make that happen, we do three things:

  1. Research how to apply state of the art deep learning to practical problems quickly and reliably

  2. Build software to make state of the art deep learning as easy to use as possible, whilst remaining easy to customize for researchers wanting to explore hypotheses

  3. Teach courses so that as many people as possible can use the research results and software

You may well already be familiar with our courses. Hundreds of thousands of people have already taken our Practical Deep Learning for Coders course, and many alumni are now doing amazing work with their new skills, at organizations like Google Brain, OpenAI, and GitHub. (Many of them now actively contribute to our busy deep learning practitioner discussion forums, along with other members of the wider deep learning community.)

You may also have heard about some of our recent research breakthroughs (with help from our students and collaborators!), including breaking deep learning speed records and achieving a new state of the art in text classification.

The new fastai library

So that covers the research and teaching parts of the three listed areas—but what about software? Today we’re releasing v1.0 of our new fastai deep learning library, which has been under development for the last 18 months. fastai sits on top of PyTorch, which provides the foundation for our work. When we announced the initial development of fastai over one year ago, we described many of the advantages that PyTorch provides us. For instance, we talked about how we could “use all of the flexibility and capability of regular python code to build and train neural networks”, and “we were able to tackle a much wider range of problems”. The PyTorch team has been very supportive throughout fastai’s development, including contributing critical performance optimizations that have enabled key functionality in our software.

fastai is the first deep learning library to provide a single consistent interface to all the most commonly used deep learning applications for vision, text, tabular data, time series, and collaborative filtering. This is important for practitioners, because it means if you’ve learned to create practical computer vision models with fastai, then you can use the same approach to create natural language processing (NLP) models, or any of the other types of model we support.

Google Cloud Platform are making fastai v1 available to all their customers from today in an experimental Deep Learning image for Google Compute Engine, including ready-to-run notebooks and pre-installed datasets. To use it, simply head over to the Deep Learning images page on Google Cloud Marketplace, set up the configuration for your instance, set the framework to PyTorch 1.0RC, and click “deploy”. That’s it: you now have a VM with Jupyter Lab, PyTorch 1.0 and fastai on it! Read more about how you can use the images in this post from Google’s Viacheslav Kovalevskyi. And if you want to use fastai in a GPU-powered Jupyter Notebook, it’s now a single click away thanks to fastai support on Salamander, also released today.

Good news too from Bratin Saha, VP, Amazon Web Services: “To support fast.ai’s mission to make the power of deep learning available at scale, the fastai library will soon be available in the AWS Deep Learning AMIs and Amazon SageMaker”. And we’re very grateful for the enthusiasm from Microsoft’s AI CTO, Joseph Sirosh, who said “At Microsoft, we have an ambitious goal to make AI accessible and valuable to every organization. We are happy to see Fast.AI helping democratize deep learning at scale and leveraging the power of the cloud.”

Early users

Semantic code search at GitHub

fast.ai are enthusiastic users of GitHub’s collaboration tools, and many of the GitHub team work with fast.ai tools too - even the CEO of GitHub studies deep learning using our courses! Hamel Husain, a Senior Machine Learning Scientist at GitHub who has been studying deep learning through fast.ai for the last two years, says:

“The fast.ai course has been taken by data scientists and executives at Github alike ushering in a new age of data literacy at GitHub. It gave data scientists at GitHub the confidence to tackle state of the art problems in machine learning, which were previously believed to be only accessible to large companies or folks with PhDs.”

Husain and his colleague Ho-Hsiang Wu recently released a new experimental tool for semantic code search, which allows GitHub users to find useful code snippets using questions written in plain English. In a blog post announcing the tool, they describe how they switched from Google’s TensorFlow Hub to fastai, because it “gave us easy access to state of the art architectures such as AWD LSTMs, and techniques such as cyclical learning rates with random restarts”.

Screenshot from Github’s semantic code search tool

Husain has been using a pre-release version of the fastai library for the last 12 months. He told us:

“I choose fast.ai because of its modularity, high level apis that implemented state of the art techniques, and innovations that reduce the need for tons of compute but with the same performance characteristics. The semantic code search demo is only the tip of the iceberg, as folks in sales, marketing, fraud are currently leveraging the power of fastai to bring transformative change to their business areas.”

Music generation

One student that stood out in our last fast.ai deep learning course was Christine McLeavey Payne, who had already had a fascinating journey as an award-winning classical pianist with an SF Symphony chamber group, a high performance computing expert in the finance world, and a neuroscience and medical researcher at Stanford. Her journey has only gotten more interesting since, and today she is a Research Fellow at the famous OpenAI research lab. In her most recent OpenAI project, she used fastai to help her create Clara: A Neural Net Music Generator. Here is some of her generated chamber music. Christine says:

“The fastai library is an amazing resource. Even when I was just starting in deep learning, it was easy to get a fastai model up and running in only a few lines of code. At that point, I didn’t understand the state-of-the-art techniques happening under the hood, but still they worked, meaning my models trained faster, and reached significantly better accuracy.”

Christine has even created a human or computer quiz that you can try for yourself; see if you can figure which pieces were generated by her algorithm! Clara is closely based on work she did on language modeling for one of her fast.ai student projects, and leverages the fastai library’s support for recent advances in natural language processing. Christine told us:

“It’s only more recently that I appreciate just how important these details are, and how much work the fastai library saves me. It took me just under two weeks to get this music generation project up and getting great initial results. I’m certain that speed couldn’t have been possible without fastai.”

We think that Clara is a great example of the expressive power of deep learning—in this case, a model designed to generate and classify text has been used to generate music, with relatively few modifications. “I took a fastai Language Model almost exactly (very slight changes in sampling the generation) and experimented with ways to write out the music in either a “notewise” or “chordwise” encoding” she wrote on Twitter. The result was a crowd favorite, with Vanessa M Garcia, a Senior Researcher at IBM Watson, declaring it her top choice at OpenAI’s Demo Day.

Twitter comment about Christine's music generation demo

fastai for art projects

Architect and Investor Miguel Pérez Michaus has been using a pre-release version of fastai to research a system for art experiments that he calls Style Reversion. This is definitely a case where a picture tells a thousand words, so rather than try to explain what it does, I’ll let you see for yourself:

Example of Style Reversion

Pérez Michaus says he likes designing with fastai because “I know that it can get me where [Google’s Tensorflow library] Keras can not, for example, whenever something ‘not standard’ has to be achieved”. As an early adopter, he’s seen the development of the library over the last 12 months:

“I was lucky enough to see alpha version of fastai evolving, and even back then its power and flexibility was evident. Additionally, it was fully usable for people like myself, with domain knowledge but no formal Computer Science background. And it only has gotten better. My quite humble intuition about the future of deep learning is that we will need a fine grained understanding of what is really goes on under the hood, and in that landscape I think fastai is going to shine.”

fastai for academic research

Entrepreneurs Piotr Czapla and Marcin Kardas are the co-founders of n-waves, a deep learning consulting company. They used fastai to develop a novel algorithm for text classification in Polish, based on ideas shown in fast.ai’s Cutting Edge Deep Learning for Coders course. Polish is challenging for NLP, since it is a morphologically rich language (e.g. number, gender, animacy, and case are all collapsed into a word’s suffix). The algorithm that Czapla and Kardas developed won first prize in the top NLP academic competition in Poland, and a paper based on this new research will be published soon. According to Czapla, the fastai library was critical to their success:

“I love that fastai works well for normal people that do not have hundreds of servers at their disposal. It supports quick development and prototyping, and has all the best deep learning practices incorporated into it.”

The course and community have also been important for them:

“fast.ai’s courses opened my eyes to deep learning, and helped me to think and develop intuitions around how deep learning really works. Most of the answers to my questions are already on the forum somewhere, just a search away. I love how the notes from the lectures are composed into Wiki topics, and that other students are creating transcriptions of the lessons so that they are easier to find.”

Example: Transfer learning in computer vision

fast.ai’s research is embedded in the fastai library, so you get the benefits of it automatically. Let’s take a look at an example of what that means…

Kaggle’s Dogs vs Cats competition has been a favorite part of our courses since the very start, and it represents an important class of problems: transfer learning of a pre-trained model. So we’ll take a look at how the fastai library performs on this task.

Before we built fastai, we did most of our research and teaching using Keras (with the Tensorflow backend), and we’re still big fans of it. Keras really led the way in showing how to make deep learning easier to use, and it’s been a big inspiration for us. Today, it is (for good reason) the most popular way to train neural networks. In this brief example we’ll compare Keras and fastai on what we think are the three most important metrics: amount of code required, accuracy, and speed.

Here’s all the code required to do 2-stage fine tuning with fastai - not only is there very little code to write, there are very few parameters to set:

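# Load the images, with default augmentations and ImageNet normalization
data = data_from_imagefolder(Path('data/dogscats'),
    ds_tfms=get_transforms(), tfms=imagenet_norm, size=224)
# Build a learner around a ResNet-34 model
learn = ConvLearner(data, tvm.resnet34, metrics=accuracy)
# Stage 1: train for 6 epochs
learn.fit_one_cycle(6)
# Stage 2: unfreeze all layers and fine-tune for 4 more epochs
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-5,3e-4))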
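The last call passes a range of learning rates, `slice(1e-5,3e-4)`, rather than a single number. One plausible way to use such a range is to spread it geometrically over the model's layer groups, so that earlier layers move more slowly than later ones. The sketch below illustrates that idea only; the function name `spread_learning_rates` and the choice of three layer groups are our assumptions, not a description of the library's internals.

```python
def spread_learning_rates(lr_range, n_groups):
    """Spread slice(start, stop) geometrically over n_groups layer groups (illustrative)."""
    start, stop = lr_range.start, lr_range.stop
    if n_groups == 1:
        return [stop]
    # Constant multiplicative step between neighbouring groups.
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

rates = spread_learning_rates(slice(1e-5, 3e-4), n_groups=3)
print(rates)
```

The first group gets the smallest rate and the last group the largest, with each step multiplying by the same factor.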

Let’s compare the two libraries on this task (we’ve tried to match our Keras implementation as closely as possible, although since Keras doesn’t support all the features that fastai provides, it’s not identical):

| | fastai resnet34* | fastai resnet50 | Keras |
|---|---|---|---|
| Lines of code (excluding imports) | 5 | 5 | 31 |
| Stage 1 error | 0.70% | 0.65% | 2.05% |
| Stage 2 error | 0.50% | 0.50% | 0.80% |
| Test time augmentation (TTA) error | 0.30% | 0.40% | N/A* |
| Stage 1 time | 4:56 | 9:30 | 8:30 |
| Stage 2 time | 6:44 | 12:48 | 17:38 |

* Keras does not provide resnet 34 or TTA

(It’s important to understand that these improved results over Keras in no way suggest that Keras isn’t an excellent piece of software. Quite the contrary! If you tried to complete this task with almost any other library, you would need to write far more code, and would be unlikely to see better speed or accuracy than Keras. That’s why we’re showing Keras in this comparison - because we’re admirers of it, and it’s the strongest benchmark we know of!)

fastai also shows similarly strong performance for NLP. The state of the art text classification algorithm is ULMFiT. Here’s the relative error of ULMFiT versus previous top ranked algorithms on the popular IMDb dataset, as shown in the ULMFiT paper:

Summary of text classification performance

fastai is currently the only library to provide this algorithm. Because the algorithm is built into fastai, you can match the paper’s results with similar code to that shown above for Dogs vs Cats. Here’s how you train the language model for ULMFiT:

data = data_from_textcsv(LM_PATH, Tokenizer(), data_func=lm_data)
learn = RNNLearner.language_model(data, drop_mult=0.3,
    pretrained_fnames=['lstm_wt103', 'itos_wt103'])
learn.freeze()
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7), pct_start=0.25)
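The `fit_one_cycle` calls above take a peak learning rate, a pair of momentum values, and `pct_start`. To show roughly what a one-cycle schedule does with those inputs, here is a simplified sketch: the learning rate warms up linearly to its peak and then follows a cosine curve back down, while momentum moves in the opposite direction. The `div_factor` parameter, the exact curve shapes, and the function name `one_cycle` are our assumptions and may differ from what the library does.

```python
import math

def one_cycle(step, total_steps, lr_max, moms=(0.8, 0.7), pct_start=0.25, div_factor=25.0):
    """Return (learning_rate, momentum) at a step of a simplified one-cycle schedule."""
    warmup_steps = max(1, int(total_steps * pct_start))
    lr_min = lr_max / div_factor
    if step < warmup_steps:
        # Warm-up: learning rate rises, momentum falls.
        frac = step / warmup_steps
        lr = lr_min + frac * (lr_max - lr_min)
        mom = moms[0] + frac * (moms[1] - moms[0])
    else:
        # Annealing: cosine weight goes from 1 down to 0.
        frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        weight = (1 + math.cos(math.pi * frac)) / 2
        lr = lr_min + weight * (lr_max - lr_min)
        mom = moms[0] + weight * (moms[1] - moms[0])
    return lr, mom

schedule = [one_cycle(s, 100, 1e-3) for s in range(101)]
```

With these settings the learning rate peaks a quarter of the way through training, exactly when momentum is at its lowest.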

Under the hood - PyTorch v1

A critical component of fastai is the extraordinary foundation provided by PyTorch, v1 (preview) of which is also being released today. fastai isn’t something that replaces and hides PyTorch’s API, but instead is designed to expand and enhance it. For instance, you can create new data augmentation methods by simply creating a function that does standard PyTorch tensor operations; here’s the entire definition of fastai’s jitter function:

def jitter(c, size, magnitude:uniform):
    return c.add_((torch.rand_like(c)-0.5)*magnitude*2)

As another example, fastai uses and extends PyTorch’s concise and expressive Dataset and DataLoader classes for accessing data. When we wanted to add support for image segmentation problems, it was as simple as defining this standard PyTorch Dataset class:

class MatchedFilesDataset(DatasetBase):
    def __init__(self, x:Collection[Path], y:Collection[Path]):
        assert len(x)==len(y)
        self.x,self.y = np.array(x),np.array(y)
    def __getitem__(self, i):
        return open_image(self.x[i]), open_mask(self.y[i])
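To show how little such a dataset needs, here is a standard-library-only analogue of the class above. It is a sketch rather than fastai code: `PairedFilesDataset`, `load_x`, and `load_y` are hypothetical names, and the stand-in loaders just return file stems where real code would decode an image and its mask.

```python
from pathlib import Path

class PairedFilesDataset:
    """Map-style dataset pairing each input file with its target file by position."""
    def __init__(self, x_paths, y_paths, load_x, load_y):
        if len(x_paths) != len(y_paths):
            raise ValueError("x_paths and y_paths must have the same length")
        self.x, self.y = list(x_paths), list(y_paths)
        self.load_x, self.load_y = load_x, load_y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        # Load lazily, one pair at a time, so nothing is held in memory up front.
        return self.load_x(self.x[i]), self.load_y(self.y[i])

# Stand-in loaders; real code would open the image and mask files here.
ds = PairedFilesDataset(
    [Path("img/0001.png"), Path("img/0002.png")],
    [Path("mask/0001.png"), Path("mask/0002.png")],
    load_x=lambda p: ("image", p.stem),
    load_y=lambda p: ("mask", p.stem),
)
print(len(ds), ds[1])
```

Anything that supports `len()` and integer indexing in this way can be batched and shuffled by a generic data loader.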

This means that as practitioners want to dive deeper into their models, data, and training methods, they can take advantage of all the richness of the full PyTorch ecosystem. Thanks to PyTorch’s dynamic nature, programmers can easily debug their models using standard Python tools. In many areas of deep learning, PyTorch is the most common platform for researchers publishing their research; fastai makes it simple to test out these new approaches.

Under the hood - fastai

In the coming months we’ll be publishing academic papers and blog posts describing the key pieces of the fastai library, as well as releasing a new course that will walk students through how the library was developed from scratch. To give you a taste, we’ll touch on a couple of interesting pieces here, focussing on computer vision.

One thing we care a lot about is speed. That’s why we competed in Stanford’s DAWNBench competition for rapid and accurate model training, where (along with our collaborators) we have achieved first place in every category we entered. If you want to match our top single-machine CIFAR-10 result, it’s as simple as four lines of code:

tfms = ([pad(padding=4), crop(size=32, row_pct=(0,1), col_pct=(0,1)),
    flip_lr(p=0.5)], [])
data = data_from_imagefolder('data/cifar10', valid='test',
    ds_tfms=tfms, tfms=cifar_norm)
learn = Learner(data, wrn_22(), metrics=accuracy).to_fp16()
learn.fit_one_cycle(25, wd=0.4)

Much of the magic is buried underneath that to_fp16() method call. Behind the scenes, we’re following all of Nvidia’s recommendations for mixed precision training. No other library that we know of provides such an easy way to leverage Nvidia’s latest technology, which gives two to three times better performance compared to previous approaches.
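Two ideas commonly associated with mixed precision training are scaling the loss so that tiny gradients survive in half precision, and keeping a full-precision master copy of the weights. The sketch below illustrates both with Python's `struct` module, whose `"e"` format rounds a value to IEEE half precision. It is a toy illustration of the arithmetic, not what `to_fp16()` does internally; the constant `loss_scale` and the example gradient are assumed values.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

loss_scale = 1024.0      # assumed constant scale factor
true_grad = 1e-8         # a gradient too small to represent in fp16

naive = to_fp16(true_grad)                   # underflows to 0.0: the update is lost
scaled = to_fp16(true_grad * loss_scale)     # large enough to survive in fp16
recovered = scaled / loss_scale              # unscale in higher precision (64-bit float here)

master_weight = 1.0                          # master copy kept in full precision
master_weight -= 0.1 * recovered             # the small update is no longer lost

print(naive, recovered, master_weight)
```

Without scaling, the gradient rounds to zero and the weight never changes; with scaling, the recovered value is within a fraction of a percent of the true gradient.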

Another thing we care a lot about is accuracy. We want your models to work well not just on your training data, but on new test data as well. Therefore, we’ve built an entirely new computer vision library from scratch that makes it easy to develop and use data augmentation methods, to improve your model’s performance on unseen data. The new library uses a new approach to minimize the number of lossy transformations that your data goes through. For instance, take a look at the three images below:

Example of fastai transforms

On the left is the original low resolution image from the CIFAR-10 dataset. In the middle is the result of zooming and rotating this image using standard deep learning augmentation libraries. On the right is the same zoom and rotation, using fastai v1. As you can see, with fastai the detail is kept much better; for instance, take a look at how the pilot’s window is much crisper in the right-hand image than the middle image. This change to how data augmentation is applied means that practitioners using fastai can use far more augmentation than users of other libraries, resulting in models that generalize better.
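One general way to cut down on lossy steps is to combine several geometric transforms into a single matrix, so that pixels only need to be resampled once instead of once per transform. The sketch below shows that idea for a zoom followed by a rotation. It is our illustration of the principle under that assumption, not a description of how fastai implements it, and all function names are hypothetical.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation(degrees):
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0], [math.sin(t), math.cos(t), 0.0], [0.0, 0.0, 1.0]]

def zoom(scale):
    return [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]

def transform_point(m, point):
    """Apply an affine matrix to an (x, y) coordinate."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])

# Zoom first, then rotate, expressed as one matrix.
combined = matmul3(rotation(90), zoom(2.0))
print(transform_point(combined, (1.0, 0.0)))
```

Applying `combined` once gives the same coordinates as applying the zoom and the rotation separately, but an image transformed this way would only be interpolated a single time.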

These data augmentations even work automatically with non-image data such as bounding boxes. For instance, here’s an example of how fastai works with an image detection dataset, automatically tracking each bounding box through all augmentations:

Transforming bounding box data
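The idea of tracking a box through an augmentation can be shown with the simplest case, a horizontal flip: whatever is done to the pixels must also be done to the box coordinates. This is a minimal sketch with hypothetical function names and an assumed (left, top, right, bottom) box convention in which `right` and `bottom` are exclusive; it is not fastai's implementation.

```python
def flip_lr_image(image):
    """Flip an image, given as a list of pixel rows, left to right."""
    return [row[::-1] for row in image]

def flip_lr_box(box, width):
    """Flip a (left, top, right, bottom) box inside an image of the given width."""
    left, top, right, bottom = box
    return (width - right, top, width - left, bottom)

image = [[1, 2, 0, 0]]          # a 1x4 image whose "object" occupies columns 0 and 1
box = (0, 0, 2, 1)              # box around that object

flipped_image = flip_lr_image(image)
flipped_box = flip_lr_box(box, width=4)
print(flipped_image, flipped_box)
```

After the flip the object sits in columns 2 and 3, and the transformed box still encloses it.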

These kinds of thoughtful features can be found throughout the fastai library. Over the coming months we’ll be doing deep dives into many of them, for those of you interested in the details of how fastai is implemented behind the scenes.

Thanks!

Many thanks to the PyTorch team. Without PyTorch, none of this would have been possible. Thanks also to Amazon Web Services, who sponsored fast.ai’s first Researcher in Residence, Sylvain Gugger, who has contributed much of the development of fastai v1. Thanks also to fast.ai alumni Fred Monroe, Andrew Shaw, and Stas Bekman, who have all made significant contributions, to Yaroslav Bulatov, who was a key contributor to our most recent DAWNBench project, to Viacheslav Kovalevskyi, who handled Google Cloud integration, and of course to all the students and collaborators who have helped make the community and software successful.

Introduction to Machine Learning for Coders: Launch

Today we’re launching our newest (and biggest!) course, Introduction to Machine Learning for Coders. The course, recorded at the University of San Francisco as part of the Masters of Science in Data Science curriculum, covers the most important practical foundations for modern machine learning. There are 12 lessons, each of which is around two hours long—a list of all the lessons along with a screenshot from each is at the end of this post. They are all taught by me (Jeremy Howard); I’ve been studying and using machine learning for over 25 years, from when I started my career as an Analytical Specialist at McKinsey & Company, through to my time as President and Chief Scientist of Kaggle and founding CEO of Enlitic.

There are some excellent machine learning courses already, most notably the wonderful Coursera course from Andrew Ng. But that course is showing its age now, particularly since it uses Matlab for coursework. This new course uses modern tools and libraries, including python, pandas, scikit-learn, and pytorch. Unlike many educational materials in the field, our approach is “code first” rather than “math first”. It’s well suited to people who are writing code every day, but perhaps aren’t practicing their math chops quite as often (although we do cover all the necessary theory when appropriate). Perhaps most importantly, this course is very opinionated—rather than being a complete survey of every type of model, we focus on those that really matter in practice.

Two main types of models are covered: decision tree based models (particularly “forests” of bagged decision trees), and gradient descent based models (particularly logistic regression and its variants). Decision tree models build structures that look like this (in practice you’ll generally use much larger trees):

(The example above is from Professor Terence Parr and Prince Grover’s excellent discussion of tree visualization techniques, and uses his new animl visualization library. Terence and I are currently writing a book based on the material from this course, and a preview is available of the first chapters. So if you’re more of a book learner than a video learner, be sure to follow that!)

Decision tree methods are extremely flexible and easy to use, and when ensembled (using bagging or boosting) are the state of the art on many practical tasks. However, they can struggle with extrapolating to data outside of what they were trained on, and are not very accurate for data types such as images, audio, and natural language. These problems are often best solved with gradient descent methods, and we’ll look at some of the most important of these in the second half of the course, closing with a simple deep learning neural network. (If you’ve already taken our Practical Deep Learning for Coders course, you’ll have a small amount of conceptual overlap here, but taught in a very different way.)

You’ll learn how to create a complete decision tree forest implementation from scratch, and write your own deep learning model and train it from scratch. Along the way, you’ll learn many important skills in data preparation, model testing, and product development (including ethical issues specific to data products).

Here’s a quick overview of each lesson, along with an example screenshot (you’ll find the same details on the course site):

Lesson 1 - Introduction to Random Forests

Lesson 1 will show you how to create a “random forest™” - perhaps the most widely applicable machine learning model - to create a solution to the “Blue Book for Bulldozers” Kaggle competition, which will get you into the top 25% on the leaderboard. You’ll learn how to use a Jupyter Notebook to build and analyze models, how to download data, and other basic skills you need to get started with machine learning in practice.

Lesson 2 - Random Forest Deep Dive

Today we start by learning about metrics, loss functions, and (perhaps the most important machine learning concept) over-fitting. We discuss using validation and test sets to help us measure over-fitting.

Then we’ll learn how random forests work - first, by looking at the individual trees that make them up, then by learning about “bagging”, the simple trick that lets a random forest be much more accurate than any individual tree. Next up, we look at some helpful tricks that random forests support for making them faster, and more accurate.
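Bagging means training each tree on a bootstrap sample of the data (drawn with replacement) and averaging the trees' predictions. Here is a minimal pure-Python sketch of that idea using one-split regression trees on 1-D data. It is a toy illustration with hypothetical function names, not the course's implementation, and it leaves out the per-split feature sampling that full random forests also use.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:                      # all x values identical: predict the mean
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x < t else rm

def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

xs = list(range(20))
ys = [float(x) for x in xs]
stumps = fit_bagged(xs, ys, n_trees=25)
print(predict_bagged(stumps, 3), predict_bagged(stumps, 16))
```

Each stump alone can only output two values, but because every bootstrap sample picks a slightly different split, the average can take many more values than any single stump.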

Lesson 3 - Performance, Validation and Model Interpretation

Today we’ll see how to read a much larger dataset - one which may not even fit in the RAM on your machine! And we’ll also learn how to create a random forest for that dataset. We also discuss the software engineering concept of “profiling”, to learn how to speed up our code if it’s not fast enough - especially useful for these big datasets.
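As a small taste of profiling, Python's built-in `cProfile` module records how much time is spent in each function call. The sketch below profiles a deliberately slow toy function; the function `slow_sum_of_squares` is made up for illustration and is unrelated to the lesson's dataset.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """A deliberately plain loop, standing in for code we want to speed up."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with its call count and time, which points to where optimization effort is most likely to pay off.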

Next, we do a deeper dive into validation sets, and discuss what makes a good validation set, and we use that discussion to pick a validation set for this new data.

In the second half of this lesson, we look at “model interpretation” - the critically important skill of using your model to better understand your data. Today’s focus for interpretation is the “feature importance plot”, which is perhaps the most useful model interpretation technique.
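One common way to estimate feature importance is permutation importance: shuffle one column at a time and measure how much the model's error grows. The sketch below shows the technique on a stand-in model. It is offered as an illustration of the general idea and is not necessarily the exact method used in the lesson; the function names and the toy model are hypothetical.

```python
import random

def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, seed=0):
    """Return, per feature, the increase in error after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(model, shuffled, targets) - baseline)
    return importances

# Stand-in "trained model" that depends on feature 0 and ignores feature 1.
model = lambda r: 3.0 * r[0]
rows = [[float(i), float(i % 3)] for i in range(30)]
targets = [3.0 * r[0] for r in rows]
importances = permutation_importance(model, rows, targets, n_features=2)
print(importances)
```

Shuffling the feature the model relies on makes the error jump, while shuffling the ignored feature changes nothing, so the first importance is large and the second is zero.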

Lesson 4 - Feature Importance, Tree Interpreter

Today we do a deep dive into feature importance, including ways to make your importance plots more informative, how to use them to prune your feature space, and the use of a “dendrogram” to understand feature relationships.

In the second half of the lesson we’ll learn about two more really important interpretation techniques: partial dependence plots, and the “tree interpreter”.

Lesson 5 - Extrapolation and RF from Scratch

In today’s lesson we start by learning more about the “tree interpreter”, including the use of “waterfall charts” to analyze its output. Next up, we look into the subtle but important issue of extrapolation. This is the weak point of random forests - they can’t predict values outside the range seen in the training data. We study ways to identify when this problem happens, and how to deal with it.
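The extrapolation problem is easy to see with a tiny regression tree. Because each leaf predicts the mean of the training targets that fell into it, no prediction can exceed the largest target seen in training. The sketch below fits a toy 1-D tree to y = 2x; splitting at the median is a simplification we chose for brevity (real trees pick the split that minimizes error), and the function names are hypothetical.

```python
def fit_tree(points, min_leaf=2):
    """Fit a 1-D regression tree on (x, y) pairs sorted by x; leaves predict the mean y."""
    ys = [y for _, y in points]
    if len(points) < 2 * min_leaf:
        return sum(ys) / len(ys)
    mid = len(points) // 2                # simplification: always split at the median
    threshold = points[mid][0]
    return (threshold, fit_tree(points[:mid], min_leaf), fit_tree(points[mid:], min_leaf))

def predict(tree, x):
    # Walk down from the root until we reach a leaf (a plain number).
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

train = [(float(x), 2.0 * x) for x in range(10)]    # y = 2x for x in 0..9
tree = fit_tree(train)
print(predict(tree, 9.0), predict(tree, 100.0))
```

The true value at x = 100 would be 200, but the tree returns the same answer as for x = 9, since both inputs land in the rightmost leaf.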

In the second half of this lesson, we start writing our very own random forest from scratch!

Lesson 6 - Data Products

In the first half of today’s lesson we’ll learn about how to create “data products” using machine learning models, based on “The Drivetrain Method”, and in particular how model interpretation is an important part of this approach.

Next up, we’ll explore the issue of extrapolation more deeply, using a Live Coding approach - we’ll also take this opportunity to learn a couple of handy numpy tricks.

Lesson 7 - Random Forest from Scratch

Today we’ll finish off our “from scratch” random forest implementation! We’ll also briefly look at the amazing “cython” library that you can use to get the same speed as C code with minimal changes to your python code.

Then we’ll start on the next stage of our journey - gradient descent based methods such as logistic regression and neural networks…

Lesson 8 - Gradient Descent and Logistic Regression

Today we start the second half of the course - we’re moving from decision tree based approaches like random forests, to gradient descent based approaches like deep learning.

Our first step in this journey will be to use PyTorch to help us implement logistic regression from scratch. We’ll be building a model for the classic MNIST dataset of hand-written digits.
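To make "logistic regression from scratch" concrete, here is a minimal sketch of binary logistic regression trained with batch gradient descent. It uses plain Python rather than PyTorch, and a tiny made-up 2-D dataset rather than MNIST, so it illustrates the technique only and is not the lesson's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean cross-entropy loss."""
    n_features = len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # Derivative of the loss with respect to the pre-activation z.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy, linearly separable data (made up for illustration).
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print([predict(w, b, x) for x in xs])
```

The same loop structure (forward pass, gradient, update) carries over to neural networks; a framework mainly takes over the gradient computation.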

Lesson 9 - Regularization, Learning Rates and NLP

Today we continue building our logistic regression from scratch, and we add the most important feature to it: regularization. We’ll learn about L1 vs L2 regularization, and how they can be implemented. We also talk more about how learning rates work, and how to pick one for your problem.
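The difference between the two penalties shows up clearly in their gradients: L1 pushes every non-zero weight toward zero by a constant amount, while L2 pushes in proportion to the weight's size. Here is a minimal sketch; the function names and the strength parameter `lam` are our own notation.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_grad(weights, lam):
    """Subgradient of the L1 penalty: constant magnitude, sign of each weight."""
    return [lam * ((w > 0) - (w < 0)) for w in weights]

def l2_grad(weights, lam):
    """Gradient of the L2 penalty: proportional to each weight."""
    return [2 * lam * w for w in weights]

weights = [0.5, -2.0, 0.0]
print(l1_grad(weights, 0.1), l2_grad(weights, 0.1))
```

Either gradient is simply added to the loss gradient before the weight update, which is why regularization takes only a line or two to add to a from-scratch model.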

In the second half of the lesson, we start our discussion of natural language processing (NLP). We’ll build a “bag of words” representation of the popular IMDb text dataset, using sparse matrices to ensure good performance and reasonable memory use. We’ll build a number of models from this, including naive bayes and logistic regression, and will improve these models by adding ngram features.
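A bag-of-words representation just counts how often each term appears in a document, and storing only the non-zero counts keeps it sparse. The sketch below builds one with optional n-gram features using only the standard library; the whitespace tokenizer, function names, and two example sentences are simplifications we made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, max_n=1):
    """Return a sparse {term: count} mapping with all n-grams up to max_n."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

docs = ["this movie was not good", "this movie was good"]
vocab = sorted(set().union(*(bag_of_words(d, max_n=2) for d in docs)))
# Sparse rows: only the columns with non-zero counts are stored.
rows = [{vocab.index(t): c for t, c in bag_of_words(d, max_n=2).items()} for d in docs]
print(vocab)
```

With single words alone the two reviews differ only by "not"; adding bigrams gives the model the feature "not good", which carries the sentiment much more directly.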

Lesson 10 - More NLP, and Columnar Data

In today’s lesson we’ll further develop our NLP model by combining the strengths of naive bayes and logistic regression, creating the hybrid “NB-SVM” model, which is a very strong baseline for text classification. To do this, we’ll create a new nn.Module class in PyTorch, and look at what it’s doing behind the scenes.
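One common way to combine the two models is to scale each count feature by its naive bayes log-count ratio before handing it to a linear classifier. The sketch below shows that ratio in plain Python on made-up data; it is an assumption about the general technique, not the lesson’s nn.Module, and the function names and the smoothing constant alpha are hypothetical.

```python
import math

def log_count_ratio(rows, labels, n_features, alpha=1.0):
    """Naive bayes log-count ratio for binary labels (1 = positive).

    rows are dense count vectors; alpha is a smoothing constant.
    """
    p = [alpha] * n_features  # smoothed counts in positive documents
    q = [alpha] * n_features  # smoothed counts in negative documents
    for row, y in zip(rows, labels):
        target = p if y == 1 else q
        for j, count in enumerate(row):
            target[j] += count
    p_total, q_total = sum(p), sum(q)
    return [math.log((p[j] / p_total) / (q[j] / q_total))
            for j in range(n_features)]

def nb_features(row, r):
    """Scale each count by its log-count ratio before the linear model sees it."""
    return [count * rj for count, rj in zip(row, r)]

# Columns: "good", "bad", "movie"
rows = [[2, 0, 1], [1, 0, 1], [0, 2, 1], [0, 1, 1]]
labels = [1, 1, 0, 0]
r = log_count_ratio(rows, labels, n_features=3)
scaled = nb_features(rows[0], r)
```

Words that appear mostly in positive documents get a positive ratio, words that appear mostly in negative documents get a negative one, and words that are equally common in both end up near zero.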

In the second half of the lesson we’ll start our study of tabular and relational data using deep learning, by looking at the “Rossmann” Kaggle competition dataset. Today, we’ll start down the feature engineering path on this interesting dataset. We’ll look at continuous vs categorical variables, and what kinds of feature engineering can be done for each, with a particular focus on using embedding matrices for categorical variables.

Lesson 11 - Embeddings

Today, after a review of the math behind naive bayes, we’ll do a deep dive into embeddings - both as used for categorical variables in tabular data, and as used for words in NLP.
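An embedding is just a table with one row of numbers per category (or per word), looked up by index. The sketch below, in plain Python with made-up sizes and hypothetical helper names, shows that this lookup gives the same result as multiplying a one-hot vector by the embedding matrix. In a real model the rows would be learned by gradient descent rather than left at their random starting values.

```python
import random

def make_embedding(n_categories, dim, seed=0):
    """Create a randomly initialized embedding matrix (one row per category)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(n_categories)]

def embed(matrix, index):
    """Look up the vector for a category index."""
    return matrix[index]

def one_hot_matmul(matrix, index):
    """Same result as embed(), computed as one-hot vector times matrix."""
    one_hot = [1.0 if i == index else 0.0 for i in range(len(matrix))]
    dim = len(matrix[0])
    return [sum(one_hot[i] * matrix[i][d] for i in range(len(matrix)))
            for d in range(dim)]

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_to_index = {day: i for i, day in enumerate(days)}
matrix = make_embedding(n_categories=len(days), dim=4)

looked_up = embed(matrix, day_to_index["Wed"])
multiplied = one_hot_matmul(matrix, day_to_index["Wed"])
```

The lookup avoids ever building the one-hot vector, which matters when a variable has many levels or a vocabulary has many words.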

Lesson 12 - Complete Rossmann, Ethical Issues

In the first half of today’s class we’ll put everything we’ve learned together to create a complete model for the Rossmann dataset, including both categorical and continuous features, and careful feature engineering for all columns.

In the second half of the class we’ll study some ethical issues that arise when implementing machine learning models. We’ll see why they should matter to practitioners, and ways of thinking about them. Many students have told us they found this the most important part of the course!

Note from Jeremy: I’ll be teaching Deep Learning for Coders at the University of San Francisco starting in October; if you’ve got at least a year of coding experience, you can apply here.

AI Ethics Resources

My newest Ask-A-Data-Scientist post was inspired by a computer science student who wrote in asking for advice on how to pursue a career in policy making related to the societal impacts of AI. I realized that there are many great resources out there, and I wanted to compile a list of links all in one place.

You can find my previous Ask-A-Data-Scientist advice columns here.

Everyone in tech should be concerned about the ethical implications of our work and actively engaging with such questions. The humanities and social sciences are incredibly relevant and important in addressing ethics questions. While tech ethics is not a new field (it has traditionally been studied within science, tech, & society (STS), or information science departments), many in the tech industry are now waking up to these questions, and there is a much wider interest in the topic than before.

Working on AI ethics takes many forms, including: founding tech companies and building products in ethical ways; advocating and working for more just laws and policies; attempting to hold bad actors accountable; and research, writing, and teaching in the field. I have included many links to further resources in the rest of this post, as well as a few concrete suggestions. Don’t be overwhelmed by the length of these lists! This post is intended to be a resource that you can refer back to as needed.

For an overview of some AI ethics issues, I encourage you to check out my recent PyBay keynote on the topic. Through a series of case studies, both negative and positive, I counter 4 misconceptions about tech that often lead to human harm, as well as offer some healthier principles.

Build up your technical skills

For anyone interested in the societal impact of AI, I recommend building up your technical knowledge of machine learning. Even if you do not plan on working as a programmer or deep learning practitioner, it is helpful to have a hands-on understanding of how this technology works and how it can be used. I encourage everyone interested in AI ethics and policy to learn Python and to take the Practical Deep Learning for Coders course (the only pre-requisite is one year of coding experience).

Start a reading group

Casey Fiesler, a professor in Information Science at CU Boulder, created a crowd-sourced spreadsheet of over 200 tech ethics courses and links to the syllabi for many of them. Even if your university does not offer a tech ethics course, I encourage you to start a club, reading group, or a student-led course on tech ethics, and these syllabi can be a helpful resource in creating your own.

For those who are not college students, consider starting a tech ethics reading group at your workplace (that could perhaps meet for lunch once a week and discuss a different reading each week) or a tech ethics meetup in your city.

10 AI Ethics Experts to Follow

Here are ten researchers whose work on AI ethics I admire and whom I recommend following. All of them have a number of great articles/talks/etc, although I’ve just linked to one each to get you started:

Institutes and Fellowships

The below institutes all offer a range of ways to get involved, including listening to their podcasts and videos (wherever you may be located in the world), attending in-person events, or applying for internships and fellowships to help fund your work in this area:

  • Harvard’s Berkman Klein Center for Internet & Society is a research center that seeks to bring people from around the globe together to tackle the biggest challenges presented by the Internet. Their programs include a Fellowship program, internships, and Assembly, a 4 month program for technologists, managers, and policymakers to confront emerging problems related to the ethics and governance of artificial intelligence.

  • Data & Society is a non-profit research institute founded by danah boyd in NYC. They have a year-long fellowship program which is open to data scientists and engineers, lawyers and librarians, ethnographers and creators, historians and activists.

  • AI Now Institute was founded by Kate Crawford and Meredith Whittaker, and is housed at NYU. They focus on four domains: rights and liberties, labor and automation, bias and inclusion, and safety and critical infrastructure.

  • Georgetown Law Center on Privacy and Technology is a think tank focused on privacy and surveillance law and policy—and the communities they affect. Their research includes The Perpetual Line-Up about the unregulated use of facial recognition technology by police in the USA.

  • Data for Democracy is a non-profit organization of volunteers that has worked on a variety of projects, including several collaborations with ProPublica.

  • Mozilla Media Fellowships fund new thinking on how to address emerging threats and challenges facing a healthy internet. Relevant projects have sought to address polarization, mass surveillance, and misinformation.

  • Knight Foundation (journalism focus) funds programs, including an AI ethics initiative, to support free expression and journalistic excellence in the digital age. They have supported a number of projects related to addressing disinformation.

  • Eyebeam Residency (for artists) offers fellowships for those creating work which engages with technology and society through art. Previous projects include the open-source educational startup littleBits (2009) and the first Feminist Wikipedia Edit-A-Thon (2013).

Create your own

If what you want doesn’t yet exist in the world, you may need to create your own group, organization, non-profit, or startup. Timnit Gebru, a computer vision researcher, is an excellent role model for this. Dr. Gebru describes her experience as a Black woman attending NIPS (a major AI conference) in 2016: “I went to NIPS and someone was saying there were an estimated 8,500 people. I counted six black people. I was literally panicking. That’s the only way I can describe how I felt. I saw that this field was growing exponentially, hitting the mainstream; it’s affecting every part of society.” Dr. Gebru went on to found Black in AI, a large and active network of Black AI researchers, which has led to new research collaborations, conference and speaking invitations for members, and was even a factor in Google AI deciding to open a research center in Accra, Ghana.

Related fast.ai links

At fast.ai, we frequently write and speak about ethics, as well as including the topic in our deep learning course. Here are a few posts you may be interested in:

Here are some talks we’ve given on this topic:

The ethical impact of technology is a huge and relevant area, and there is a lot of work to be done.

What You Need to Know Before Considering a PhD

My newest Ask-A-Data-Scientist post addresses the question of whether to pursue a PhD. You can find my previous Ask-A-Data-Scientist advice columns here.

Question: I’m an undergrad student passionate about machine learning, and I feel a bit of pressure to get a PhD. Would it maybe make more sense to go into industry for a couple years and then consider going back to school? Any advice you have would be greatly appreciated.

Conversations around whether or not to do a PhD often suffer from selection bias: people considering PhDs ask successful people with PhDs for their advice. On the other side, there are many people doing fascinating and cutting-edge work without PhDs, who are less likely to be asked for advice on the topic. Other important factors, such as the disproportionately high rate of depression amongst graduate students or the opportunity cost of doing a PhD, are rarely discussed. As someone with a math PhD, I regret spending so many years over-focusing on a narrow area, while neglecting many other important skills. Once I joined the workforce, I felt like I was playing catch-up on many crucial skills and experiences!

Understanding Opportunity Costs

I grossly underestimated how much I could learn by working in industry. I believed the falsehood that the best way to always keep learning is to stay in academia, and I didn’t have a good grasp on the opportunity costs of doing a PhD. My undergraduate experience had been magical, and I had always both excelled at and enjoyed being in school. The idea of getting paid to be in school sounded like a sweet deal!

As I wrote about here, I later realized that my traditional academic success was actually a weakness, as I’d learned how to solve problems I was given, but not how to find and scope interesting problems on my own. I think for many top students (my former self included), getting a PhD feels like a “safe” option: it’s a well-defined path to doing something considered prestigious. But this can just be a way of postponing many necessary personal milestones: of learning to define and set your own goals apart from a structured academic system and of connecting more deeply with your own intrinsic motivations and values.

At the time, I felt like I was learning a lot during my PhD: taking advanced courses, reading papers, conducting research, regularly giving presentations, organizing two conferences in my field, coordinating a student-run graduate course, serving as an elected representative for grad students in my department, and writing a thesis. In hindsight, all of these were part of a narrower range of skills than I realized, and many of these skills were less transferable than I’d hoped. For instance, academic writing is very different from the type of writing I do through my blogging (which reaches a much wider audience!), and understanding academic politics was very different from startup politics, since the structure and incentives are so different.

Should you do a PhD? photo from #WOCinTech Chat

I finished my PhD and started my first full-time adult job around the time I turned 27. (Note: I was earning a stipend through various research and teaching fellowships in graduate school, but that was different.) I had a lot to learn about working in industry and major gaps in my practical skills. Despite taking 2 years of C++ in high school, minoring in CS in college, and doing a few programming projects during my math PhD, I had focused on the more theoretical parts of computer science and was lacking in many practical computer skills. In contrast, my fast.ai co-founder Jeremy Howard started his first full-time adult job at 18 as a McKinsey consultant, and by the same age when I was first entering the workforce, Jeremy had been working full-time for nearly a decade and had founded two start-ups that are still operational today. I could have learned so many other things working in tech during the time I instead did my PhD.

To be clear, life is not a race. You can switch into tech and learn new skills at any age. The tech industry is deeply ageist, and the glorification of young founders is a harmful myth. However, I am never again going to have the energy I did in my early 20s (I eat healthy, lift heavy weights, and prioritize sleep, but I don’t feel the same), and I regret spending that time and energy being miserable while over-focusing on a narrow subject area and neglecting a lot of other skills.

You don’t need a PhD

Just off the top of my head, I thought of the following people who don’t have PhDs and who are doing interesting, cutting-edge work in deep learning (this list is incomplete and there are tons of others):

  • Chris Olah, co-editor of distill.pub, creator of insightful visualizations, researcher at Google Brain (no college degree)
  • Jeremy Howard, co-founder of fast.ai, founder of Enlitic (1st start-up to apply deep learning to medicine), previous #1-ranked Kaggler and Kaggle president, founder of fastmail and Optimal Decisions Group
  • David Ha, creator of Sketch-RNN doodles, researcher at Google Brain
  • Smerity, previous Salesforce/MetaMind researcher, inventor of AWD-LSTM, startup founder
  • Pete Warden, research engineer at Google Brain and tech lead for TensorFlow mobile, founder of JetPac (acquired by Google), author of O’Reilly ebook “Building Mobile Applications with TensorFlow”
  • Greg Brockman, CTO and co-founder of OpenAI, leads their DOTA efforts (no college degree)
  • Catherine Olsson, research engineer at Google Brain, formerly helped build OpenAI Gym
  • Sara Hooker, Google Brain researcher working on interpretability and model compression, founder of data for good non-profit Delta Analytics
  • Denny Britz, previously a Google Brain resident and worked on Spark at Berkeley, blogs at WildML
  • Helena Sarin, deep learning researcher creating innovative artwork
  • Sylvain Gugger, fast.ai’s first research fellow, has done research on AdamW and super-convergence
  • Mariya Yao, CTO of Metamaven, chief editor of TOPBOTS, author of Applied Artificial Intelligence, part of the Duke team that took 2nd place in the DARPA grand challenge
  • Devaki Raj, CEO and co-founder of startup CrowdAI applying AI to satellite imagery, previously worked on maps and Android at Google
  • Choong Ng, CEO and co-founder of Vertex.ai (acquired by Intel), which created PlaidML for fast and easy deploy of deep learning on any device
  • Brian Brackeen, founder and CEO of Kairos computer vision start-up, took admirable stance against use of facial recognition by law enforcement

In all the jobs I’ve had, including a couple that technically “required” a PhD, I had teammates without graduate degrees. My teammates without PhDs were often more productive and helpful than those of us with PhDs (perhaps because they had more practical experience).

Of course, there are plenty of people with PhDs who do fascinating and valuable work, such as Arvind Narayanan, Latanya Sweeney, Timnit Gebru, Moustapha Cisse, Yann Dauphin, Shakir Mohamed, Leslie Smith, Erin LeDell, Andrea Frome, and others. I deeply admire everyone I’ve listed, and I am not arguing that a PhD is never useful or never works out well.

Depression, Isolation, & Mental Health Problems among Grad Students

According to research on UC Berkeley students, “67 percent of graduate students said they had felt hopeless at least once in the last year; 54 percent felt so depressed they had a hard time functioning; and nearly 10 percent said they had considered suicide, a 2004 survey found. By comparison, an estimated 9.5 percent of American adults suffer from depressive disorders in a given year, according to the National Institute of Mental Health.”

photo from #WOCinTech Chat

“Grad school is not all fun and personal enrichment for many people. It can involve poverty-level wages, uncertain employment conditions, contradictory demands by supervisors, irrelevant research projects, and disrespectful treatment by both the tenured faculty members and the undergraduates (both of whom behave, all too often, as management and customers.) Grad school is a confidence-killing daily assault of petty degradations. All of this is compounded by the fear that it is all for nothing; that you are a useful fool,” one professor wrote in the Chronicle of Higher Education, in an article that was about humanities students in particular, yet applies to many STEM students as well. “I hardly know anyone who was a grad student in the last decade who is not deeply embittered. Because of my columns on this site, a few people have told me how their graduate-school years coincided with long periods of suicidal ideation. More commonly, grad students suffer from untreated chronic ailments such as weight fluctuation, fatigue, headache, stomach pain, nervousness, and alcoholism.”

While sexism and harassment contributed to my own negative experience in graduate school, many of my male classmates were miserable as well, due to isolation, bullying, or humiliating treatment from professors, and an exploitative system dominated by egos, rigid hierarchy, and obsession with prestige. One of the authors of a comprehensive report from the National Academy of Sciences stated, “Scientists have equated rigor and being critical with being cruel.”

Sexism and Racism in Academia

In science, engineering, & medicine, between 20% and 50% of female students and more than 50% of women faculty have experienced harassment, according to a National Academy of Sciences report. In interviews with 60 women of color who work in STEM research, 100% of them had experienced discrimination, and the particular negative stereotypes they faced differed depending on their race.

Credentials can be more important for people from underrepresented groups, who frequently face a higher level of scrutiny due to unconscious bias (particularly if they are self-taught). While underrepresented minorities may need the credentials more, unfortunately, due to the sexism and racism in higher education, they also may face worse environments in trying to obtain those credentials. I don’t have an answer for this, but wanted to note the tension.

Piper Harron, a Black woman who earned her PhD in math at Princeton, wrote in her thesis: “Respected research math is dominated by men of a certain attitude. Even allowing for individual variation, there is still a tendency towards an oppressive atmosphere, which is carefully maintained and even championed by those who find it conducive to success. As any good grad student would do, I tried to fit in, mathematically. I absorbed the atmosphere and took attitudes to heart. I was miserable, and on the verge of failure. The problem was not individuals, but a system of self-preservation that, from the outside, feels like a long string of betrayals, some big, some small, perpetrated by your only support system.”

Toxic Graduate School is Worse than Other Toxic Jobs

I consider my time in graduate school as one of the two most toxic environments I’ve been in. While most of the advice I gave for coping with toxic jobs applies to toxic graduate school as well, there is one key distinction: it is much, much harder to switch graduate programs than it is to switch jobs. This makes the power difference between student and professor much greater than the power difference between an employee and boss in the tech industry (which thus means there is greater potential for abuse or exploitation).

I know people who have switched advisors or even switched programs, and yes, this can set you back years. However, the costs (in terms of mental and physical health, as well as opportunity costs) of staying in a toxic program are very high, and I know people who have spent years recovering from graduate school. It becomes even more complex if you are an immigrant on a student visa and have to consider visa/residency issues. There is not an easy solution for toxic graduate school situations.

Higher Education is Changing

The only situation where you definitely need a PhD is to become a professor. However, higher education is changing a lot: the shift to more adjuncts, the overproduction of PhDs, severe budget cuts to research funding in the USA, an increasing number of schools laying off tenured faculty, having to make repeated major moves for a series of post-docs, and unsustainable levels of student loan debt amongst undergraduates. I’m not sure what the future holds for higher education, but I think it will be different than the past (and this played a significant role in my own change of career goals).

I feel skeptical now when I hear undergraduates (including my younger self) say that they are certain they want to become professors, as it can be hard coming straight from undergraduate to understand the huge breadth and depth of career options that are out there, even if they have had a few internships or part-time jobs. Also, at that point, many students have primarily been surrounded by professors and students.

Coding bootcamps and MOOCs such as Coursera were not invented until I was well into my transition into tech, but both can be useful and are having a big impact on education. I’ve taken and benefitted from a number of online courses, and I would have benefitted from a coding bootcamp if they’d existed 10 or 15 years ago. In the past few years, I’ve both worked as an instructor for an in-person bootcamp and, as a co-founder, helped build fast.ai’s MOOCs, which include Practical Deep Learning for Coders and Computational Linear Algebra. I’ve seen how powerful and useful these new educational formats can be when done well (there are also plenty of useless or sketchy bootcamps and MOOCs out there, so do your research).

Further Reading/Watching

You may be interested in some of my previous posts and talks on related topics:

When considering a PhD, it is important to carefully weigh the opportunity costs and risks, as well as to consider the experiences of a variety of people: those that have found success without PhDs, the many who have had negative graduate school experiences, and those that have succeeded following a traditional academic path.