fastec2 script: Running and monitoring long-running tasks

This is part 2 of a series on fastec2. For an introduction to fastec2, see part 1.

Spot instances are particularly good for long-running tasks, since you can save a lot of money, and you can use more expensive instance types just for the period you’re actually doing heavy computation. fastec2 has some features to make this use case much more convenient. Let’s see an example. Here’s what we’ll be doing:

  1. Use an inexpensive on-demand monitoring instance for collecting results (and optionally for launching the task). We’ll call this od1 in this guide (but you can call it anything you like)
  2. Create a script to do the work required, and put any configuration files it needs in a specific folder. The script needs to write its results to that same folder so they’ll be copied back to od1
  3. Test the script works OK in a fresh instance
  4. Run the script under fastec2, which will cause it to be launched inside a tmux session on a new instance, with the required files copied over, and any results copied back to od1 as they’re created
  5. While the script is running, check its progress either by connecting to the tmux session it’s running in, or looking at the results being copied back to od1 as it runs
  6. When done, the instance will be terminated automatically, and we’ll review the results on od1.

Let’s look at the details of how this works, and how to use it. Later in this post, we’ll also see how to use fastec2’s volumes and snapshots functionality to make it easier to connect to large datasets.

Setting up your monitoring instance and script

First, create a script that completes the task you need. When running under fastec2, the script will be launched inside a subdirectory of ~/fastec2 (named after the instance, as we’ll see below). This directory will also contain any extra files (that aren’t already in your AMI) needed for the script, and will be monitored for changes, which are copied back to your on-demand instance (od1, in this guide). Here’s an example (we’ll call it myscript.sh) we can use for testing:

#!/usr/bin/env bash
echo starting >> $FE2_DIR/myscript.log
sleep 60
echo done >> $FE2_DIR/myscript.log

When running, the environment variable FE2_DIR will be set to the directory your script and files are in. Remember to give your script executable permissions:

$ chmod u+x myscript.sh

When testing it on a fresh instance, just set FE2_DIR and create that directory, then see if your script runs OK (it’s a good idea to have some parameter to your script that causes it to run a quick version for testing).

$ export FE2_DIR=~/fastec2/spot2
$ mkdir -p $FE2_DIR
$ ./myscript.sh
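
If your real task is written in Python, the same "quick version for testing" idea might look something like this. It's only a minimal sketch, not part of fastec2: the function name run_task, the quick flag, and the fallback to the current directory when FE2_DIR isn't set are all assumptions made for illustration.

```python
import os
import time
from pathlib import Path


def run_task(quick=False, out_dir=None):
    """Append progress markers to myscript.log, like myscript.sh above.

    `quick` and `out_dir` are hypothetical parameters for local testing.
    """
    # fastec2 sets FE2_DIR when it runs the script; fall back to "." otherwise.
    target = Path(out_dir or os.environ.get("FE2_DIR", "."))
    target.mkdir(parents=True, exist_ok=True)
    log = target / "myscript.log"

    with log.open("a") as f:
        f.write("starting\n")
    # Stand-in for the real work: skip the wait entirely in quick mode.
    time.sleep(0 if quick else 60)
    with log.open("a") as f:
        f.write("done\n")
    return log
```

Calling run_task(quick=True) on a fresh instance is then enough to confirm that the log file ends up where you expect it.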

Running the script with fastec2

You need some computer running that can be used to collect the results of the long running script. You won’t want to use a spot instance for this, since it can be shut down at any time, causing you to lose your work. But it can be a cheap instance type; if you’ve had your AWS account for less than 1 year then you can use a t2.micro instance for free. Otherwise a t3.micro is a good choice—it should cost you around US$7/month (plus storage costs) if you leave it running.

To run your script under fastec2, you need to provide the following information:

  1. The name of the instance to use (first create it with launch)
  2. The name of your script
  3. Additional arguments ([--myip MYIP] [--user USER] [--keyfile KEYFILE]) to connect to the monitoring instance to copy results to. If no host is provided, it uses the IP of the computer where fe2 is running.

E.g. this command will run myscript.sh on spot2 and copy results back to

$ fe2 launch spot2 base 80 m5.large --spot
$ fe2 script myscript.sh spot2

Here’s what happens after you run the fe2 script line above:

  1. A directory called ~/fastec2/spot2 is created on the monitoring instance if it doesn’t already exist (it is always a subdirectory of ~/fastec2 and is given the same name as the instance you’re connecting to, which in this case is spot2)
  2. Your script is copied to this directory
  3. This directory is copied to the target instance (in this case, spot2)
  4. A file called ~/fastec2/current is created on the target instance, containing the name of this task (“spot2” in this case)
  5. lsyncd is run in the background on the target instance, which will continually copy any new/changed files from ~/fastec2/spot2 on the target instance, to the monitoring instance
  6. ~/fastec2/spot2/myscript.sh is run inside the tmux session
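
Since results are copied back into ~/fastec2/spot2 on the monitoring instance, you can check on a task just by looking at the files there. Here's a rough sketch of that for the example script above. The function name task_status and the convention that a final line of "done" means the task has finished are assumptions based on myscript.sh, not anything fastec2 provides.

```python
from pathlib import Path


def task_status(task_name, log_name="myscript.log", root=None):
    """Report progress of a task from the log copied back to this machine.

    `root` defaults to ~/fastec2; pass another path when testing.
    """
    base = Path(root) if root else Path.home() / "fastec2"
    log = base / task_name / log_name
    if not log.exists():
        return "no results yet"
    lines = log.read_text().splitlines()
    # myscript.sh writes "done" as its last line when it completes.
    if lines and lines[-1].strip() == "done":
        return "finished"
    return "running"
```

For instance, task_status("spot2") run on od1 would read ~/fastec2/spot2/myscript.log.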

If you want the instance to terminate after the script completes, remember to include systemctl poweroff (for Ubuntu) or similar at the end of your script.
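
If your task is a Python program, one way to make sure the power-off always happens is to put it in a finally block, so that a crashed job doesn't leave the instance running. This is just a sketch under a couple of assumptions: the helper name run_then_poweroff and the dry_run flag are made up for illustration, and it assumes your user can run sudo without a password (as the default ubuntu user on AWS Ubuntu images normally can).

```python
import subprocess

# Command from the text above, run through sudo (an assumption).
POWEROFF_CMD = ["sudo", "systemctl", "poweroff"]


def run_then_poweroff(work, dry_run=False):
    """Run `work()` and then power off, even if `work()` raises.

    `dry_run=True` skips the power-off so the wrapper can be tested safely.
    """
    try:
        work()
    finally:
        if not dry_run:
            subprocess.run(POWEROFF_CMD, check=False)
```

Any exception from work() still propagates, so it will show up in the tmux session before the machine goes down.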

Creating a data volume

One issue with the above process is that if you have a bunch of different large datasets to work with, you either need to copy all of them to each AMI you want to use (which is expensive, and means recreating that AMI every time you add a dataset), or create a new AMI for each dataset (which means that as you change your configuration or add applications, you have to change all your AMIs).

An easier approach is to put your datasets on to a separate volume (that is, an AWS disk). fastec2 makes it easy to create a volume (formatted with ext4, which is the most common type of filesystem on Linux). To do so, it’s easiest to use the fastec2 REPL (see the last section of part 1 of this series for an introduction to the REPL), since we need an ssh object which can connect to an instance to mount and format our new volume. For instance, to create a volume using instance od1 (assuming it’s already running):

$ fe2 i
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: inst = e.get_instance('od1')

In [2]: ssh = e.ssh(inst)

In [3]: vol = e.create_volume(ssh, 20)

In [4]: vol
Out[4]: od1 (vol-0bf4a7b9a02d6f942 in-use): 20GB

In [5]: print(ssh.run('ls -l /mnt/fe2_disk'))
total 20
-rw-rw-r-- 1 ubuntu ubuntu     2 Feb 20 14:36 chk
drwx------ 2 ubuntu root   16384 Feb 20 14:36 lost+found

As you see, the new disk has been mounted on the requested instance under the directory /mnt/fe2_disk, and the new volume has been given the same name (od1) as the instance it was created with. You can now connect to your instance and copy your datasets to this directory, and when you’re done, unmount the volume (sudo umount /mnt/fe2_disk in your ssh session), and then you can detach the volume with fastec2. If you don’t have your previous REPL session open any more, you’ll need to get your volume object first, then you can detach it.

In [1]: vol = e.get_volume('od1')

In [2]: vol
Out[2]: od1 (vol-0bf4a7b9a02d6f942 in-use): 20GB

In [3]: e.detach_volume(vol)

In [4]: vol
Out[4]: od1 (vol-0bf4a7b9a02d6f942 available): 20GB

In the future, you can re-mount your volume through the REPL:

In [5]: e.mount_volume(ssh, vol)

Using snapshots

A significant downside of volumes is that you can only attach a volume to one instance at a time. That means you can’t use volumes to launch lots of tasks all connected to the same dataset. Instead, for this purpose you should create a snapshot. A snapshot is a template for a volume; any volumes created from this snapshot will have the same data that the original volume did. Note however that snapshots are not updated with any additional information added to volumes—the data originally included in the snapshot remains without any changes.

To create a snapshot from a volume (assuming you already have a volume object vol, as above, and you’ve detached it from the instance):

In [7]: snap = e.create_snapshot(vol, name="snap1")

You can now create a volume using this snapshot, which attaches to your instance automatically:

In [8]: vol = e.create_volume(ssh, name="vol1", snapshot="snap1")
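
If you'll be doing this for lots of instances, the three REPL calls can be wrapped in a small helper. This is a sketch that only uses the calls shown in this guide (get_instance, ssh, and create_volume with name and snapshot); the helper name attach_dataset is made up, and e is assumed to be the fastec2.EC2 object that the REPL creates for you.

```python
def attach_dataset(e, instance_name, vol_name, snap_name):
    """Create a volume from a snapshot, attached to the named instance.

    `e` is assumed to be the fastec2 EC2 object from the REPL.
    """
    inst = e.get_instance(instance_name)
    ssh = e.ssh(inst)
    # As in the REPL example, the new volume attaches to the instance
    # that the ssh object is connected to.
    return e.create_volume(ssh, name=vol_name, snapshot=snap_name)
```

In the REPL you could then call attach_dataset(e, 'spot2', 'vol1', 'snap1').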


Now we’ve got all the pieces of the puzzle. In a future post we’ll discuss best practices for running tasks with fastec2 using all these pieces, but here’s the quick summary of the process:

  1. Launch an instance and set it up with the software and configuration you’ll need
  2. Create a volume for your datasets if required, and make a snapshot from it
  3. Stop that instance, and create an AMI from it (optionally you can terminate the instance after that is done)
  4. Launch a monitoring instance using an inexpensive instance type
  5. Launch a spot instance for your long-running task
  6. Create a volume from your snapshot, attached to your spot instance
  7. Run your long running task on that instance, passing the IP of your monitoring instance
  8. Ensure that your long running task shuts down the instance when done, to avoid paying for the instance after the task completes. (You may also want to delete the volume created from the snapshot at that time.)

To run additional tasks, you only need to repeat the last 4 steps. You can automate that process using the API calls shown in this guide.
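
As a starting point for that automation, here's a sketch that builds the two fe2 command lines used earlier in this guide (launching a spot instance, and running the script on it). The function name task_commands and its parameters are hypothetical. The commands themselves follow the fe2 launch and fe2 script examples above, with --myip taken from the argument list shown for fe2 script. Creating the volume from your snapshot happens between the two commands, through the REPL API.

```python
def task_commands(name, ami, disk_gb, instance_type, script, monitor_ip=None):
    """Return [launch_cmd, script_cmd] as argument lists for subprocess."""
    launch = ["fe2", "launch", name, ami, str(disk_gb), instance_type, "--spot"]
    run = ["fe2", "script", script, name]
    if monitor_ip:
        # Without --myip, results go to the computer where fe2 is running.
        run += ["--myip", monitor_ip]
    return [launch, run]
```

Each list can be passed to subprocess.run(cmd, check=True), so a failed launch stops the sequence before the script step.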

fastec2: AWS computer management for regular folks

This is part 1 of a series on fastec2. To learn how to run and monitor long-running tasks with fastec2 check out part 2.

AWS EC2 is a wonderful system; it allows anyone to rent a computer for a few cents an hour, including a fast network connection and plenty of disk space. I’m particularly grateful to AWS, because thanks to their Activate program we’ve got lots of compute credits to use for our research and development at fast.ai.

But if you’ve spent any time working with AWS EC2, then for setting it up you’ve probably found yourself stuck between the slow and complex AWS Console GUI, and the verbose and clunky command line interface (CLI). There are various tools available to streamline AWS management, but they tend towards the power user end of the spectrum, written for people that are deploying dozens of computers in complex architectures.

Where’s the tool for regular folks? Folks who just want to launch a computer or two for getting some work done, and shutting it down when it’s finished? Folks who aren’t really that keen to learn a whole bunch of AWS-specific jargon about VPCs and Security Groups and IAM Roles and oh god please just make it go away…

The delights of the AWS Console


  1. Overview
  2. Installation and configuration
  3. Creating your initial on-demand instance
  4. Creating your Amazon Machine Image (AMI)
  5. Launching and connecting to your instance
  6. Launching a spot instance
  7. Using the interactive REPL and ssh API

Since I’m an extremely regular folk myself, I figured I better go write that tool. So here it is: fastec2. Is it for you? Here’s a summary of what it is designed to make easy (‘instance’ here simply means ‘AWS computer’):

  • Launch a new on-demand or spot instance
  • See what instances are running
  • Start an instance
  • Connect to a named instance using ssh
  • Run a long-running script in a spot instance and monitor and save results
  • Create and use volumes and snapshots, including automatic formatting/mounting
  • Change the type of an instance (e.g. add or remove a GPU)
  • See pricing for on-demand and spot instance types
  • Access through either a standard command line or through a Jupyter Notebook API
  • Tab completion
  • IPython command line interactive REPL available for further exploration

I expect that this will be most useful to people who are doing data analysis, data collection, and machine learning model training. Note that fastec2 is not designed to make it easy to manage huge fleets of servers or set up complex network architectures, or to help with deployment of applications. If you’re wanting to do that, you might want to check out Terraform or CloudFormation.

To see how it works, let’s do a complete walkthrough of creating a new Amazon Machine Image (AMI), then launching an instance from this AMI, and connecting to it. We’ll also see how to launch a spot instance, run a long-running script on it, and collect the results of the script. I’m assuming you already have an AWS account, and know the basics of connecting to instances with ssh. (If you’re not sure about this bit, first you should follow this tutorial on DataCamp.) Note that much of the coolest functionality in fastec2 is being provided by the wonderful Fire, Paramiko, and boto3 libraries, so a big thanks to all the wonderful people that made these available!


The main use case that we’re looking to support with fastec2 is as follows: you want to interactively start and stop machines of various types, each time getting the same programs, data, and configuration automatically. Sometimes you’ll create an on-demand instance and start and stop it as required. You may also want to change the instance type occasionally, such as adding a GPU, or increasing the RAM. (This can be done instantly with a single command!) Sometimes you’ll fire up a spot instance in order to run a script and save the results (such as for training a machine learning model, or completing a web scraping task).

The key to having this work well is to create an AMI which is set up just as you need it. You may think of an AMI as being something that only sysadmin geniuses at Amazon build for you, but as you’ll see it’s actually pretty quick and easy. By making it easy to create and use AMIs, you can then easily create the machines you need, when you need them.

Everything in fastec2 can also be done through the AWS Console, and through the official AWS CLI. Furthermore, there’s lots of things that fastec2 can’t do—it’s not meant to be complete, it’s meant to be convenient for the most commonly used functionality. But hopefully you’ll discover that for what it provides, it makes it easier and faster than anything else out there…

Installation and configuration

You’ll need python 3.6 or later - we highly recommend installing Anaconda if you’re not already using python 3.6. It lets you have as many different python versions as you want, and different environments, and switch between them as needed. To install fastec2:

pip install git+https://github.com/fastai/fastec2.git

You can also save some time by installing tab-completion for your shell. See the readme for setup steps for this. Once installed, hit Tab at any point to complete a command, or hit Tab again to see possible alternatives.

fastec2 uses a python interface to the AWS CLI to do its work, so you’ll need to configure this. The CLI uses region codes, instead of the region names you see in the console. To find out the region code for the region you wish to use, fastec2 can help. To run the fastec2 application type fe2, along with a command name and any required arguments. The command region will show the first code that matches the (case-sensitive) substring you provide, e.g. (note that I’m using ‘$’ to indicate the lines you type, and other lines are the responses):

$ fe2 region Ohio
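
To make "first code that matches the (case-sensitive) substring" concrete, here's a tiny sketch of that kind of lookup. It isn't fastec2's actual implementation; the REGIONS table is a hand-picked handful of AWS regions and the function name region_code is made up for illustration.

```python
# A few AWS regions as named in the console (not a complete list).
REGIONS = {
    "US East (N. Virginia)": "us-east-1",
    "US East (Ohio)": "us-east-2",
    "US West (N. California)": "us-west-1",
    "US West (Oregon)": "us-west-2",
    "EU (Ireland)": "eu-west-1",
}


def region_code(substring, regions=REGIONS):
    """Return the code of the first region whose name contains `substring`."""
    for name, code in regions.items():
        if substring in name:  # case-sensitive match
            return code
    raise KeyError(f"no region name contains {substring!r}")
```

With this table, "Ohio" matches us-east-2, whereas "ohio" matches nothing because the comparison is case-sensitive.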

Now that you have your region code, you can configure AWS CLI:

$ aws configure
AWS Access Key ID: XXX
AWS Secret Access Key: XXX
Default region name: us-east-2

For information on setting this up, including getting your access keys for AWS, see Configuring the AWS CLI.

Creating your initial on-demand instance

Life is much easier when you can rapidly create new instances which are all set up just how you like them, with the right software installed, data files downloaded, and configuration set up. You can do this by creating an AMI, which is simply a “frozen” version of a computer that you’ve set up, and can then recreate as many times as you like, nearly instantly.

Therefore, we will first set up an EC2 instance with whatever we’re going to need (we’ll call this your base instance). (You might already have an instance set up, in which case you can skip this step).

One thing that will make things a bit easier is if you ensure you have a key pair on AWS called “default”. If you don’t, go ahead and upload or create one with that name now. Although fastec2 will happily use other named keys if you wish, you’ll need to specify the key name every time if you don’t use “default”. You don’t need to make your base instance disk very big, since you can always use a larger size later when you launch new instances using your AMI. Generally 60GB is a reasonable size to choose.

To create our base image, we’ll need to start with some existing AMI that contains a Linux distribution. If you already have some preferred AMI that you use, feel free to use it; otherwise, we suggest using the latest stable Ubuntu image. To get the AMI id for the latest Ubuntu, type:

$ fe2 get-ami - id

This shows a powerful feature of fastec2: all commands that start with “get-” return an AWS object, on which you can call any method or property (each of these commands also has a version without the get- prefix, which prints a brief summary of the object instead of returning it). Type your method or property name after a hyphen, as shown above. In this case, we’re getting the ‘id’ property of the AMI object returned by get-ami (which defaults to the latest stable Ubuntu image; see below for examples of other AMIs). To see the list of properties and methods, simply call the command without a property or method added:

$ fe2 get-ami -

Usage:           fe2 get-ami
                 fe2 get-ami architecture
                 fe2 get-ami block-device-mappings
                 fe2 get-ami create-tags
                 fe2 get-ami creation-date

Now you can launch your instance—this creates a new “on-demand” Linux instance, and when complete (it’ll take a couple of minutes) it will print out the name, id, status, and IP address. The command will wait until ssh is accessible on your new instance before it returns:

$ fe2 launch base ami-0c55b159cbfafe1f0 50 m5.xlarge
base (i-00c7f2f81a841b525 running):

The fe2 launch command takes a minimum of 4 parameters: the name of the instance to create, the ami to use (either id or name—here we’re using the AMI id we retrieved earlier), the size of the disk to create (in GB), and the instance type. You can learn about the different instance types available from this AWS page. To see the pricing of different instances, you can use this command (replace m5 with whichever instance series you’re interested in; note that currently only US prices are displayed, and they may not be accurate or up to date—use the AWS web site for full price lists):

$ fe2 price-demand m5
["m5.large", 0.096]
["m5.metal", 4.608]
["m5.xlarge", 0.192]
["m5.2xlarge", 0.384]
["m5.4xlarge", 0.768]
["m5.12xlarge", 2.304]
["m5.24xlarge", 4.608]
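
Each line of that output happens to be valid JSON, so it's easy to pull into Python if you want to compare prices or estimate a monthly bill. A small sketch follows; the helper names are made up, and the 730 hours per month figure is just 365 days times 24 hours divided by 12.

```python
import json


def parse_prices(output):
    """Turn lines like '["m5.large", 0.096]' into {name: hourly_price}."""
    prices = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, price = json.loads(line)
        prices[name] = price
    return prices


def monthly_cost(hourly, hours=730):
    """Approximate cost of leaving an instance running for a month."""
    return round(hourly * hours, 2)
```

For example, min(prices, key=prices.get) gives the cheapest type in the series, and monthly_cost(prices["m5.large"]) estimates what it costs to leave one running.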

With our instance running, we can now connect to it with ssh:

$ fe2 connect base
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1032-aws x86_64)

Last login: Fri Feb 15 22:10:28 2019 from

ubuntu@ip-172-31-13-138:~$ |

Now you can configure your base instance as required, so go ahead and apt install any software you want, copy over data files you’ll need, and so forth. In order to use some features of fastec2 (discussed below) you’ll need tmux and lsyncd installed in your AMI, so go ahead and install them now (sudo apt install -y tmux lsyncd). Also, if you’ll be using the long-running script functionality in fastec2 you’ll need a private key in your ~/.ssh directory which has permission to connect to another instance to save results of the script. So copy your regular private key over (if it’s not too sensitive) or create a new one (type: ssh-keygen) and grab the public key file it creates (~/.ssh/id_rsa.pub for a default RSA key).

Check: make sure you’ve done the following in your instance before you make it into an AMI: installed lsyncd and tmux; copied over your private key.

If you want to connect to jupyter notebook, or any other service on your instance, you can use ssh tunneling. To create ssh tunnels, add an extra argument to the above fe2 connect command, passing in either a single int (one port) or an array (multiple ports), e.g.:

# Tunnel to just jupyter notebook (running on port 8888)
fe2 connect od1 8888
# Two tunnels: jupyter notebook, and a server running on port 8008
fe2 connect od1 [8888,8008]

This doesn’t do any fancy forwarding between different machines on the network; it’s just a direct connection from the computer you run fe2 connect on, to the computer you’re ssh’ing to. So generally you’ll run this on your own PC, and then access (for Jupyter) http://localhost:8888 in your browser.
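
If you're wondering what such a tunnel corresponds to in plain OpenSSH terms, each port is roughly one -L local forward. Here's a sketch that turns the same "single int or list of ports" argument into ssh options. It illustrates the idea only; it's not how fastec2 sets up its tunnels, and the function name tunnel_args is made up.

```python
def tunnel_args(ports):
    """Build OpenSSH -L options forwarding each local port to the same
    port on the remote machine."""
    if isinstance(ports, int):
        ports = [ports]
    args = []
    for port in ports:
        args += ["-L", f"{port}:localhost:{port}"]
    return args
```

So tunnel_args([8888, 8008]) gives the options you'd add to an ordinary ssh command to get the two tunnels from the example above.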

Creating your Amazon Machine Image (AMI)

Once you’ve configured your base instance, you can create your own AMI:

$ fe2 freeze base

Here ‘freeze’ is the command, and ‘base’ is the argument. Replace base with the name of the instance that you wish to “freeze” into an AMI. Note that your instance will be rebooted during this process, so ensure that you’ve saved any open documents and it’s OK to shut down. It might take 15 mins or so for the process to complete (for very large disks of hundreds of GB it could take hours). To check on progress, either look in the AMIs section of the AWS console, or type this command (it will display ‘pending’ whilst it is still creating the image):

$ fe2 get-ami base - state

(As you’ll see, this is using the method-calling functionality of fastec2 that we saw earlier.)
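
If you'd rather not keep re-typing that command, a simple polling loop can wait until the state is no longer ‘pending’. This is a generic sketch: wait_for_ami and its parameters are made-up names, and get_state is whatever function you supply to fetch the current state (for instance, one that runs the fe2 get-ami command above and returns what it prints).

```python
import time


def wait_for_ami(get_state, poll_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep):
    """Call `get_state()` until it stops returning 'pending'.

    Returns the final state; raises TimeoutError if it takes too long.
    """
    waited = 0
    while True:
        state = get_state()
        if state != "pending":
            return state
        if waited >= timeout_seconds:
            raise TimeoutError("AMI still pending after timeout")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep parameter is only there so the loop can be tested without actually waiting.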

Launching and connecting to your instance

Now you’ve gotten your AMI, you can launch a new instance using that template. It only takes a couple of minutes for your new instance to be created, as follows:

$ fe2 launch inst1 base 80 m5.large
inst1 (i-0f5a3b544274c645f running):

We’re calling our new instance ‘inst1’, and using the ‘base’ AMI we created earlier. As you can see, the disk size and instance type need not be the same as you used when creating the AMI (although the disk size can’t be smaller than the size you created with). You can see all the options available for the launch command; we’ll see how to use the iops and spot parameters in the next section:

$ fe2 launch -- --help

       fe2 launch --name NAME --ami AMI --disksize DISKSIZE --instancetype INSTANCETYPE
         [--keyname KEYNAME] [--secgroupname SECGROUPNAME] [--iops IOPS] [--spot SPOT]

Congratulations, you’ve launched your first instance from your own AMI! You can repeat the previous fe2 launch command, just passing in a different name, to create more instances, and ssh to each with fe2 connect <name>. To shut down an instance, enter in the terminal of your instance:

sudo shutdown -h now

…or alternatively enter in the terminal of your own computer (change inst1 to the name of your instance):

fe2 stop inst1

If you replace stop with terminate in the above command it will terminate your instance (i.e. it will destroy it, and by default will remove all of your data on the instance; when terminating the instance, fastec2 will also remove its name tag, so it’s immediately available to reuse). If you want to have fastec2 wait until the instance is stopped, use this command (otherwise it will happen automatically in the background):

$ fe2 get-instance inst1 - wait-until-stopped

Here’s a really handy feature: after you’ve stopped your instance, you can change it to a different type! This means that you can do your initial prototyping on a cheap instance type, and then run your big analysis on a super-fast machine when you’re ready.

$ fe2 change-type inst1 p3.8xlarge
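
For the curious, here's roughly what changing a type looks like if you work with boto3 directly. I'm not claiming this is how fastec2 implements change-type; it's a sketch of the raw boto3 sequence, where instance is assumed to be a boto3 ec2.Instance resource and change_instance_type is a made-up helper name.

```python
def change_instance_type(instance, new_type):
    """Stop a boto3 `ec2.Instance` and switch it to `new_type`.

    The type can only be modified while the instance is stopped.
    """
    instance.stop()
    instance.wait_until_stopped()
    instance.modify_attribute(InstanceType={"Value": new_type})
```

The instance stays stopped afterwards, so you still start it again yourself.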

Then you can re-start your instance and connect to it as before:

$ fe2 start inst1
inst1 (i-0f5a3b544274c645f running):

$ fe2 connect inst1
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1032-aws x86_64)

With all this playing around with instances you may get lost as to what you’ve created and what’s actually running! To find out, just use the instances command:

$ fe2 instances
spot1 (i-0b39947b710d05337 running):
inst1 (i-0f5a3b544274c645f stopped): No public IP
base (i-00c7f2f81a841b525 running):
od1 (i-0a1b47f88993b2bba stopped): No public IP

The instances with “No public IP” will automatically get a public IP when you start them. Generally you won’t need to worry about what the IP is, since you can fe2 connect using just the name; however you can always grab the IP through fastec2 if needed:

$ fe2 get-instance base - public-ip-address
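
If you want to use the fe2 instances listing from a script, the lines are regular enough to parse. The sketch below assumes the exact "name (id state): rest" layout shown in the sample output above, which may of course change between fastec2 versions; parse_instances is a made-up helper name.

```python
import re

# Matches lines such as: inst1 (i-0f5a3b544274c645f stopped): No public IP
LINE = re.compile(
    r"^(?P<name>\S+) \((?P<id>i-[0-9a-f]+) (?P<state>[\w-]+)\):\s*(?P<rest>.*)$"
)


def parse_instances(output):
    """Return one dict (name, id, state, rest) per recognised line."""
    rows = []
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows
```

From there, a list comprehension filtering on row["state"] == "running" tells you which instances you're currently paying for.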

Launching a spot instance

Spot instances can be 70% (or more) cheaper than on-demand instances. However, they may be shut down at any time, may not always be available, and all data on their root volume is deleted when they are shut down (in fact, they can only be terminated; they can’t be shut down and restarted later). Spot instance prices vary over time, by instance type, and by region. To see the last 3 days’ pricing for instances in a group (in this case, for p3 types), enter:

$ fe2 price-hist p3
Timestamp      2019-02-13  2019-02-14  2019-02-15
p3.2xlarge         1.1166      1.1384      1.1547
p3.8xlarge         3.9462      3.8884      3.8699
p3.16xlarge        7.3440      7.4300      8.0867
p3dn.24xlarge         NaN         NaN         NaN

Let’s compare to on-demand pricing:

$ fe2 price-demand p3
["p3.2xlarge", 3.06]
["p3.8xlarge", 12.24]
["p3.16xlarge", 24.48]
["p3dn.24xlarge", 31.212]
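
To put a number on the difference, here's a quick calculation using the two tables above (the most recent spot price, from 2019-02-15, against the on-demand price). The helper name spot_savings is made up; the prices are copied from the output shown.

```python
# Hourly prices copied from the fe2 output above (USD).
ON_DEMAND = {"p3.2xlarge": 3.06, "p3.8xlarge": 12.24, "p3.16xlarge": 24.48}
SPOT = {"p3.2xlarge": 1.1547, "p3.8xlarge": 3.8699, "p3.16xlarge": 8.0867}


def spot_savings(on_demand, spot):
    """Percentage saved by using spot instead of on-demand, per type."""
    return {
        name: round(100 * (1 - spot[name] / on_demand[name]), 1)
        for name in on_demand
        if name in spot
    }
```

On these figures the saving works out to roughly 62% to 68%, depending on the instance type.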

That’s looking pretty good! To get more detailed price graphs, check out the spot pricing tool on the AWS console, or else try using the fastec2 jupyter notebook API. This API is identical to the fe2 command, except that you create an instance of the EC2 class (optionally passing a region to the constructor), and call methods on that class. (If you haven’t used Jupyter Notebook before, then you should definitely check it out, because it’s amazingly great! Here’s a helpful tutorial from the kind folks at DataQuest to get you started.) The price-hist method has an extra feature when used in a notebook that prints the last few weeks’ prices in a graph for you (note that hyphens must be replaced with underscores in the notebook API).

Example of spot pricing in the notebook API

To launch a spot instance, just add --spot to your launch command:

$ fe2 launch spot1 base 80 m5.large --spot
spot1 (i-0b39947b710d05337 running):

Note that this is only requesting a spot instance. It’s possible that no capacity will be available for your request. In that case, after a few minutes you’ll see an error from fastec2 telling you that the request failed. We can see that the above request was successful, because it’s printed out a message showing the new instance is “running”.

Remember: if you stop this spot instance it will be terminated and all data will be lost! And AWS can decide to shut it down at any time.

Using the interactive REPL and ssh API

How do you know what methods and properties are available? And how can you access them more conveniently? The answer is: use the interactive REPL! A picture tells a thousand words:…

The fastec2 REPL

If you add -- -i to the end of a command which returns an object (which is currently the instance, get-ami, and ssh commands) then you’ll be popped in to an IPython session with that object available in the special name result. So just type result. and hit Tab to see all the methods and properties available. This is a full python interpreter, so you can use the full power of python to interact with this object. When you’re done, hit Ctrl-d twice to exit.

One interesting use of this is to experiment with the ssh command, which provides an API to issue commands to the remote instance via ssh. The object returned by this command is a standard Paramiko SSHClient, with a couple of extra goodies. One of those goodies is send(cmd), which sends ‘cmd’ to a tmux session (that’s automatically started) on the instance. This is mainly designed for you to use from scripts, but you can experiment with it via the REPL, as shown below.

Communicating with remote tmux session via the REPL
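
To give a feel for what sending a command to a tmux session involves, here's a sketch that builds the tmux command line you could run over any ssh connection. It's not fastec2's implementation of send: the helper name tmux_send_command and the default session name "0" are assumptions for illustration.

```python
import shlex


def tmux_send_command(cmd, session="0"):
    """Build a shell command that types `cmd` plus Enter into a tmux session."""
    # shlex.quote keeps spaces and quotes in `cmd` intact on the remote shell.
    return f"tmux send-keys -t {shlex.quote(session)} {shlex.quote(cmd)} Enter"
```

With a standard Paramiko SSHClient you could pass the resulting string to exec_command, and the command then runs inside the tmux session rather than in a throwaway shell.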

If you just want to explore the fastec2 API interactively, the easiest way is by launching the REPL using fe2 i (you can optionally append a region id or part of a region name). A fastec2.EC2 object called e will be automatically created for you. Type e. and hit Tab to see a list of options. IPython is started in smart autocall mode, which means that you often don’t even need to type parentheses to run methods. For instance:

$ fe2 i Ohio
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: e.instances
inst1 (i-0f5a3b544274c645f m5.large running):
base (i-00c7f2f81a841b525 m5.xlarge stopped): No public IP
od1 (i-0a1b47f88993b2bba t3.micro running):

In [2]: i=e.get_instance('od1')

In [3]: i.block_device_mappings
[{'DeviceName': '/dev/sda1',
  'Ebs': {'AttachTime': datetime.datetime(2019, 2, 14, 9, 30, 16),
   'DeleteOnTermination': True,
   'Status': 'attached',
   'VolumeId': 'vol-0d1b1a47539d5bcaf'}}]

fastec2 provides many convenient methods for managing AWS EC2, and also adds functionality to make SSH and SFTP easier to use. We’ll look at these features of the fastec2 API in more detail in a future article.

If you want to learn how to run and monitor long-running tasks with fastec2 check out part 2 of this series, where we’ll also see how fastec2 helps to create and use volumes and snapshots, including automatic formatting/mounting.

Some thoughts on zero-day threats in AI, and OpenAI's GPT-2

There’s been a lot of discussion in the last couple of days about OpenAI’s new language model. OpenAI made the unusual decision to not release their trained model (the AI community is usually extremely open about sharing them). On the whole, the reaction has been one of both amazement and concern, and has been widely discussed in the media, such as this thoughtful and thorough coverage in The Verge. The reaction from the academic NLP community, on the other hand, has been largely (but not exclusively) negative, claiming that:

  1. This shouldn’t be covered in the media, because it’s nothing special
  2. OpenAI had no reason to keep the model to themselves, other than to try to generate media hype through claiming their model is so special it has to be kept secret.

On (1), whilst it’s true that there’s no real algorithmic leap being done here (the model is mainly a larger version of something that was published by the same team months ago), the academic “nothing to see here” reaction misses the point entirely. Whilst academic publishing is (at least in this field) largely driven by specific technical innovations, broader community interest is driven by societal impact, surprise, narrative, and other non-technical issues. Every layperson I’ve spoken to about this new work has reacted with stunned amazement. And there’s clearly a discussion to be had about potential societal impacts of a tool that may be able to scale up disinformation campaigns by orders of magnitude, especially in our current environment where such campaigns have damaged democracy even without access to such tools.

In addition, the history of technology has repeatedly shown that the hard thing is not, generally, solving a specific engineering problem, but showing that a problem can be solved. So showing what is possible is, perhaps, the most important step in technology development. I’ve been warning about potential misuse of pre-trained language models for a while, and even helped develop some of the approaches the people are using now to build this tech; but it’s not until OpenAI actually showed what can be done in practice that the broader community has woken up to some of the concerns.

But what about the second issue: should OpenAI release their pretrained model? This one seems much more complex. We’ve already heard from the “anti-model-release” view, since that’s what OpenAI has published and also discussed with the media. Catherine Olsson (who previously worked at OpenAI) asked on Twitter if anyone has yet seen a compelling explanation of the alternative view:

I’ve read a lot of the takes on this, and haven’t yet found one that really qualifies. A good-faith explanation would need to engage with what OpenAI’s researchers actually said, which takes a lot of work, since their team have written a lot of research on the societal implications of AI (both at OpenAI, and elsewhere). The most in-depth analysis of this topic is the paper The Malicious Use of Artificial Intelligence. The lead author of this paper now works at OpenAI, and was heavily involved in the decision around the model release. Let’s take a look at the recommendations of that paper:

  1. Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI
  2. Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.
  3. Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.
  4. Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

An important point here is that an appropriate analysis of potential malicious use of AI requires a cross-functional team and deep understanding of history in related fields. I agree. So what follows is just my one little input to this discussion. I’m not ready to claim that I have the answer to the question “should OpenAI have released the model”. I will also try to focus on the “pro-release” side, since that’s the piece that hasn’t had much thoughtful input yet.

A case for releasing the model

OpenAI said that their release strategy is:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code.

So specifically we need to be discussing scale. Their claim is that a larger scale model may cause significant harm without time for the broader community to consider it. Interestingly, even they don’t claim to be confident of this concern:

This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.

Let’s get specific. How much scale are we actually talking about? I don’t see this explicitly mentioned in their paper or blog post, but we can make a reasonable guess. The new GPT-2 model has (according to the paper) about ten times as many parameters as their previous GPT model. Their previous model took 8 GPUs 1 month to train. One would expect that they can train their model faster by now, since they’ve had plenty of time to improve their algorithms, but on the other hand, their new model probably takes more epochs to train. Let’s assume that these two balance out, so we’re left with the difference of 10x in parameters.

If you’re in a hurry and you want to get this done in a month, then you’re going to need 80 GPUs. You can grab a server with 8 GPUs from the AWS spot market for $7.34/hour. That’s around $5300 for a month. You’ll need ten of these servers, so that’s around $50k to train the model in a month. OpenAI have made their code available, and described how to create the necessary dataset, but there’s still going to be plenty of trial and error, so in practice it might cost twice as much.

If you’re in less of a hurry, you could just buy 8 GPUs. With some careful memory handling (e.g. using gradient checkpointing) you might be able to get away with buying RTX 2070 cards at $500 each, otherwise you’ll be wanting the RTX 2080 Ti at $1300 each. So for 8 cards, that’s somewhere between $4k and $10k for the GPUs, plus probably another $10k or so for a box to put them in (with CPUs, HDDs, etc). So that’s around $20k to train the model in 10 months (again, you’ll need some extra time and money for the data collection, and some trial and error).
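To make the arithmetic behind these estimates easy to check, here is a minimal sketch that reproduces it. The prices come from the text above; the 30-day month and the variable names are my own assumptions, and the result is only as good as the rough guesses it is built on.

```python
# Back-of-the-envelope figures taken from the text above.
HOURS_PER_MONTH = 24 * 30          # assumption: a 30-day month

spot_price_per_hour = 7.34         # 8-GPU AWS spot server, USD/hour (from the text)
servers_needed = 10                # 80 GPUs / 8 GPUs per server

cost_per_server_month = spot_price_per_hour * HOURS_PER_MONTH
cloud_cost = cost_per_server_month * servers_needed

# Buying hardware instead: 8 cards plus roughly $10k for the rest of the box.
gpu_count = 8
box_cost = 10_000
buy_low = gpu_count * 500 + box_cost     # RTX 2070 at $500 each
buy_high = gpu_count * 1300 + box_cost   # RTX 2080 Ti at $1300 each

print(f"One 8-GPU server for a month: ${cost_per_server_month:,.0f}")
print(f"Ten servers for a month:      ${cloud_cost:,.0f}")
print(f"Buying 8 GPUs plus a box:     ${buy_low:,} to ${buy_high:,}")
```

Doubling the cloud figure to allow for trial and error gives roughly the $100k used in the rest of this post.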

Most organizations doing AI already have 8 or more GPUs available, and can often get access to far more (e.g. AWS provides up to $100k credits to startups in its AWS Activate program, and Google provides dozens of TPUs to any research organization that qualifies for their research program).

So in practice, the decision not to release the model has a couple of outcomes:

  1. It’ll probably take at least a couple of months before another organization has successfully replicated it, so we have some breathing room to discuss what to do when this is more widely available
  2. Small organizations that can’t afford to spend $100k or so are not able to use this technology at the scale being demonstrated.

Point (1) seems like a good thing. If suddenly this tech is thrown out there for anyone to use without any warning, then no-one can be prepared at all. (In theory, people could have been prepared because those within the language modeling community have been warning of such a potential issue, but in practice people don’t tend to take it seriously until they can actually see it happening.) This is what happens, for instance, in the computer security community, where if you find a flaw the expectation is that you help the community prepare for it, and only then do you release full details (and perhaps an exploit). When this doesn’t happen, it’s called a zero day attack or exploit, and it can cause enormous damage.

I’m not sure I want to promote a norm that zero-day threats are OK in AI.

On the other hand, point (2) is a problem. The most serious threats are most likely to come from folks with resources to spend $100k or so on (for example) a disinformation campaign to attempt to change the outcome of a democratic election. In practice, the most likely exploit is (in my opinion) a foreign power spending that money to dramatically escalate existing disinformation campaigns, such as those that have been extensively documented by the US intelligence community.

The only practical defense against such an attack is (as far as I can tell) to use the same tools to both attempt to identify, and push back against, such disinformation. These kinds of defenses are likely to be much more powerful when wielded by the broader community of those impacted. Large groups of individuals have repeatedly been shown to be more powerful at creating than at destroying, as we see in projects such as Wikipedia, or open source software.

In addition, if these tools aren’t in the hands of people without access to large compute resources, then they remain abstract and mysterious. What can they actually do? What are their constraints? For people to make informed decisions, they need to have a real understanding of these issues.


So, should OpenAI release their trained model? Frankly, I don’t know. There’s no question in my mind that they’ve demonstrated something fundamentally qualitatively different to what’s been demonstrated before (despite not showing any significant algorithmic or theoretic breakthroughs). And I’m sure it will be used maliciously; it will be a powerful tool for disinformation and for influencing discourse at massive scale, and probably only costs about $100k to create.

By releasing the model, this malicious use will happen sooner. But by not releasing the model, there will be fewer defenses available and less real understanding of the issues from those that are impacted. Those both sound like bad outcomes to me.

Five Things That Scare Me About AI

AI is being increasingly used to make important decisions. Many AI experts (including Jeff Dean, head of AI at Google, and Andrew Ng, founder of Coursera and deeplearning.ai) say that warnings about sentient robots are overblown, but other harms are not getting enough attention. I agree. I am an AI researcher, and I’m worried about some of the societal impacts that we’re already seeing. In particular, these 5 things scare me about AI:

  1. Algorithms are often implemented without ways to address mistakes.
  2. AI makes it easier to not feel responsible.
  3. AI encodes & magnifies bias.
  4. Optimizing metrics above all else leads to negative outcomes.
  5. There is no accountability for big tech companies.

At the end, I’ll briefly share some positive ways that we can try to address these.

Before we dive in, I need to clarify one point that is important to understand: algorithms (and the complex systems they are a part of) can make mistakes. These mistakes come from a variety of sources: bugs in the code, inaccurate or biased data, approximations we have to make (e.g. you want to measure health and you use hospital readmissions as a proxy, or you are interested in crime and use arrests as a proxy. These things are related, but not the same), misunderstandings between different stakeholders (policy makers, those collecting the data, those coding the algorithm, those deploying it), how computer systems interact with human systems, and more.

This article discusses a variety of algorithmic systems. I don’t find debates about definitions particularly interesting, including what counts as “AI” or if a particular algorithm qualifies as “intelligent” or not. Please note that the dynamics described in this post hold true both for simpler algorithms, as well as more complex ones.

1. Algorithms are often implemented without ways to address mistakes.

After the state of Arkansas implemented software to determine people’s healthcare benefits, many people saw a drastic reduction in the amount of care they received, but were given no explanation and no way to appeal. Tammy Dobbs, a woman with cerebral palsy who needs an aide to help her to get out of bed, to go to the bathroom, to get food, and more, had her hours of help suddenly reduced by 20 hours a week, transforming her life for the worse. Eventually, a lengthy court case uncovered errors in the software implementation, and Tammy’s hours were restored (along with those of many others who were impacted by the errors).

Observations of 5th grade teacher Sarah Wysocki’s classroom yielded positive reviews. Her assistant principal wrote, “It is a pleasure to visit a classroom in which the elements of sound teaching, motivated students and a positive learning environment are so effectively combined.” Two months later, she was fired by an opaque algorithm, along with over 200 other teachers. The head of the PTA and a parent of one of Wysocki’s students described her as “One of the best teachers I’ve ever come in contact with. Every time I saw her, she was attentive to the children, went over their schoolwork, she took time with them and made sure.” That people are losing needed healthcare without an explanation or being fired without explanation is truly dystopian!

Headlines from the Verge and the Washington Post

As I covered in a previous post, people use outputs from algorithms differently than they use decisions made by humans:

  • Algorithms are more likely to be implemented with no appeals process in place.
  • Algorithms are often used at scale.
  • Algorithmic systems are cheap.
  • People are more likely to assume algorithms are objective or error-free. As Peter Haas said, “In AI, we have Milgram’s ultimate authority figure,” referring to Stanley Milgram’s famous experiments showing that most people will obey orders from authority figures, even to the point of harming or killing other humans. How much more likely will people be to trust algorithms perceived as objective and correct?

There is a lot of overlap between these factors. If the main motivation for implementing an algorithm is cost-cutting, adding an appeals process (or even diligently checking for errors) may be considered an “unnecessary” expense. Cathy O’Neil, who earned her math PhD at Harvard, wrote the book Weapons of Math Destruction, in which she covers how algorithms are disproportionately impacting poor people, whereas the privileged are more likely to still have access to human attention (in hiring, education, and more).

2. AI makes it easier to not feel responsible.

Let’s return to the case of the buggy software used to determine health benefits in Arkansas. How could this have been prevented? In order to prevent severely disabled people from mistakenly losing access to needed healthcare, we need to talk about responsibility. Unfortunately, complex systems lend themselves to a dynamic in which nobody feels responsible for the outcome.

The creator of the algorithm for healthcare benefits, Brant Fries (who has been earning royalties off this algorithm, which is in use in over half the 50 states), blamed state policy makers. I’m sure the state policy makers could blame the implementers of the software. When asked if there should be a way to communicate how the algorithm works to the disabled people losing their healthcare, Fries callously said, “It’s probably something we should do. Yeah, I also should probably dust under my bed,” and then later clarified that he thought it was someone else’s responsibility.

This passing of the buck and failure to take responsibility is common in many bureaucracies. As danah boyd observed, “Bureaucracy has often been used to shift or evade responsibility. Who do you hold responsible in a complex system?” boyd gives the example of high-ranking bureaucrats in Nazi Germany, who did not see themselves as responsible for the Holocaust. boyd continues, “Today’s algorithmic systems are extending bureaucracy.”

Another example of nobody feeling responsible comes from the case of research to classify gang crime. A database of gang members assembled by the Los Angeles Police Department (and 3 other California law enforcement agencies) was found to have 42 babies who were under the age of 1 when added to the gang database (28 were said to have admitted to being gang members). Keep in mind these are just some of the most obvious errors; we don’t know how many other people were falsely included. When researchers presented work on using machine learning on this data to classify gang crimes, an audience member asked about ethical concerns. “I’m just an engineer,” responded one of the authors.

I don’t bring this up for the primary purpose of pointing fingers or casting blame. However, a world of complex systems in which nobody feels responsible for the outcomes (which can include severely disabled people losing access to the healthcare they need, or innocent people being labeled as gang members) is not a pleasant place. Our work is almost always a small piece of a larger whole, yet a sense of responsibility is necessary to try to address and prevent negative outcomes.

3. AI encodes & magnifies bias.

But isn’t algorithmic bias just a reflection of how the world is? I get asked a variation of this question every time I give a talk about bias. To which my answer is: No, our algorithms and products impact the world and are part of feedback loops. Consider an algorithm to predict crime and determine where to send police officers: sending more police to a particular neighborhood is not just an effect, but also a cause. More police officers can lead to more arrests in a given neighborhood, which could cause the algorithm to send even more police to that neighborhood (a mechanism described in this paper on runaway feedback loops).
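A toy simulation can make this feedback loop concrete. The sketch below is a deliberately simplified model of my own, not the model from the paper: the function name, the single daily patrol, and all of the numbers are hypothetical. Two neighborhoods have exactly the same underlying crime rate, but the patrol always goes wherever the most arrests have been recorded, and arrests can only be recorded where the patrol is.

```python
def simulate(days, recorded, true_rate):
    """Toy model: each day the single patrol goes to the neighborhood with the
    most recorded arrests so far, and new arrests are only recorded there."""
    recorded = list(recorded)
    for _ in range(days):
        target = recorded.index(max(recorded))   # where the data says crime is
        recorded[target] += true_rate[target]    # arrests only happen where police are
    return recorded

# Two neighborhoods with the SAME underlying crime rate (hypothetical numbers);
# neighborhood 0 merely starts with one extra recorded arrest.
history = simulate(days=100, recorded=[11, 10], true_rate=[1, 1])
print(history)  # [111, 10]
```

A one-arrest difference in the starting data turns into a gap of more than a hundred, even though nothing about the neighborhoods themselves differs. The recorded data ends up reflecting where police were sent rather than where crime occurred.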

Bias is being encoded and even magnified in a variety of applications:

  • software used to decide prison sentences that has twice as high a false positive rate for Black defendants as for white defendants
  • computer vision software from Amazon, Microsoft, and IBM performs significantly worse on people of color
Research by Joy Buolamwini and Timnit Gebru found that commercial computer vision software performed significantly worse on women with dark skin. Gendershades.org
  • Word embeddings, which are a building block for language tools like Gmail’s SmartReply and Google Translate, generate useful analogies such as Rome:Italy :: Madrid:Spain, as well as biased analogies such as man:computer programmer :: woman: homemaker.
  • Machine learning used in recruiting software developed at Amazon penalized applicants who attended all-women’s colleges, as well as any resumes that contained the word “women’s.”
  • Over 2/3 of the images in ImageNet, the most studied image data set in the world, are from the Western world (USA, England, Spain, Italy, Australia).
Chart from 'No Classification without Representation' by Shankar et al. shows the origin of ImageNet photos: 45% US, 8% UK, 6% Italy, 3% Canada, 3% Australia, 3% Spain,...

Since a Cambrian explosion of machine learning products is occurring, the biases that are calcified now and in the next few years may have a disproportionately huge impact for ages to come (and will be much harder to undo decades from now).
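For readers unfamiliar with the term used in the first example in the list above, a false positive rate is the share of people who did not go on to reoffend but were flagged as high risk anyway. Here is a minimal sketch of how it is computed per group. The counts are made up purely to illustrate the calculation and are not taken from any real study; the function and variable names are my own.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of people who did NOT reoffend but were still flagged high risk."""
    return false_positives / (false_positives + true_negatives)

# Made-up counts, chosen only to illustrate the calculation.
group_a = false_positive_rate(false_positives=40, true_negatives=60)
group_b = false_positive_rate(false_positives=20, true_negatives=80)

print(group_a, group_b, group_a / group_b)
```

With these illustrative numbers, a person in group A who would not have reoffended is twice as likely to be wrongly labeled high risk as a comparable person in group B.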

4. Optimizing metrics above all else leads to negative outcomes.

Worldwide, people watch 1 billion hours of YouTube per day (yes, that says PER DAY). A large part of YouTube’s success has been due to its recommendation system, in which a video selected by an algorithm automatically begins playing once the previous video is over. Unfortunately, these recommendations are disproportionately for conspiracy theories promoting white supremacy, climate change denial, and denial of the mass shootings that plague the USA. What is going on? YouTube’s algorithm is trying to maximize how much time people spend watching YouTube, and conspiracy theorists watch significantly more YouTube than people who trust a variety of media sources. Unfortunately, a recommendation system trying only to maximize time spent on its own platform will incentivize content that tells you the rest of the media is lying.

“YouTube may be one of the most powerful radicalizing instruments of the 21st century,” Professor Zeynep Tufekci wrote in the New York Times. Guillaume Chaslot is a former YouTube engineer turned whistleblower. He has been outspoken about the harms caused by YouTube, and he partnered with the Guardian and the Wall Street Journal to study the extremism and bias in YouTube’s recommendations.

Photo of Guillaume Chaslot from the Guardian article

YouTube is owned by Google, which is earning billions of dollars by aggressively introducing vulnerable people to conspiracy theories, while the rest of society bears the externalized costs of rising authoritarian governments, a resurgence in white supremacist movements, failure to act on climate change (even as extreme weather is creating increasing numbers of refugees), growing distrust of mainstream news sources, and a failure to pass sensible gun laws.

This problem is an example of the tyranny of metrics: metrics are just a proxy for what you really care about, and unthinkingly optimizing a metric can lead to unexpected, negative results. One analog example is that when the UK began publishing the success rates of surgeons, heart surgeons began turning down risky (but necessary) surgeries to try to keep their scores as high as possible.

Returning to the account of the popular 5th grade teacher who was fired by an algorithm, she suspects that the underlying reason she was fired was that her incoming students had unusually high test scores the previous year (making it seem like their scores had dropped to a more average level after her teaching), and that their former teachers may have cheated. As US education policy began over-emphasizing student test scores as the primary way to evaluate teachers, there have been widespread scandals of teachers and principals cheating by altering students’ scores, in Georgia, Indiana, Massachusetts, Nevada, Virginia, Texas, and elsewhere. When metrics are given undue importance, attempts to game those metrics become common.

5. There is no accountability for big tech companies.

Major tech companies are the primary ones driving AI advances, and their algorithms impact billions of people. Unfortunately, these companies have zero accountability. YouTube (owned by Google) is helping to radicalize people into white supremacy. Google allowed advertisers to target people who search racist phrases like “black people ruin neighborhoods” and Facebook allowed advertisers to target groups like “jew haters”. Amazon’s facial recognition technology misidentified 28 members of Congress as criminals, yet it is already in use by police departments. Palantir’s predictive policing technology was used for 6 years in New Orleans, with city council members not even knowing about the program, much less having any oversight. The newsfeed/timeline/recommendation algorithms of all the major platforms tend to reward incendiary content, prioritizing it for users.

In early 2018, the UN ruled that Facebook had played a “determining role” in the ongoing genocide in Myanmar. “I’m afraid that Facebook has now turned into a beast,” said the UN investigator. This result was not a surprise to anyone who had been following the situation in Myanmar. People warned Facebook executives about how the platform was being used to spread dehumanizing hate speech and incite violence against an ethnic minority as early as 2013, and again in 2014 and 2015. As early as 2014, news outlets such as Al Jazeera were covering Facebook’s role in inciting ethnic violence in Myanmar.

One person close to the case said, “That’s not 20/20 hindsight. The scale of this problem was significant and it was already apparent.” Facebook execs were warned in 2015 that Facebook could play the same role in Myanmar that radio broadcasts had played during the 1994 Rwandan genocide. As of 2015, Facebook only employed 4 contractors who spoke Burmese (the primary language in Myanmar).

Contrast Facebook’s inaction in Myanmar with their swift action in Germany after the passage of a new law, which could have resulted in penalties of up to 50 million euros. Facebook hired 1,200 German contractors in under a year. In 2018, five years after Facebook was first warned about how they were being used to incite violence in Myanmar, they hired “dozens” of Burmese contractors, a fraction of their response in Germany. The credible threat of a large financial penalty may be the only thing Facebook responds to.

While it can be easy to focus on regulations that are misguided or ineffective, we often take for granted safety standards and regulations that have largely worked well. One major success story comes from automobile safety. Early cars had sharp metal knobs on the dashboard that lodged in people’s skulls during crashes, plate glass windows that shattered dangerously, and non-collapsible steering columns that would frequently impale drivers. Beyond that, there was a widespread belief that the only issue with cars was the people driving them, and car manufacturers did not want data on car safety to be collected. It took consumer safety advocates decades to push the conversation to how cars could be designed with greater safety, and to pass laws regarding seat belts, driver licenses, crash tests, and the collection of car crash data. For more on this topic, Datasheets for Datasets covers case studies of how standardization came to the electronics, pharmaceutical, and automobile industries, and 99% Invisible has a deep dive on the history of car safety (with parallels and contrasts to the gun industry).

How We Can Do Better

The good news: none of the problems listed here are inherent to algorithms! There are ways we can do better:

  • Make sure there is a meaningful, human appeals process. Plan for how to catch and address mistakes in advance.
  • Take responsibility, even when our work is just one part of the system.
  • Be on the lookout for bias. Create datasheets for data sets.
  • Choose not to just optimize metrics.
  • Push for thoughtful regulations and standards for the tech industry.

The problems we are facing can feel scary and complex. However, it is still very early on in this age of AI and increasing algorithmic automation. Now is a great time to take action: we can change our culture, cultivate a greater sense of responsibility for our work, seek out thoughtful accountability to counterbalance the inordinate power that major tech companies have, and choose to create more humane products and systems. Technology is just a tool, and it can be used for good or bad. Let’s work to use it for good, to improve the lives of many, rather than just generate wealth for a small number of people.

fast.ai Diversity Fellows and Sponsors Wanted

This post was originally published on 2018-08-16, but has been updated for the newest, upcoming course.

At fast.ai, we want to do our part to increase diversity in deep learning and to lower the unnecessary barriers to entry for everyone. We are providing diversity scholarships for our updated part-time, in-person Deep Learning for Coders part 2 course presented in partnership with the University of San Francisco Data Institute, to be offered one evening per week for 7 weeks, starting March 18, 2019, in downtown San Francisco. Women, people of Color, LGBTQ people, people with disabilities, and/or veterans are eligible to apply. We are still looking for additional financial sponsors, so please contact datainstitute@usfca.edu if your company is interested in donating.

The deadline to apply is February 14, 2019. Details on how to apply, and FAQ, are at the end of this post.

What can you do with deep learning?

Deep learning has great potential for good. It is being used by fast.ai students and teachers to diagnose cancer, stop deforestation of endangered rain-forests, provide better crop insurance to farmers in India (who otherwise have to take predatory loans from thugs, which have led to high suicide rates), help Urdu speakers in Pakistan, detect udder infections in goats and cows, develop wearable devices for patients with Parkinson’s disease, and much more. Deep learning could address the global shortage of doctors, provide more accurate medical diagnoses, improve energy efficiency, increase farm yields, and reduce pesticide use.

However, there is also great potential for harm. We are worried about unethical uses of data science, and about the ways that society’s racial and gender biases (summary here) are being encoded into our machine learning systems. We are concerned that an extremely homogeneous group is building technology that impacts everyone. People can’t address problems that they’re not aware of, and with more diverse practitioners, a wider variety of important societal problems will be tackled.

We want to get deep learning into the hands of as many people as possible, from as many diverse backgrounds as possible. People with different backgrounds have different problems they’re interested in solving. The traditional approach is to start with an AI expert and then give them a problem to work on; at fast.ai we want people who are knowledgeable and passionate about the problems they are working on, and we’ll teach them the deep learning they need. In my TEDx talk, I shared how my unlikely background led me to the work I do now and why we need more people with unlikely backgrounds in the field, both to address misuses of AI, as well as to take full advantage of the positive opportunities.

slide from my TEDx

While some people worry that it’s risky for more people to have access to AI, I believe the opposite. We’ve already seen the harm wreaked by elite and exclusive companies such as Facebook, Palantir, and YouTube/Google. Getting people from a wider range of backgrounds involved can help us address these problems.

The fast.ai approach

We began fast.ai with an experiment: to see if we could teach deep learning to coders, with no math pre-requisites beyond high school math, and get them to state-of-the-art results in just 7 weeks. This was very different from other deep learning materials, many of which assume a graduate level math background, focus on theory, only work on toy problems, and don’t even include the practical tips. We didn’t even know if what we were attempting was possible, but the fast.ai course has been a huge success!

Fast.ai students have been accepted to the launched companies, won hackathons, invented a new fraud detection algorithm, had work featured on the HBO TV show Silicon Valley, and more, all from taking a course that has only one year of coding experience as the pre-requisite.

Coverage of fast.ai in the Verge and MIT Technology Review

fast.ai is not just an educational resource; we also do cutting-edge research and have achieved state-of-the-art results. Our wins in Stanford’s DAWNBench competition against much better funded teams from Google and Intel were covered in the MIT Tech Review and the Verge. Jeremy’s work with Sebastian Ruder achieving state-of-the-art results on 6 language classification datasets was accepted by ACL, has been built upon by OpenAI and Google Brain, and was featured in the New York Times. All this research is incorporated into our course, teaching students state-of-the-art techniques.

We are looking for additional companies to sponsor diversity fellowships. Please contact Mindi at datainstitute@usfca.edu if your company might be interested!

Who is eligible for a diversity fellowship?

Wondering if you’re qualified? The requirements are:

  • Familiarity with Python, git, and bash
  • Familiarity with the content covered in Deep Learning Part 1, v3 (available for free online), including the fastai library, a high-level wrapper for PyTorch (it’s OK to start studying this material now, as long as you complete it by the start of the course)
  • Curiosity and a willingness to work hard
  • Able to commit 10 hours a week of study to the course (includes time for homework).
  • Identify as a woman, person of Color, LGBTQ person, person with a disability, and/or veteran
  • Be available to attend in-person 6:30-9pm in downtown San Francisco, one evening per week (exact schedule found here under details, day of the week varies)

You can fulfill the requirement to be familiar with deep learning, the fastai library, and PyTorch by doing any 1 of the following:

  • You took the updated, in-person deep learning part 1 course during fall 2018
  • You have watched the first 2 videos of the online course before you apply, and you commit to working through all 7 lessons before the start of the course. We estimate that each lesson takes approximately 10 hours of study (so you would need to study for the 7 weeks prior to the course starting on March 18, for 10 hours each week).
  • You have previously taken the older version of the course (released last year) AND watch the first 4 lessons of the new course to get familiar with the fastai library and PyTorch.

Deep Learning Part 1 covers the use of deep learning for image recognition, recommendation systems, sentiment analysis, and time-series prediction. Part 2 will take this further by teaching you how to read and implement cutting edge research papers, generative models and other advanced architectures, and more in-depth natural language processing. As with all fast.ai courses, it will be practical, state-of-the-art, and geared towards coders.

How to Apply for a Fellowship

The number of scholarships we are able to offer depends on how much funding we receive (if your organization may be able to sponsor one or more places, please let us know). To apply for the fellowship, you will need to submit a resume and statement of purpose. The statement of purpose will include the following:

  • 1 paragraph describing one or more problems you’d like to apply deep learning to
  • 1 paragraph describing previous machine learning education or experience (e.g. fast.ai courses, Coursera, deeplearning.ai,…)
  • which under-indexed group(s) you are a part of (gender, race, sexual identity, veteran)

Diversity Fellowship applications should be submitted here: https://certificate.usfca.edu/register/di-application

If you have any questions, please email datainstitute@usfca.edu.

The deadline to apply is February 14, 2019.


I’m not eligible for the diversity scholarship, but I’m still interested. Can I take the course? Absolutely! You can register here.

I don’t live in the San Francisco Bay Area; can I participate remotely? Yes! Stay tuned for details to be released in a blog post in the next few weeks.

Will this course be made available online later? Yes, this course will be made freely available online afterwards. Benefits of taking the in-person course include earlier access, community and in-person interaction, and more structure (for those that struggle with motivation when taking online courses).

Is fast.ai able to sponsor visas or provide stipends for living expenses? No, we are not able to sponsor visas nor to cover living expenses.

How will this course differ from the previous fast.ai courses? Our goal at fast.ai is to push the state-of-the-art. Each year, we want to make deep learning increasingly intuitive to use while giving better results. With our fastai library, we are beating our own state-of-the-art results from last year.

What language is the course taught in? The course is taught in Python, using the fastai library and PyTorch. Some of our students have gone on to use the fastai library in production at Fortune 500 companies.