
6 Important Videos about Tech, Ethics, Policy, and Government

The remaining 6 videos from the University of San Francisco Center for Applied Data Ethics Tech Policy Workshop are now available. This workshop was held in November 2019, which seems like a lifetime ago, yet the themes of tech ethics and responsible government use of technology remain incredibly relevant, particularly as governments are considering controversial new uses of technology for tracking or addressing the pandemic.

You can go straight to the videos here, or read more below:

And be sure to check out the full playlist of workshop videos here!

Speakers (left-right, top-bottom): Kristian Lum, Tawana Petty, Irina Raicu; Heather Patterson, Brian Hofer, Linda Gerull; Rumman Chowdhury, Deven Desai

Hypervisibilizing the unseen: Dominant narratives, smart cities and race-blind tech policies

If you teach the world to fear the other, individualization and hyper surveillance become inevitable. Detroit is an incredible example of how the power of propaganda became a toolkit for race-blind policies with racist consequences, data and tech misuse, digital surveillance and the dangerous conflation between safety and security. “Be Afraid, Be Very Afraid,” is no longer just the tagline from a classic movie, it’s become a mantra for being and an excuse not to see each other. Tawana Petty is Director of the Data Justice Program for the Detroit Community Technology Project and co-leads Our Data Bodies, a five-person team concerned about the ways our communities’ digital information is collected, stored, and shared by government and corporations. Watch her talk here:

Fairness, Accountability, and Transparency: Lessons from predictive models in criminal justice

The related topics of fairness, accountability, and transparency in predictive modeling have seen increased attention over the last several years. One application area where these topics are particularly important is criminal justice. In this talk, Dr. Lum gives an overview of her work in this area, spanning a critical look at predictive policing algorithms, the role of police discretion in pre-trial risk assessment models, and a look behind the scenes at how risk assessment models are created in practice. Through these examples, she demonstrates the importance of each of these concepts in predictive modeling in general and in the criminal justice system in particular. Kristian Lum is an assistant research professor at Penn CIS and the lead statistician at the Human Rights Data Analysis Group (HRDAG), where she leads the HRDAG project on criminal justice in the United States. Watch her talk here:

Diverse Faces, Diverse Lenses: Applied ethics and facial recognition research

Ethical issues are like birds: they are pervasive, varied, and often go unnoticed (especially by those not trained to identify them). Ethical “lenses” (or approaches) can help us see them. This presentation will introduce the Markkula Center for Applied Ethics Framework for Ethical Decision-Making, which features five ethical lenses. Attendees will then work together to apply those lenses to a case study that reflects the complexity of ethical decisions faced by practitioners who work with data. Irina Raicu is Director of the Internet Ethics Program at the Markkula Center for Applied Ethics.

Watch her talk here:

Panel on Local Government

There are significant challenges to creating informed tech policy, including: the diverse range of stakeholders involved, the way Silicon Valley incentives are misaligned with reflective policy making, binary modes of thinking, the fact that municipalities are often an afterthought to tech companies, the gap between intended and actual use, and more. Our panel on local government had a lively and informative discussion. Panelists included:

  • Linda Gerull, CIO of City and County of San Francisco
  • Brian Hofer, founder of the non-profit Secure Justice whose work drafting SF’s ban on facial recognition was covered in the New York Times
  • Lee Hepner, an attorney and legislative aide who worked on SF’s facial recognition ban
  • Heather Patterson, privacy researcher at Intel and member of the Oakland Privacy Advisory Commission

Watch the panel here:

Deconstructing the Surveillance State

The smart city has become co-opted by an exclusionary narrative that enables a surveillance state. In this talk, Dr. Chowdhury presents the global imperative to deconstruct the current surveillance state by illustrating already-existing harms. In its place, she shares a vision of Digital Urban Design, which presents a community-driven and collaborative smart city. A work in progress, the goal of digital urban design is to evolve the field of urban design to merge the digital and analog fabrics in a way that impacts and improves the lives of citizens. Rumman Chowdhury is the Global Lead for Responsible AI at Accenture Applied Intelligence, where she works with C-suite clients to create cutting-edge technical solutions for ethical, explainable and transparent AI. Watch her talk here:

Law and Data Driven Innovation

Data-driven innovation fueled Silicon Valley and more after the first dotcom bubble. It still has potential to drive incredible outcomes. Yet the days of deference to companies because of promised innovation and creation of wealth seem to be over. This talk looks at where we came from and how changes in the law show dissatisfaction with innovation narratives. And yet, the talk offers that there is a way to use data and software to build trust and success going forward. Deven Desai is an associate professor in the faculty of the Law and Ethics Program at Georgia Tech’s Scheller College of Business. Watch his talk here:

More CADE Videos

You can check out the full playlist of videos from the CADE Tech Policy Workshop here!

Be sure to read these posts with other videos and material from CADE events:

Saving The Mask

A message from Jeremy: This is a very special guest post from Sarada Lee (李文華), a Visiting Scholar at the Data Institute (University of San Francisco) and Conjoint Fellow at the School of Medicine and Public Health at the University of Newcastle. She’s also one of our most inspirational and impactful fast.ai alumni. Here, Sarada brings to us some potentially life-saving expertise that has been developed over the last 20 years in places that have already tackled respiratory pandemics: how to create masks, scaling from home production all the way to mass production. The ability of even simple masks to protect from infection is now well established, with even the CDC providing guidance that basic surgical masks are worth using. If you go out without a mask, and you’re infected without knowing it, you’re putting your community at risk. And if someone that’s infected coughs when you’re within a couple of meters of them, and you’re not wearing a mask (and goggles or glasses), then you’re at a much higher risk of getting infected yourself (this is established science from both a virology and an epidemiology standpoint). There’s a huge shortage of masks in the US and many other countries, so please get to work creating masks, because you can help save lives. Your local hospital is the most important place to help first. The method here requires special equipment and materials, so you’ll need to coordinate with your local community to get things going. Also, there are simpler methods to create simpler masks. For help with both, we’ve provided a wiki and discussion thread for this post.

Disclaimer: I am not a qualified medical professional. But, I lived through SARS in Hong Kong back in 2003. My ex-colleague was a confirmed case. I used to walk past her desk every day prior to her being admitted into a hospital.

“Virtually everybody here (Hong Kong) has been through the drill, they know the consequences.” (Keiji Fukuda, a U.S. expert on infectious diseases and former assistant director-general for health security at the World Health Organization)

One may argue about how much protection wearing a mask offers. But the bottom line is that if I am an asymptomatic carrier of COVID-19 and I wear a mask, I won’t spread the virus to the people around me. That will help to “flatten the curve” (check out Jeremy and Rachel’s explanation).


This video 1 (in Cantonese) shows the background and the development process.

How to make the reusable mask (the base model)?

What you require

  1. Equipment/tools:
    • A 3D printer (I use a MakerBot Replicator with legacy printer support software, so I can’t use most of the features and need to work around calibration, i.e. leveling the print bed.)
    • A vacuum mold forming machine (I use Mayku FormBox.)
    • A pair of scissors
    • A hole puncher
    • A grinder
  2. Materials:
    • 3D printing material (I used a Mayku.)
    • Food grade (very important!) vacuum mold forming material (1 form sheet (white color) for inner layer and 1 cast sheet (transparent) for outer layer)
    • A surgical mask (1 piece can be cut into 10 or more smaller pieces, i.e. 10x supply)
    • Elastic band (length to adjust to fit you best)
    • Soft material for nosepiece (I don’t know what will work best yet)
    • Glue (suitable for plastic and soft material)
    • Masking tape (preferably in sheet instead of roll for 3D printer’s print bed)


  1. Download the file for 3D printing from the links below (credit to TinkerCad user ID: alex.wh.leeJ5ZTZ 李偉康老師). 3 mold sizes are available. (Source files are also available so you can modify them for customization.)

Indicative print time and materials (g) required by size:

Quality (options available to me) Large Medium Small

5:46 hrs, 3:08 hrs, 4:40 hrs, 2:34 hrs, 6:10 hrs, 2:45 hrs


  1. Print a mask mold (blue color in the video).
  2. Put a form sheet (white color in the video) over the mask mold to make the inner layer of the mask.
  3. Use a pair of scissors to cut out the inner layer (similar to the shape of the mold)
  4. Put a cast sheet (transparent in the video) over the mask mold to make the outer layer of the mask.
  5. Use a pair of scissors to cut out the outer layer (leave extra material for each side in order to put on elastic bands - see light blue line and point above)
  6. Use a hole puncher to punch holes in the outer layer (better to punch from the inside)
  7. Use a grinder to polish all the sharp edges
  8. Put a small piece of surgical mask in the filter area
  9. Attach elastic bands through the holes
  10. Glue soft material around the nose bridge
  11. Check out this video 2 on how to wear and remove it. It is also important NOT to touch the front surface after use. Always perform hand hygiene before putting it on and after removing it. If you have facial hair, please consider changing your facial hairstyle based on the CDC’s advice.3

How can we do better?

For the mask design

Of course, the downloaded file may not fit you perfectly, and a good fit is the key to protection. You may want to modify the source files, but you need to find out your face shape before you can modify them. This can be done by using multiple images from different angles to create a 3D model (look for structure from motion software).

Alternatively, deep learning could be used to create the model from a single photo (paper: Photorealistic Facial Texture Inference Using Deep Neural Networks by Shunsuke Saito et al. https://arxiv.org/pdf/1612.00523v1.pdf). Is anyone up for this challenge?

Engineers out there could attach a valve or create a fancier design.

For the filter

One may doubt the protection from a surgical mask. I was able to source CKP-V28 filter sheets, which can filter anything bigger than 0.3 micron, a similar particle filtration rating to N95 (US) or P2 (Europe). (Caution: even if this idea works, this version of the mask has NOT been tested or certified as personal protective equipment.)

For food grade 3D printing materials

To save money on materials, it is possible to recycle plastic bottles into filament by using a special cutter 4 or you can make one 5.

For mass production

For those in the industry, the mold can be made in aluminum using CNC machines. Please use your skills and resources to support this.

Campaign Status:

Date Actions
14 Mar 20 Found out about “Saving the Mask” Campaign
15 Mar 20 Contacted local makers community who own the equipment and sourced materials
18 Mar 20 Equipment was delivered and I received brief training
19 Mar 20 Printed a mold (small size)
20 Mar 20 Made a small-size mask (see Annex 1) and a try-out list
By 27 Mar 20 Improve my skills, work through the try-out list, and order more casting and mold sheets

Remarks: I don’t have any financial interest in any products mentioned here.

Final note

Personal hygiene (more than washing hands regularly) is important 6.

Social distancing/self-isolating is also important.

If you don’t have hand sanitizer (with over 70% alcohol content), WHO recommends two formulas 7 for small-volume production. Ethanol (96%) may not be available; alternatively, check with your local compounding pharmacy (not a normal pharmacy), as they may be able to make it for you.



  1. “可重用STEM口罩 料日產二萬口罩贈學界社福界” (Translation: Reusable STEM masks, expected 20K daily production to benefit educational sector and the society) by 香港大紀元新唐人聯合新聞頻道 (Hong Kong Epoch Times), 13 March 2020. Video. Press

  2. “N95 3M mask: How to Wear & Remove” by SingHealth. Video 

  3. “To Beard or not to Beard? That’s a good Question!” in Centers for Disease Control and Prevention by Jaclyn Krah Cichowicz et al. 2 November 2017 

  4. A water bottle cutter Video 

  5. DIY water bottle cutter video 

  6. The Hong Kong University’s recommendations on personal hygiene. Credit: Prof. Ivan Hung (HKU Faculty) 

  7. WHO Guidelines on Hand Hygiene in Health Care: First Global Patient Safety Challenge Clean Care Is Safer Care. 12 WHO-recommended hand-rub formulations Website 

Covid-19, your community, and you — a data science perspective

We are data scientists—that is, our job is to understand how to analyze and interpret data. When we analyze the data around covid-19, we are very concerned. The most vulnerable parts of society, the elderly and the poor, are most at risk, but controlling the spread and impact of the disease requires us all to change our behavior. Wash your hands thoroughly and regularly, avoid groups and crowds, cancel events, and don’t touch your face. In this post, we explain why we are concerned, and you should be too. For an excellent summary of the key information you need to know, read Corona in Brief by Ethan Alley (the president of a non-profit that develops technologies to reduce risks from pandemics).


Anyone is welcome to translate this article, to help their local communities understand these issues. Please link back to here with appropriate credit. Let us know on Twitter so we can add your translation to this list.


We need a working medical system

Just over 2 years ago one of us (Rachel) got a brain infection which kills around 1/4 of people who get it, and leaves 1/3 with permanent cognitive impairment. Many others end up with permanent vision and hearing damage. Rachel was delirious by the time she crawled across the hospital parking lot. She was lucky enough to receive prompt care, diagnosis, and treatment. Up until shortly before this event Rachel was in great health. Having prompt access to the emergency room almost certainly saved her life.

Now, let’s talk about covid-19, and what might happen to people in Rachel’s situation in the coming weeks and months. The number of people found to be infected with covid-19 doubles every 3 to 6 days. With a doubling time of three days, that means the number of people found to be infected can increase 100 times in three weeks (it’s not actually quite this simple, but let’s not get distracted by technical details). One in 10 infected people requires hospitalization for many weeks, and most of these require oxygen. Although it is very early days for this virus, there are already regions where hospitals are entirely overrun, and people are no longer able to get the treatment that they require (not only for covid-19, but also for anything else, such as the life-saving care that Rachel needed). For instance, in Italy, where just a week ago officials were saying that everything was fine, now sixteen million people have been put on lock-down (update: 6 hours after posting this, Italy put the entire country on lock-down), and tents like this are being set up to help handle the influx of patients:

A medical tent used in Italy

Dr. Antonio Pesenti, head of the regional crisis response unit in a hard-hit area of Italy, said, “We’re now being forced to set up intensive care treatment in corridors, in operating theaters, in recovery rooms… One of the best health systems in the world, in Lombardy is a step away from collapse.”
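The doubling arithmetic mentioned earlier in this section can be checked in a few lines. This is only a rough sketch that assumes perfectly steady doubling with no interventions; the function name `growth_factor` is ours, purely for illustration.

```python
def growth_factor(days: float, doubling_time_days: float) -> float:
    """Multiplicative increase in cases after `days` of steady doubling."""
    return 2 ** (days / doubling_time_days)


THREE_WEEKS = 21  # days

# Doubling every 3 days gives 7 doublings in three weeks.
fast = growth_factor(THREE_WEEKS, 3)
# Doubling every 6 days gives 3.5 doublings in three weeks.
slow = growth_factor(THREE_WEEKS, 6)

print(f"Doubling every 3 days: about {fast:.0f}x in three weeks")
print(f"Doubling every 6 days: about {slow:.0f}x in three weeks")
```

Seven doublings is a 128-fold increase, which is where the "100 times in three weeks" rule of thumb comes from. At the slower end of the range the increase is closer to 11-fold.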

This is not like the flu

The flu has a death rate of around 0.1% of infections. Marc Lipsitch, the director of the Center for Communicable Disease Dynamics at Harvard, estimates that for covid-19 it is 1-2%. The latest epidemiological modeling found a 1.6% rate in China in February, sixteen times higher than the flu1 (this might be quite a conservative number however, because rates go up a lot when the medical system can’t cope). Current best estimates expect that covid-19 will kill 10 times more people this year than the flu (and modeling by Elena Grewal, former director of data science at Airbnb, shows it could be 100 times more, in the worst case). This is before taking into consideration the huge impact on the medical system, such as that described above. It is understandable that some people are trying to convince themselves that this is nothing new, an illness much like the flu, because it is very uncomfortable to accept the reality that this is not familiar at all.

Trying to understand intuitively an exponentially increasing growth in the number of infected people is not something that our brains are designed to handle. So we have to analyze this as scientists, not using our intuition.

Where will this be in 2 weeks? 2 months?

For each person that has the flu, on average, they infect 1.3 other people. That’s called the “R0” for flu. If R0 is less than 1.0, then an infection stops spreading and dies out. If it’s over 1.0, it spreads. R0 currently is 2-3 for covid-19 outside China. The difference may sound small, but after 20 “generations” of infected people passing on their infection, an R0 of 1.3 would result in 146 infections, but an R0 of 2.5 would result in 36 million infections! (This is, of course, very hand-wavy and ignores many real-world impacts, but it’s a reasonable illustration of the relative difference between covid-19 and flu, all other things being equal).
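Here is a small sketch of that comparison. It assumes the simplest possible model, in which every infected person infects exactly R0 others and nothing else changes. The figures in the text appear to count the first infected person as generation 1, so generation 20 is reached after 19 rounds of transmission; that reading is our assumption, inferred from the numbers, and the function name is hypothetical.

```python
def infections_in_generation(r0: float, generation: int) -> float:
    """Expected infections in a generation, counting the first case as generation 1.

    Assumes each case infects exactly `r0` others, with no immunity or interventions.
    """
    return r0 ** (generation - 1)


flu = infections_in_generation(1.3, 20)
covid = infections_in_generation(2.5, 20)

print(f"R0 = 1.3: about {flu:,.0f} infections in generation 20")
print(f"R0 = 2.5: about {covid:,.0f} infections in generation 20")
```

Under these assumptions the two results come out at roughly 146 and roughly 36 million, matching the figures quoted above.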

Note that R0 is not some fundamental property of a disease. It depends greatly on the response, and it can change over time2. Most notably, in China R0 for covid-19 has come down greatly, and is now approaching 1.0! How, you ask? By putting in place measures at a scale that would be hard to imagine in a country such as the US—for instance, entirely locking down many giant cities, and developing a testing process that allows more than a million people a week to be tested.

One thing which comes up a lot on social media (including from accounts with large followings, such as Elon Musk’s) is a misunderstanding of the difference between logistic and exponential growth. “Logistic” growth refers to the “s-shaped” growth pattern of epidemic spread in practice. Obviously exponential growth can’t go on forever, since otherwise there would be more people infected than people in the world! Therefore, eventually, infection rates must always decrease, resulting in an s-shaped (known as sigmoid) growth rate over time. However, the decreasing growth only occurs for a reason–it’s not magic. The main reasons are:

  • Massive and effective community response, or
  • Such a large percentage of people are infected that there’s fewer uninfected people to spread to.

Therefore, it makes no logical sense to rely on the logistic growth pattern as a way to “control” a pandemic.
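To make the distinction concrete, here is a minimal sketch comparing the two curves. All parameter values (100 initial cases, a 3-day doubling time, a ceiling of one million infections) are hypothetical and chosen only for illustration; real epidemics are far messier.

```python
import math


def exponential(t: float, n0: float, rate: float) -> float:
    """Unbounded exponential growth from n0 cases at time 0."""
    return n0 * math.exp(rate * t)


def logistic(t: float, n0: float, rate: float, capacity: float) -> float:
    """Logistic (s-shaped) growth that levels off at `capacity`."""
    return capacity / (1 + (capacity / n0 - 1) * math.exp(-rate * t))


N0 = 100                # hypothetical initial number of cases
RATE = math.log(2) / 3  # growth rate equivalent to doubling every 3 days
CAPACITY = 1_000_000    # hypothetical ceiling on total infections

for day in (0, 9, 30, 60):
    e = exponential(day, N0, RATE)
    s = logistic(day, N0, RATE, CAPACITY)
    print(f"day {day:2d}: exponential {e:>14,.0f}   logistic {s:>10,.0f}")
```

Early on the two curves are almost indistinguishable. The logistic curve only bends once a large share of the ceiling has been reached, which is exactly the situation a community response is meant to prevent.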

Another thing which makes it hard to intuitively understand the impact of covid-19 in your local community is that there is a very significant delay between infection and hospitalization — generally around 11 days. This may not seem like a long time, but when you compare it to the number of people infected during that time, it means that by the time you notice that the hospital beds are full, community infection is already at a level that there will be 5-10 times more people to deal with.

Note that there are some early signs that the impact in your local area may be at least somewhat dependent on climate. The paper Temperature and latitude analysis to predict potential spread and seasonality for COVID-19 points out that the disease has so far been spreading in mild climates (unfortunately for us, the temperature range in San Francisco, where we live, is right in that range; it also covers the main population centers of Europe, including London.)

“Don’t panic. Keep calm.” is not helpful

One common response we’ve seen on social media to people that are pointing out the reasons to be concerned, is “don’t panic” or “keep calm”. This is, to say the least, not helpful. No one is suggesting that panic is an appropriate response. For some reason, however, “keep calm” is a very popular reaction in certain circles (but not amongst any epidemiologists, whose job it is to track these things). Perhaps “keep calm” helps some people feel better about their own inaction, or makes them feel somehow superior to people who they imagine are running around like headless chickens.

But “keep calm” can easily lead to a failure to prepare and respond. In China, tens of millions were put on lock-down and two new hospitals were built by the time they reached the statistics that the US has now. Italy waited too long, and just today (Sunday March 8) they reported 1492 new cases and 133 new deaths, despite locking down 16 million people. Based on the best information we’re able to ascertain at this stage, just 2-3 weeks ago Italy was in the same position that the US and UK are in today (in terms of infection statistics).

Note that nearly everything about covid-19 at this stage is up in the air. We don’t really know its infection speed or mortality, we don’t know how long it remains active on surfaces, we don’t know whether it survives and spreads in warm conditions. Everything we have is current best guesses based on the best information people are able to put together. And remember, the vast majority of this information is in China, in Chinese. Currently, the best way to understand the Chinese experience so far is to read the excellent Report of the WHO-China Joint Mission on Coronavirus Disease 2019, based on a joint mission of 25 national and international experts from China, Germany, Japan, Korea, Nigeria, Russia, Singapore, the United States of America and the World Health Organization (WHO).

When there’s some uncertainty, that perhaps this won’t be a global pandemic, and perhaps everything just might pass by without the hospital system collapsing, that doesn’t mean that the right response is to do nothing. That would be enormously speculative and not an optimal response under any threat modeling scenario. It also seems extremely unlikely that countries like Italy and China would effectively shut down large parts of their economy for no good reason. It’s also not consistent with the actual impacts we’re seeing on the ground in infected areas, where the medical system is unable to cope (for instance, Italy is using 462 tents for “pre-triage”, and still has to move ICU patients from infected areas).

Instead, the thoughtful, reasonable response is to follow the steps that are recommended by experts to avoid spreading infections:

  • Avoid large groups and crowds
  • Cancel events
  • Work from home, if at all possible
  • Wash hands when coming and going from home, and frequently when out
  • Avoid touching your face, especially when outside your home (not easy!)
  • Disinfect surfaces and packages (it’s possible the virus may remain active for 9 days on surfaces, although this still isn’t known for sure either way).

It’s not just about you

If you are under 50, and do not have risk factors such as a compromised immune system, cardiovascular disease, a history of previous smoking, or other chronic illnesses, then you can have some comfort that covid-19 is unlikely to kill you. But how you respond still matters very much. You still have just as much chance of getting infected, and if you do, just as much chance of infecting others. On average, each infected person is infecting over two more people, and they become infectious before they show symptoms. If you have parents that you care about, or grandparents, and plan to spend time with them, and later discover that you are responsible for infecting them with covid-19, that would be a heavy burden to live with.

Even if you are not in contact with people over 50, it is likely that you have more coworkers and acquaintances with chronic illnesses than you realize. Research shows that few people disclose their health conditions in the workplace if they can avoid it, for fear of discrimination. Both of us are in high risk categories, but many people who we interact with regularly may not have known this.

And of course, it is not just about the people immediately around you. This is a highly significant ethical issue. Each person who does their best to contribute to controlling the spread of the virus is helping their whole community to slow down the rate of infection. As Zeynep Tufekci wrote in Scientific American: “Preparing for the almost inevitable global spread of this virus… is one of the most pro-social, altruistic things you can do”. She continues:

We should prepare, not because we may feel personally at risk, but so that we can help lessen the risk for everyone. We should prepare not because we are facing a doomsday scenario out of our control, but because we can alter every aspect of this risk we face as a society. That’s right, you should prepare because your neighbors need you to prepare—especially your elderly neighbors, your neighbors who work at hospitals, your neighbors with chronic illnesses, and your neighbors who may not have the means or the time to prepare because of lack of resources or time.

This has impacted us personally. The biggest and most important course we’ve ever created at fast.ai, which represents the culmination of years of work for us, was scheduled to start at the University of San Francisco in a week. Last Wednesday (March 4), we made the decision to move the whole thing online. We were one of the first large courses to move online. Why did we do it? Because we realized early last week that if we ran this course, we were implicitly encouraging hundreds of people to get together in an enclosed space, multiple times over a multi-week period. Bringing groups together in enclosed spaces is the single worst thing that can be done. We felt ethically obliged to ensure that, at least in this case, this didn’t happen. It was a heart-breaking decision. Our time spent working directly with our students has been one of the great pleasures and most productive periods every year. And we had students planning to fly in from all over the world, who we really didn’t want to let down3.

But we knew it was the right thing to do, because otherwise we’d be likely to be increasing the spread of the disease in our community4.

We need to flatten the curve

This is extremely important, because if we can slow down the rate of infection in a community, then we give hospitals in that community time to deal with both the infected patients, and with the regular patient load that they need to handle. This is described as “flattening the curve”, and is clearly shown in this illustrative chart:

Staying under that dotted line means everything

Farzad Mostashari, the former National Coordinator for Health IT, explained: “New cases are being identified every day that do not have a travel history or connection to a known case, and we know that these are just the tip of the iceberg because of the delays in testing. That means that in the next two weeks the number of diagnosed cases will explode… Trying to do containment when there is exponential community spread is like focusing on putting out sparks when the house is on fire. When that happens, we need to switch strategies to mitigation–taking protective measures to slow spread & reduce peak impact on healthcare.” If we can keep the spread of disease low enough that our hospitals can handle the load, then people can access treatment. But if the cases come too quickly, then those that need hospitalization won’t get it.

Here’s what the math might look like, according to Liz Specht:

The US has about 2.8 hospital beds per 1000 people. With a population of 330M, this is ~1M beds. At any given time, 65% of those beds are already occupied. That leaves about 330k beds available nationwide (perhaps a bit fewer this time of year with regular flu season, etc). Let’s trust Italy’s numbers and assume that about 10% of cases are serious enough to require hospitalization. (Keep in mind that for many patients, hospitalization lasts for weeks — in other words, turnover will be very slow as beds fill with COVID19 patients). By this estimate, by about May 8th, all open hospital beds in the US will be filled. (This says nothing, of course, about whether these beds are suitable for isolation of patients with a highly infectious virus.) If we’re wrong by a factor of two regarding the fraction of severe cases, that only changes the timeline of bed saturation by 6 days in either direction. If 20% of cases require hospitalization, we run out of beds by ~May 2nd. If only 5% of cases require it, we can make it until ~May 14th. 2.5% gets us to May 20th. This, of course, assumes that there is no uptick in demand for beds from other (non-COVID19) causes, which seems like a dubious assumption. As healthcare system becomes increasingly burdened, Rx shortages, etc, people w/ chronic conditions that are normally well-managed may find themselves slipping into severe states of medical distress requiring intensive care & hospitalization.
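The back-of-the-envelope numbers in that quote can be reproduced with a short sketch. Two things here are our assumptions rather than anything stated in the quote: the year (2020) and a 6-day doubling time, which we infer from the statement that a factor-of-two error shifts the date by 6 days. The May 8th baseline is taken from the quote as given and is not derived here.

```python
import math
from datetime import date, timedelta

BEDS_PER_1000 = 2.8
POPULATION = 330_000_000
OCCUPANCY = 0.65

total_beds = BEDS_PER_1000 / 1000 * POPULATION  # about 924k, rounded to "~1M" in the quote
open_beds = total_beds * (1 - OCCUPANCY)        # about 323k, "about 330k" in the quote

BASELINE_RATE = 0.10                # 10% of cases hospitalized (from the quote)
BASELINE_DATE = date(2020, 5, 8)    # "about May 8th" (year is an assumption)
DOUBLING_DAYS = 6                   # assumption implied by the 6-day shift


def saturation_date(hospitalization_rate: float) -> date:
    """Shift the baseline date by one doubling time per factor of two in the rate."""
    shift = DOUBLING_DAYS * math.log2(BASELINE_RATE / hospitalization_rate)
    return BASELINE_DATE + timedelta(days=round(shift))


print(f"Total beds: {total_beds:,.0f}; open beds: {open_beds:,.0f}")
for rate in (0.20, 0.10, 0.05, 0.025):
    print(f"{rate:.1%} hospitalized -> beds full around {saturation_date(rate):%B %d}")
```

The point of the sketch is the same as the quote's: because growth is exponential, even large errors in the hospitalization rate move the saturation date by only a few days.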

A community’s reaction makes all the difference

As we’ve discussed, this math isn’t a certainty—China has already shown that it’s possible to reduce the spread by taking extreme steps. Another great example of a successful response is Vietnam, where, amongst other things, a nationwide advertising campaign (including a catchy song!) quickly mobilized community response and ensured that people adjusted their behavior appropriately.

This is not just a hypothetical situation — it was clearly displayed in the 1918 flu pandemic. In the United States two cities displayed very different reactions to the pandemic: Philadelphia went ahead with a giant parade of 200,000 people to help raise money for the war. But St Louis put in place carefully designed processes to minimize social contacts so as to decrease the spread of the virus, along with cancelling all large events. Here is what the number of deaths looked like in each city, as shown in the Proceedings of the National Academy of Sciences:

Impact of differing responses to the 1918 Flu pandemic

The situation in Philadelphia became extremely dire, even getting to a point where there were not enough funeral caskets or morgues to handle the huge number of dead from the flu.

Richard Besser, who was acting director of the Centers for Disease Control and Prevention during the 2009 H1N1 pandemic, says that in the US “the risk of exposure and the ability to protect oneself and one’s family depends on income, access to health care, and immigration status, among other factors.” He points out that:

The elderly and disabled are at particular risk when their daily lives and support systems are disrupted. Those without easy access to health care, including rural and Native communities, might face daunting distances at times of need. People living in close quarters — whether in public housing, nursing homes, jails, shelters or even the homeless on the streets — might suffer in waves, as we have already seen in Washington state. And the vulnerabilities of the low-wage gig economy, with non-salaried workers and precarious work schedules, will be exposed for all to see during this crisis. Ask the 60 percent of the U.S. labor force that is paid hourly how easy it is to take time off in a moment of need.

The US Bureau of Labor Statistics shows that less than a third of those in the lowest income band have access to paid sick leave:

Most poor Americans do not have sick leave, so have to go to work.

We don’t have good information in the US

One of the big issues in the US is that very little testing is being done, and testing results aren’t being properly shared, which means we don’t know what’s actually happening. Scott Gottlieb, the former FDA commissioner, explained that in Seattle there has been better testing, and we are seeing infection there: “The reason why we knew early about Seattle outbreak of covid-19 was because of sentinel surveillance work by independent scientists. Such surveillance never got totally underway in other cities. So other U.S. hot spots may not be fully detected yet.” According to The Atlantic, Vice President Mike Pence promised that “roughly 1.5 million tests” would be available this week, but fewer than 2,000 people have been tested throughout the US at this point. Drawing on work from The COVID Tracking Project, Robinson Meyer and Alexis Madrigal of The Atlantic said:

The figures we gathered suggest that the American response to the coronavirus and the disease it causes, COVID-19, has been shockingly sluggish, especially compared with that of other developed countries. The CDC confirmed eight days ago that the virus was in community transmission in the United States—that it was infecting Americans who had neither traveled abroad nor were in contact with others who had. In South Korea, more than 66,650 people were tested within a week of its first case of community transmission, and it quickly became able to test 10,000 people a day.

Part of the problem is that this has become a political issue. In particular, President Donald Trump has made it clear that he wants to see “the numbers” (that is, the number of people infected in the US) kept low. This is an example of where optimizing metrics interferes with getting good results in practice. (For more on this issue, see the Ethics of Data Science paper The Problem with Metrics is a Fundamental Problem for AI). Google’s Head of AI, Jeff Dean, tweeted his concern about the problems of politicized disinformation:

When I worked at WHO, I was part of the Global Programme on AIDS (now UNAIDS), created to help the world tackle the HIV/AIDS pandemic. The staff there were dedicated doctors and scientists intensely focused on helping address that crisis. In times of crisis, clear and accurate information is vital to helping everyone make proper and informed decisions about how to respond (country, state, and local governments, companies, NGOs, schools, families, and individuals). With the right information and policies in place for listening to the best medical and scientific experts, we will all come through challenges like the ones presented by HIV/AIDS or by COVID-19. With disinformation driven by political interests, there’s a real risk of making things way, way worse by not acting quickly and decisively in the face of a growing pandemic, and by actively encouraging behaviors that will actually spread the disease more quickly. This whole situation is incredibly painful to watch unfold.

It doesn’t look like there is the political will to turn things around, when it comes to transparency. Health and Human Services Secretary Alex Azar, according to Wired, “started talking about the tests health care workers use to determine if someone is infected with the new coronavirus. The lack of those kits has meant a dangerous lack of epidemiological information about the spread and severity of the disease in the US, exacerbated by opacity on the part of the government. Azar tried to say that more tests were on the way, pending quality control.” But, they continued:

Then Trump cut Azar off. “But I think, importantly, anybody, right now and yesterday, that needs a test gets a test. They’re there, they have the tests, and the tests are beautiful. Anybody that needs a test gets a test,” Trump said. This is untrue. Vice President Pence told reporters Thursday that the US didn’t have enough test kits to meet demand.

Other countries are reacting much more quickly and significantly than the US. Many countries in SE Asia are showing great results, including Taiwan, where R0 is down to 0.3 now, and Singapore, which is being proposed as The Model for COVID-19 Response. It’s not just in Asia though; in France, for instance, any gathering of >1000 people is forbidden, and schools are now closed in three districts.

In conclusion

Covid-19 is a significant societal issue, and we can, and should, all work to decrease the spread of the disease. This means:

  • Avoiding large groups and crowds
  • Canceling events
  • Working from home, if at all possible
  • Washing hands when coming and going from home, and frequently when out
  • Avoiding touching your face, especially when outside your home.

Note: due to the urgency of getting this out, we haven’t been as careful as we normally like to be about citing and crediting the work we’re relying on. Please let us know if we’ve missed anything.

Thanks to Sylvain Gugger and Alexis Gallagher for feedback and comments.



  1. Epidemiologists are people who study the spread of disease. It turns out that estimating things like mortality and R0 is actually pretty challenging, so there is a whole field that specializes in doing this well. Be wary of people who use simple ratios and statistics to tell you how covid-19 is behaving. Instead, look at modeling done by epidemiologists.

  2. Well, not technically true. “R0” strictly speaking refers to the infection rate in the absence of response. But since that’s not really ever the thing that we care about, we’ll let ourselves be a bit sloppy on our definitions here. 

  3. Since that decision, we’ve worked hard to find a way to run a virtual course which we hope will be even better than the in-person version would have been. We’ve been able to open it up to anyone in the world, and will be running virtual study and project groups every day. 

  4. We’ve made many other smaller changes to our lifestyle too, including exercising at home instead of going to the gym, moving all our meetings to video-conference, and skipping night events that we’d been looking forward to. 

Disinformation: what it is, why it's pervasive, and proposed regulations

The next two videos from the University of San Francisco Center for Applied Data Ethics Tech Policy Workshop are available! Read more below, or watch them now:

Renee DiResta and Guillaume Chaslot are experts on disinformation who spoke at the CADE Tech Policy Workshop

(Dis)Information & Regulation

Renee DiResta shares a framework for evaluating disinformation campaigns, explains the dynamics of why and how disinformation and propaganda spread, and surveys proposed regulatory approaches to address these issues. She shares regulatory proposals around ads, antitrust, and privacy, and how these proposed laws impact the privacy-security-free expression balance. Disinformation is an ecosystem level problem, not a software feature level problem, so policy making needs to be agile and to address the broader ecosystem.

Renée DiResta is the technical research manager at Stanford Internet Observatory. She investigates the spread of malign narratives across social networks, and assists policymakers in devising responses to the problem. Renee has studied influence operations and computational propaganda in the context of pseudoscience conspiracies, terrorist activity, and state-sponsored information warfare, and has advised Congress, the State Department, and other academic, civil society, and business organizations on the topic. At the behest of the Senate Select Committee on Intelligence, she led one of the two research teams that produced comprehensive assessments of the Internet Research Agency’s and GRU’s influence operations targeting the U.S. from 2014-2018.

Watch her talk here:

Read more about Renee’s work in these selected articles and essays:

The Toxic Potential of YouTube’s Feedback Loop

Systemic factors contribute to the proliferation and amplification of conspiracy theories on platforms such as YouTube. The emphasis on metrics, the low cost of experimentation, and the potential for rewards incentivize propagandists to game the recommendation system. The process of flagging and removing harmful content is much slower than the virality with which videos spread. The situation is even worse for languages other than English, where tech platforms tend not to invest many resources. For example, major concerns were raised in France about YouTube promoting pedophilia in 2006 and 2017, yet YouTube failed to take action until 2019, when it became a news topic in the USA after a high-profile New York Times article and major American companies pulling their ads.

Guillaume Chaslot earned his PhD in AI working on the computer players of Go, worked at Google on YouTube’s recommendation system several years ago, and has since run the non-profit AlgoTransparency, quantitatively tracking the way that YouTube recommends conspiracy theories. His work has been covered in the Washington Post, The Guardian, the Wall Street Journal, and more. Watch his talk here:

Read more about Guillaume’s work in these selected articles and essays:

Learn More About the CADE Tech Policy Workshop

Special thanks to Nalini Bharatula for her help with this post.

fastai—A Layered API for Deep Learning

This paper is about fastai v2. There is a PDF version of this paper available on arXiv; it has been peer reviewed and will be appearing in the open access journal Information. fastai v2 is currently in pre-release; we expect to release it officially around July 2020. The pre-release is feature complete, although the documentation isn't complete. There is a dedicated forum available for discussing fastai v2. Jeremy will be teaching a course about deep learning with fastai v2 in San Francisco starting in March 2020. An O'Reilly book by us (Jeremy and Sylvain) about deep learning with fastai and PyTorch is available for pre-order, for expected delivery July 2020.

Abstract: fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

  • A new type dispatch system for Python along with a semantic type hierarchy for tensors
  • A GPU-optimized computer vision library which can be extended in pure Python
  • An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4-5 lines of code
  • A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
  • A new data block API
  • ...and much more.

We have used this library to successfully create a complete deep learning course, which we were able to write more quickly than using previous approaches, and the code was more clear. The library is already in wide use in research, industry, and teaching.



fastai is a modern deep learning library, available from GitHub as open source under the Apache 2 license, which can be installed directly using the conda or pip package managers. It includes complete documentation and tutorials, and is the subject of the book Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD (Howard and Gugger 2020).

fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. Other libraries have tended to force a choice between conciseness and speed of development, or flexibility and expressivity, but not both. We wanted to get the clarity and development speed of Keras (Chollet and others 2015) and the customizability of PyTorch. This goal of getting the best of both worlds has motivated the design of a layered architecture. A high-level API powers ready-to-use functions to train models in various applications, offering customizable models with sensible defaults. It is built on top of a hierarchy of lower level APIs which provide composable building blocks. This way, a user wanting to rewrite part of the high-level API or add particular behavior to suit their needs doesn’t have to learn how to use the lowest level.

The layered API from fastai

The high-level API is most likely to be useful to beginners and to practitioners who are mainly interested in applying pre-existing deep learning methods. It offers concise APIs over four main application areas: vision, text, tabular and time-series analysis, and collaborative filtering. These APIs choose intelligent default values and behaviors based on all available information. For instance, fastai provides a single Learner class which brings together architecture, optimizer, and data, and automatically chooses an appropriate loss function where possible. Integrating these concerns into a single class enables fastai to curate appropriate default choices. To give another example, generally a training set should be shuffled, and a validation set does not need to be. So fastai provides a single DataLoaders class which automatically constructs validation and training data loaders with these details already handled. This helps practitioners ensure that they don’t make mistakes such as failing to include a validation set. In addition, because the training set and validation set are integrated into a single class, fastai is able, by default, always to display metrics during training using the validation set.
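
To illustrate the idea of bundling training and validation data behind one object, here is a minimal plain-Python sketch. ToyDataLoaders and its methods are hypothetical names invented for this illustration; it is not fastai's DataLoaders class, only a picture of the default it encodes (shuffle the training set, leave the validation set in order).

```python
import random

class ToyDataLoaders:
    """Hypothetical stand-in: a shuffled training loader paired with an ordered validation loader."""

    def __init__(self, train_items, valid_items, bs=2, seed=None):
        self.train_items = list(train_items)
        self.valid_items = list(valid_items)
        self.bs = bs
        self.rng = random.Random(seed)

    def _batches(self, items):
        # Split a list into consecutive chunks of at most `bs` items.
        return [items[i:i + self.bs] for i in range(0, len(items), self.bs)]

    def train(self):
        items = self.train_items[:]
        self.rng.shuffle(items)                  # training data: new order on every call
        return self._batches(items)

    def valid(self):
        return self._batches(self.valid_items)   # validation data: fixed order

dls = ToyDataLoaders(train_items=range(8), valid_items=range(100, 104), bs=2, seed=0)
print(dls.train())
print(dls.valid())
```

Because both splits live in one object, code that trains a model can always find the validation set without the user wiring it in separately.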

This use of intelligent defaults–based on our own experience or best practices–extends to incorporating state-of-the-art research wherever possible. For instance, transfer learning is critically important for training models quickly, accurately, and cheaply, but the details matter a great deal. fastai automatically provides transfer learning optimised batch-normalization (Ioffe and Szegedy 2015) training, layer freezing, and discriminative learning rates (Howard and Ruder 2018). In general, the library’s use of integrated defaults means it requires fewer lines of code from the user to re-specify information or merely to connect components. As a result, every line of user code tends to be more likely to be meaningful, and easier to read.

The mid-level API provides the core deep learning and data-processing methods for each of these applications, and low-level APIs provide a library of optimized primitives and functional and object-oriented foundations, which allows the mid-level to be developed and customised. The library itself is built on top of PyTorch (Paszke et al. 2017), NumPy (Oliphant, n.d.), PIL (Clark and Contributors, n.d.), pandas (McKinney 2010), and various other libraries. In order to achieve its goal of hackability, the library does not aim to supplant or hide these lower levels or these foundations. Within a fastai model, one can interact directly with the underlying PyTorch primitives; and within a PyTorch model, one can incrementally adopt components from the fastai library as conveniences rather than as an integrated package.

We believe fastai meets its design goals. A user can create and train a state-of-the-art vision model using transfer learning with four understandable lines of code. Perhaps more tellingly, we have been able to implement recent deep learning research papers with just a couple of hours’ work, whilst matching the performance shown in the papers. We have also used the library for our winning entry in the DawnBench competition (Cody A. Coleman and Zahari 2017), training a ResNet-50 on ImageNet to the target accuracy in 18 minutes.

The following sections describe the main functionality of the various API levels in more detail and review prior related work. We chose to include a lot of code to illustrate the concepts we are presenting. While that code may change slightly as the library or its dependencies evolve (it is running against fastai v2.0.0), the ideas behind it stay the same. The next section reviews the high-level API's "out-of-the-box" applications for some of the most used deep learning domains. The applications provided are vision, text, tabular, and collaborative filtering.



Here is an example of how to fine-tune an ImageNet (Deng et al. 2009) model on the Oxford-IIIT Pet dataset (Parkhi et al. 2012) and achieve close to state-of-the-art accuracy within a couple of minutes of training on a single GPU:

from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(path=path, bs=64,
    fnames = get_image_files(path/"images"), pat = r'/([^/]+)_\d+.jpg$',
    item_tfms=RandomResizedCrop(450, min_scale=0.75), 
    batch_tfms=[*aug_transforms(size=224, max_warp=0.), Normalize.from_stats(*imagenet_stats)])
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(4)

This is not an excerpt; these are all of the lines of code necessary for this task. Each line of code does one important task, allowing the user to focus on what they need to do, rather than minor details. Let’s look at each line in turn:

from fastai.vision.all import *

This first line imports all the necessary pieces from the library. fastai is designed to be usable in a read–eval–print loop (REPL) environment as well as in a complex software system. Even if using the "import *" syntax is not generally recommended, REPL programmers generally prefer the symbols they need to be directly available to them, which is why fastai supports the "import *" style. The library is carefully designed to ensure that importing in this way only imports the symbols that are actually likely to be useful to the user and avoids cluttering the namespace or shadowing important symbols.
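
The paragraph above does not say how the set of exported symbols is restricted. The standard Python mechanism for this is a module-level `__all__` list, and the sketch below assumes that mechanism. The module name toy_lib and its functions are made up for the illustration.

```python
import sys
import types

# Build a throwaway module in memory; "toy_lib" and its contents are hypothetical.
source = '''
__all__ = ["train"]          # only this name is exported by "import *"

def train():
    return "training"

def internal_helper():
    return "not exported"
'''
toy_lib = types.ModuleType("toy_lib")
exec(source, toy_lib.__dict__)
sys.modules["toy_lib"] = toy_lib

# Run a star-import into a fresh namespace and see which names arrived.
namespace = {}
exec("from toy_lib import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only the names listed in `__all__` reach the importing namespace, so a library can keep helpers out of the user's way while still supporting the REPL-friendly style.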

path = untar_data(URLs.PETS)

The second line downloads a standard dataset from the fast.ai datasets collection (if not previously downloaded) to a configurable location (~/.fastai/data by default), extracts it (if not previously extracted), and returns a pathlib.Path object with the extracted location.

dls = ImageDataLoaders.from_name_re(...)

This line sets up the DataLoaders object. This is an abstraction that represents a combination of training and validation data and will be described more in a later section. DataLoaders can be flexibly defined using the data block API (see 3.1), or, as here, can be built for specific predefined applications using specific subclasses. In this case, the ImageDataLoaders subclass is created using a regular expression labeller. Many other labellers are provided, particularly focused on labelling based on different kinds of file and folder name patterns, which are very common across a wide range of datasets.
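
To see what a regular expression labeller does with the pattern from the example, the sketch below applies it with Python's standard re module. The file names are invented for illustration, and label_from_name is a hypothetical helper, not a fastai function.

```python
import re

# Pattern copied from the example above; the file names below are made up.
pat = r'/([^/]+)_\d+.jpg$'

def label_from_name(fname: str) -> str:
    """Return the part of the file name before the trailing _<number>.jpg."""
    match = re.search(pat, fname)
    if match is None:
        raise ValueError(f"no label found in {fname!r}")
    return match.group(1)

labels = [label_from_name(f) for f in [
    "/data/pets/images/great_pyrenees_173.jpg",
    "/data/pets/images/Abyssinian_1.jpg",
]]
print(labels)
```

The capture group keeps everything between the last slash and the final underscore-plus-digits, so multi-word breed names survive intact.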

One interesting feature of this API, which is also shared by lower level fastai data APIs, is the separation of item level and batch level transforms. Item transforms are applied, in this case, to individual images on the CPU. Batch transforms, on the other hand, are applied to a mini batch, on the GPU if available. While fastai supports data augmentation on the GPU, images need to be of the same size before being batched. aug_transforms() selects a set of data augmentations that work well across a variety of vision datasets and problems and can be fully customized by providing parameters to the function. This is a good example of a simple "helper function"; it is not strictly necessary, because the user can list all the augmentations that they require using the individual data augmentation classes. However, by providing a single function which curates best practices and makes the most common types of customization available through a single function, users have fewer pieces to learn in order to get good results.
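
The split between item-level and batch-level transforms can be pictured with plain Python lists. This is a toy sketch under simplifying assumptions: the "images" are one-dimensional lists, center_crop and normalize_batch are hypothetical helpers, and everything runs on the CPU.

```python
def center_crop(item, size):
    """Item transform: crop one variable-length item to a fixed length."""
    start = (len(item) - size) // 2
    return item[start:start + size]

def normalize_batch(batch, mean, std):
    """Batch transform: applied to every item of the collated batch in one call."""
    return [[(x - mean) / std for x in item] for item in batch]

items = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40], [7, 8, 9, 10, 11]]

cropped = [center_crop(item, size=4) for item in items]   # per item, before batching
assert len({len(item) for item in cropped}) == 1          # equal sizes, so they can be batched
batch = normalize_batch(cropped, mean=5.0, std=2.0)       # once per batch
print(batch[0])
```

The item transform exists to make sizes uniform; only then can the cheaper, vectorisable batch transform run over the whole mini-batch.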

After defining a DataLoaders object the user can easily look at the data with a single line of code:

dls.show_batch()

A DataLoaders object built with the fastai library knows how to show its elements in a meaningful way. Here is the result on the Oxford-IIIT Pet image classification dataset.

learn = cnn_learner(dls, resnet34, metrics=error_rate)

This fourth line creates a Learner, which provides an abstraction combining an optimizer, a model, and the data to train it – this will be described in more detail in 4.1. Each application has a customized function that creates a Learner, which will automatically handle whatever details it can for the user. For instance, in this case it will download an ImageNet-pretrained model, if not already available, remove the classification head of the model, replace it with a head appropriate for this particular dataset, and set appropriate defaults for the optimizer, weight decay, learning rate, and so forth (except where overridden by the user).

learn.fit_one_cycle(4)


The fifth line fits the model. In this case, it is using the 1cycle policy (Smith 2018), which is a recent best practice for training and is not widely available in most deep learning libraries by default. It is annealing both the learning rates, and the momentums, printing metrics on the validation set, displaying results in an HTML table (if run in a Jupyter Notebook, or a console table otherwise), recording losses and metrics after every batch to allow plotting later, and so forth. A GPU will be used if one is available.

After training a model the user can view the results in various ways, including analysing the errors with show_results():

learn.show_results()

A Learner knows from the data and the model type how to represent the results. It can even highlight model errors (here predicted class at bottom and actual at top).

Here is another example of a vision application, this time for segmentation on the CamVid dataset (Brostow et al. 2008):

from fastai.vision.all import *
path = untar_data(URLs.CAMVID)
dls = SegmentationDataLoaders.from_label_func(path=path, bs=8,
    fnames = get_image_files(path/"images"), 
    label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes = np.loadtxt(path/'codes.txt', dtype=str),                         
    batch_tfms=[*aug_transforms(size=(360,480)), Normalize.from_stats(*imagenet_stats)])
learn = unet_learner(dls, resnet34, metrics=acc_segment)
learn.fit_one_cycle(8, pct_start=0.9)

The lines of code to create and train this model are almost identical to those for a classification model, except for those necessary to tell fastai about the differences in the processing of the input data. The exact same line of code that was used for the image classification example can also be used to display the segmentation data:

In this case, fastai knows that the data is for a segmentation task, and therefore it color-codes and overlays, with transparency, the segmentation layer on top of the input images.

Furthermore, the user can also view the results of the model, which again are visualized automatically in a way suitable for this task:

For a segmentation task, the ground-truth mask is laid at the right of the predicted mask.


In modern natural language processing (NLP), perhaps the most important approach to building models is through fine tuning pre-trained language models. To train a language model in fastai requires very similar code to the previous examples (here on the IMDb dataset (Maas et al. 2011)):

from fastai.text.all import *
path = untar_data(URLs.IMDB_SAMPLE)
df_tok,count = tokenize_df(pd.read_csv(path/'texts.csv'), ['text'])
dls_lm = TextDataLoaders.from_df(df_tok, path=path,
    vocab=make_vocab(count), text_col='text', is_lm=True)
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[Perplexity()])
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7,0.8))

Fine-tuning this model for classification requires the same basic steps:

dls_clas = TextDataLoaders.from_df(df_tok, path=path,
    vocab=make_vocab(count), text_col='text', label_col='label')
learn = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7,0.8))

The same API is also used to view the DataLoaders:

In text classification, the batches are shown in a DataFrame with the tokenized texts.

The biggest challenge with creating text applications is often the processing of the input data. fastai provides a flexible processing pipeline with predefined rules for best practices, such as handling capitalization by adding tokens. For instance, there is a compromise between lower-casing all text and losing information, versus keeping the original capitalisation and ending up with too many tokens in your vocabulary. fastai handles this by adding a special single token representing that the next symbol should be treated as uppercase or sentence case and then converts the text itself to lowercase. fastai uses a number of these special tokens. Another example is that a sequence of more than three repeated characters is replaced with a special repetition token, along with a number of repetitions and then the repeated character. These rules largely replicate the approaches discussed in (Howard and Ruder 2018) and are not normally made available as defaults in most NLP modelling libraries.
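
The two rules described above can be sketched in plain Python. This is an illustration of the idea only, not fastai's implementation: the token strings and the helper names are assumptions chosen for the example, and the repetition threshold follows the wording above (more than three repeated characters).

```python
import re

# Illustrative special tokens (assumed names for this sketch).
TK_REP, TK_MAJ, TK_UP = "xxrep", "xxmaj", "xxup"

_rep = re.compile(r"(\S)\1{3,}")  # one character followed by 3+ copies, i.e. 4+ in total

def replace_rep(text):
    """Replace 'ooooo' with ' xxrep 5 o ': token, repetition count, then the character."""
    return _rep.sub(lambda m: f" {TK_REP} {len(m.group(0))} {m.group(1)} ", text)

def mark_caps(text):
    """Lower-case every word, prefixing a token that records the original capitalisation."""
    out = []
    for w in text.split():
        if w.isupper() and len(w) > 1:
            out += [TK_UP, w.lower()]             # ALL CAPS word
        elif w[:1].isupper() and w[1:] == w[1:].lower():
            out += [TK_MAJ, w.lower()]            # Sentence-case word
        else:
            out.append(w.lower())
    return " ".join(out)

processed = mark_caps(replace_rep("This is sooooo GOOD"))
print(processed)
```

The vocabulary then needs only the lower-case form of each word plus a handful of special tokens, while the capitalisation and repetition information is still available to the model.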

The tokenization is flexible and can support many different tokenizers. The default used is spaCy. A SentencePiece tokenizer (Kudo and Richardson 2018) is also provided by the library. Subword tokenization (Wu et al. 2016) (Kudo 2018), such as that provided by SentencePiece, has been used in many recent NLP breakthroughs (Radford et al. 2019) (Devlin et al. 2018).

Numericalization and vocabulary creation often require many lines of code, and careful management of files and caching. In fastai that is handled transparently and automatically. Input data can be provided in many different forms, including: a separate file on disk for each document, delimited files in various formats, and so forth. The API also allows for complete customisation. SentencePiece is particularly useful for handling multiple languages and was used in MultiFIT (Eisenschlos et al. 2019), along with fastai, for this purpose. This provided models and state-of-the-art results across many different languages using a single code base.

fastai’s text models are based on AWD-LSTM (Merity, Keskar, and Socher 2017). The user community have provided external connectors to the popular HuggingFace Transformers library (Wolf et al. 2019). The training of the models proceeds in the same way as for the vision examples with defaults appropriate for these models automatically selected. We are not aware of other libraries that provide direct support for transfer learning best practices in NLP, such as those shown in (Howard and Ruder 2018). Because the tokenisation is built on top of a layered architecture, users can replace the base tokeniser with their own choices and will automatically get support for the underlying parallel process model provided by fastai. It will also automatically handle serialization of intermediate outputs so that they can be reused in future processing pipelines.

The results of training the model can be visualised with the same API as used for image models, shown in a way appropriate for NLP:

In text classification, results are displayed in a DataFrame with the tokenized texts.


Tabular models have not been very widely used in deep learning; gradient boosting machines and similar methods are more commonly used in industry and research settings. However, there have been examples of competition winning approaches and academic state-of-the-art results using deep learning (Brébisson et al. 2015). Deep learning models are particularly useful for datasets with high cardinality categorical variables because they provide embeddings that can be used even for non-deep learning models (Guo and Berkhahn 2016). One of the challenges is that there have not been examples of libraries which directly support best practices for tabular modelling using deep learning.

The pandas library (McKinney 2010) already provides excellent support for processing tabular data sets, and fastai does not attempt to replace it. Instead, it adds additional functionality to pandas DataFrames through various pre-processing functions, such as automatically adding features that are useful for modelling with date data. fastai also provides features for automatically creating appropriate DataLoaders with separated validation and training sets, using a variety of mechanisms, such as randomly splitting rows, or selecting rows based on some column.
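
As a rough picture of what "adding features that are useful for modelling with date data" can mean, the sketch below expands a single date into several numeric and boolean columns using only the standard library. The function name and the particular set of features are assumptions for this illustration; they are not fastai's API.

```python
from datetime import date

def date_features(d: date) -> dict:
    """Expand one date into simple features a tabular model can consume."""
    _, iso_week, _ = d.isocalendar()
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),            # Monday == 0
        "day_of_year": d.timetuple().tm_yday,
        "week": iso_week,                      # ISO 8601 week number
        "is_month_start": d.day == 1,
    }

features = date_features(date(2020, 3, 9))
print(features)
```

A raw timestamp carries little signal for most models, whereas columns such as day of week or week of year expose seasonality directly.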

The code to create and train a model suitable for this data should look familiar; only some information specific to tabular data is required when building the DataLoaders object.

from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
dls = TabularDataLoaders.from_df(df, path, 
    procs=[Categorify, FillMissing, Normalize],
    cat_names=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race'], 
    cont_names=['age', 'fnlwgt', 'education-num'],
    y_names='salary', valid_idx=list(range(1024,1260)), bs=64)
learn = tabular_learner(dls, layers=[200,100], metrics=accuracy)

As for every other application, dls.show_batch and learn.show_results will display a DataFrame with some samples.

fastai also integrates with NVIDIA’s cuDF library, providing end-to-end GPU optimized data processing and model training. fastai is the first deep learning framework to integrate with cuDF in this way.

Collaborative filtering

Collaborative filtering is normally modelled using a probabilistic matrix factorisation approach (Mnih and Salakhutdinov 2008). In practice, however, a dataset generally has much more than just (for example) a user ID and a product ID, but instead has many characteristics of the user, product, time period, and so forth. It is quite standard to use those to train a model; therefore, fastai attempts to close the gap between collaborative filtering and tabular modelling. A collaborative filtering model in fastai can be simply seen as a tabular model with high cardinality categorical variables. A classic matrix factorisation model is also provided. Both are trained using the same steps that we’ve seen in the other applications, as in this example using the popular Movielens dataset (Harper and Konstan 2015):

from fastai2.collab import *
ratings = pd.read_csv(untar_data(URLs.ML_SAMPLE)/'ratings.csv')
dls = CollabDataLoaders.from_df(ratings, bs=64, seed=42)
learn = collab_learner(dls, n_factors=50, y_range=[0, 5.5])
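To make the matrix factorisation idea concrete, here is a minimal sketch in plain Python of how a dot-product model could score one user/item pair. It is not fastai's implementation: the function names, the toy factor values, and the use of a scaled sigmoid to keep predictions inside y_range are assumptions made for illustration.

```python
import math

def sigmoid_range(x, low, high):
    "Squash x into the interval (low, high) with a scaled sigmoid."
    return 1 / (1 + math.exp(-x)) * (high - low) + low

def predict_rating(user_factors, item_factors, user_bias=0.0, item_bias=0.0, y_range=(0, 5.5)):
    "Dot product of the latent factors plus biases, mapped into y_range."
    score = sum(u * i for u, i in zip(user_factors, item_factors))
    return sigmoid_range(score + user_bias + item_bias, *y_range)

# Toy latent factors (hypothetical values; in practice they are learned during training)
alice = [0.9, -0.2, 0.4]
film = [1.1, 0.3, 0.5]
rating = predict_rating(alice, film, user_bias=0.1, item_bias=0.2)
```

In this sketch, a y_range slightly wider than the real rating scale lets the sigmoid reach the top rating without saturating.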


fastai is mostly focused on model training, but once this is done you can easily export the PyTorch model to serve it in production. The command Learner.export will serialize the model as well as the input pipeline (just the transforms, not the training data) to be able to apply the same to new data.

The library provides Learner.predict and Learner.get_preds to evaluate the model on a single item or on a new inference DataLoader. Such a DataLoader can easily be built from a set of items with the command test_dl.

High-level API design considerations

High-level API foundations

The high-level API is that which is used by people using these applications. All the fastai applications share some basic components. One such component is the visualisation API, which uses a small number of methods, the main ones being show_batch (for showing input data) and show_results (for showing model results). Different types of model and datasets are able to use this consistent API because of fastai’s type dispatch system, a lower-level component which will be discussed in 5.3. The transfer learning capability shared across the applications relies on PyTorch’s parameter groups, and fastai’s mid-level API then leverages these groups, such as the generic optimizer (see 4.3).

In all of these applications, the resulting Learner provides the same functionality for model training. The recommended way of training models is using a variant of the 1cycle policy (Smith 2018), which uses a warm-up and annealing for the learning rate while doing the opposite with the momentum parameter:

The hyper-parameters schedule in the 1cycle policy.
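The shape of such a schedule can be sketched in a few lines of plain Python. This is an illustration only: the cosine interpolation, the function names, and the default values (peak learning rate, division factors, momentum range, warm-up fraction) are assumptions chosen for the example, not a statement of the library's defaults.

```python
import math

def cos_anneal(start, end, pct):
    "Cosine interpolation: returns start at pct=0 and end at pct=1."
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle(pct, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25, moms=(0.95, 0.85)):
    "Return (learning rate, momentum) at fraction `pct` of training."
    if pct < pct_start:
        # Warm-up: the learning rate rises while the momentum falls
        p = pct / pct_start
        return cos_anneal(lr_max / div, lr_max, p), cos_anneal(moms[0], moms[1], p)
    # Annealing: the learning rate decays while the momentum recovers
    p = (pct - pct_start) / (1 - pct_start)
    return cos_anneal(lr_max, lr_max / div_final, p), cos_anneal(moms[1], moms[0], p)

# Sample the schedule at 101 evenly spaced points of training
schedule = [one_cycle(i / 100) for i in range(101)]
lrs, moms = zip(*schedule)
```

The learning rate peaks exactly where the momentum reaches its minimum, which is the "opposite" behaviour described above.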

The learning rate is the most important hyper-parameter to tune (and very often the only one since the library sets proper defaults). Other libraries often provide help for grid search or AutoML to guess the best value, but the fastai library implements the learning rate finder (Smith 2015) which much more quickly provides the best value for this parameter after a mock training. The command learn.lr_find() will return a graph like this:

The learning rate finder does a mock training with an exponentially growing learning rate over 100 iterations. A good value is then the minimum value on the graph divided by 10.
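The procedure in the caption can be sketched without any deep learning library. In the example below the start and end learning rates, the function names, and the synthetic loss curve are all assumptions; only the exponential growth over 100 iterations and the "minimum divided by 10" rule come from the text.

```python
def exp_lr_schedule(start_lr=1e-7, end_lr=10.0, n_iter=100):
    "Learning rates growing exponentially from start_lr to end_lr."
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (n_iter - 1)) for i in range(n_iter)]

def suggest_lr(lrs, losses):
    "Learning rate at the minimum recorded loss, divided by 10."
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / 10

lrs = exp_lr_schedule()
# Synthetic loss curve (made up for the example): falls, bottoms out at step 70, then diverges
losses = [abs(i - 70) / 70 + 0.1 for i in range(100)]
suggestion = suggest_lr(lrs, losses)
```

Dividing by 10 backs away from the point where the loss is about to diverge, towards a region where it is still falling steeply.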

Another important high-level API component, which is shared across all of the applications, is the data block API. The data block API is an expressive API for data loading. It is the first attempt we are aware of to systematically define all of the steps necessary to prepare data for a deep learning model, and give users a mix and match recipe book for combining these pieces (which we refer to as data blocks). The steps that are defined by the data block API are:

  • Getting the source items,
  • Splitting the items into the training set and one or more validation sets,
  • Labelling the items,
  • Processing the items (such as normalization), and
  • Optionally collating the items into batches.
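The five steps above can be illustrated with a toy pipeline in plain Python. Every function and file name here is hypothetical and stands in for the corresponding stage; none of it is the data block API itself.

```python
import random

def get_items(source):
    "Step 1: get the source items (here, hypothetical file names)."
    return sorted(source)

def split(items, valid_pct=0.25, seed=42):
    "Step 2: split item indices into training and validation sets."
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(len(items) * valid_pct)
    return idxs[cut:], idxs[:cut]

def label(item):
    "Step 3: label an item (here, from its parent folder name)."
    return item.split('/')[-2]

def process(item, vocab):
    "Step 4: turn an item into model-ready values (label -> integer)."
    return item, vocab.index(label(item))

def collate(samples):
    "Step 5: collate individual samples into a batch."
    xs, ys = zip(*samples)
    return list(xs), list(ys)

source = ['train/3/a.png', 'train/7/b.png', 'train/3/c.png', 'train/7/d.png']
items = get_items(source)
train_idx, valid_idx = split(items)
vocab = sorted({label(o) for o in items})
batch = collate([process(items[i], vocab) for i in train_idx])
```

Because each stage is a separate function, any one of them can be swapped (for example, a different splitter) without touching the others, which is the mix-and-match idea behind data blocks.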

Here is an example of how to use the data block API to get the MNIST dataset (LeCun, Cortes, and Burges 2010) ready for modelling:

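mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock), 
    get_items=get_image_files,        # get the source items
    splitter=GrandparentSplitter(),   # split by the train/valid grandparent folder
    get_y=parent_label)               # label each item from its parent folder
dls = mnist.databunch(untar_data(URLs.MNIST_TINY), batch_tfms=Normalize)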

In fastai v1 and earlier we used a fluent instead of a functional API for this (meaning the statements to execute those steps were chained one after the other). We discovered that this was a mistake; while fluent APIs are flexible in the order in which the user can define the steps, that order is very important in practice. With this functional DataBlock you don’t have to remember if you need to split before or after labelling your data, for instance. Also, fluent APIs, at least in Python, tend not to work well with auto completion technologies. The data processing can be defined using Transforms (see 5.2). Here is an example of using the data blocks API to complete the same segmentation seen earlier:

path = untar_data(URLs.CAMVID_TINY)
camvid = DataBlock(blocks=(ImageBlock, ImageBlock(cls=PILMask)),
    get_items=get_image_files, splitter=RandomSplitter(),
    get_y=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}')
dls = camvid.databunch(path/"images",
    batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)])

Object detection can also be completed using the same functionality (here using the COCO dataset (Lin et al. 2014)):

coco_source = untar_data(URLs.COCO_TINY)
images, lbl_bbox = get_annotations(coco_source/'train.json')
lbl = dict(zip(images, lbl_bbox))

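coco = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
    get_items=get_image_files, splitter=RandomSplitter(),
    getters=[noop, lambda o:lbl[o.name][0], lambda o:lbl[o.name][1]],
    n_inp=1)  # the first block is the input; the other two form the target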
dls = coco.databunch(coco_source, item_tfms=Resize(128),
    batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)])

In this case, the targets are a tuple of two things: a list of bounding boxes and a list of labels. This is why there are three blocks, a list of getters and an extra argument to specify how many of the blocks should be considered the input (the rest forming the target).

The data for language modeling seen earlier can also be built using the data blocks API:

df = pd.read_csv(untar_data(URLs.IMDB_SAMPLE)/'texts.csv')
df_tok,count = tokenize_df(df, 'text')
imdb_lm = DataBlock(blocks=TextBlock(make_vocab(count),is_lm=True),
    get_x=attrgetter('text'),
    splitter=RandomSplitter())
dls = imdb_lm.databunch(df_tok, bs=64, seq_len=72)

We have heard from users that they find the data blocks API provides a good balance of conciseness and expressivity. Many libraries have provided various approaches to data processing. In the data science domain the scikit-learn (Pedregosa et al. 2011) pipeline approach is widely used. This API provides a very high level of expressivity, but is not opinionated enough to ensure that a user completes all of the steps necessary to get their data ready for modelling. As another example, TensorFlow (Abadi et al. 2015) provides the tf.data library, which does not as precisely map the steps necessary for users to complete their task to the functionality provided by the API. The Torchvision (Massa and Chintala, n.d.) library is a good example of an API which is highly specialised to a small subset of data processing tasks for a specific subdomain. fastai tries to capture the benefits of both extremes of the spectrum, without compromises; the data blocks API is how most users transform their data for use with the library.

Incrementally adapting PyTorch code

Users often need to use existing pure PyTorch code (i.e. code that doesn’t use fastai), such as their existing code-bases developed without fastai, or using third party code written in pure PyTorch. fastai supports incrementally adding fastai features to this code, without requiring extensive rewrites.

For instance, at the time of writing, the official PyTorch repository includes an MNIST training example¹. To train this example using fastai’s Learner, only two steps are required. First, the 30 lines in the example covering the test() and train() functions can be removed. Then, the 4 lines of the training loop are replaced with this code:

data = DataLoaders(train_loader, test_loader).cuda()
learn = Learner(data, Net(), loss_func=F.nll_loss, opt_func=Adam, metrics=accuracy)
learn.fit_one_cycle(epochs, lr)

With no other changes, the user now has the benefit of all fastai’s callbacks, progress reporting, integrated schedulers such as 1cycle training, and so forth.

Consistency across domains

As the application examples have shown, the fastai library allows training a variety of kinds of application models, with a variety of kinds of datasets, using a very consistent API. The consistency covers not just the initial training, but also visualising and exploring the input data and model outputs. Such consistency helps students, both through having less to learn, and through showing the unifying concepts across different types of model. It also helps practitioners and researchers focus on their model development rather than learning incidental differences between APIs across domains. It is of particular benefit when, for instance, an NLP expert tries to bring their expertise across to a computer vision application.

There are many libraries that provide high level APIs to specific applications, such as Facebook’s Torchvision (Massa and Chintala, n.d.), Detectron (Girshick et al. 2018), and Fairseq (Ott et al. 2019). However, each library has a different API and input representation, and makes different assumptions about training details, all of which a user must learn from scratch each time. This means that there are many deep learning practitioners and researchers who become specialists in specific subfields, partially based on their understanding of the tooling of those subfields. By providing a consistent API, fastai allows users to quickly move between different fields and reuse their expertise.

Customizing the behaviour of predefined applications can be challenging, which means that researchers often end up "reinventing the wheel", or constraining themselves to the specific parts which their tooling allows them to customize. Because fastai provides a layered architecture, users of the software can customize every part, as they need. The layered architecture is also an important foundation in allowing PyTorch users to incrementally add fastai functionality to existing code bases. Furthermore, fastai’s layers are reused across all applications, so an investment in learning them can be leveraged across many different projects.

The approach of creating layered APIs has a long history in software engineering. Software engineering best practices involve building up decoupled components which can be tied together in flexible ways, and then creating increasingly less abstract and more customized layers on top of each part.

The layered API design is also important for researchers and practitioners aiming to create best in class results. As the field of deep learning matures, there are more and more architectures, optimizers, data processing pipelines, and other approaches that can be selected from. Trying to bring multiple approaches together into a single project can be extremely challenging, when each one is using a different, incompatible API, and has different expectations about how a model is trained. For instance, in the original mixup article (Zhang et al. 2017), the code provided by the researchers only works on one specific dataset, with one specific set of metrics, and with one specific optimizer. Attempting to combine the researchers’ mixup code with other training best practices, such as mixed precision training (Micikevicius et al. 2017), requires rewriting it largely from scratch. The next section will look at the mid-level API pieces that fastai provides, which can be mixed and matched together to allow custom approaches to be quickly and reliably created.

Mid-level APIs

Many libraries, including fastai version 1 or earlier, provide a high-level API to users, and a low-level API used internally for that functionality, but nothing in between. This has two problems: the first is that it becomes harder and harder to create additional high-level functionality, as the system becomes more sophisticated, because the low-level API becomes increasingly complicated and cluttered. The second problem is that for users of the system who want to customize and adapt it, they often have to rewrite significant parts of the high-level API, and understand the large surface area of the low-level API in order to do so. This tends to mean that only a small dedicated community of specialists can really customize the software.

These issues are common across nearly all software development, and many software engineers have worked hard to find ways to deal with this complexity and develop layered architectures. The issue in the deep learning community, however, is that these practices have not seemed to be widely understood or adopted. There are, however, exceptions; most relevant to this paper, the PyTorch library (Paszke et al. 2017) has a carefully layered design and is highly customizable.

Much of the innovation in fastai is in its new mid-level APIs. This section will look at the following mid-level APIs: data, callbacks, optimizer, model layers, and metrics. These APIs are what the four fastai applications are built with and are also fully documented and available to users so that they can build their own applications or customize the existing ones.


As already noted, a library can provide more appropriate defaults and user-friendly behaviour by ensuring that classes have all the information they need to make appropriate choices. One example of this is the DataLoaders class, which brings together all the information necessary for creating the data required for modelling. fastai also provides the Learner class, which brings together all the information necessary for training a model based on the data. The information that Learner requires, and stores as state within a learner object, is: a PyTorch model, an optimizer, a loss function, and a DataLoaders object. Passing in the optimizer and loss function is optional, and in many situations fastai can automatically select appropriate defaults.

Learner is also responsible (along with Optimizer) for handling fastai’s transfer learning functionality. When creating a Learner the user can pass a splitter. This is a function that describes how to split the layers of a model into PyTorch parameter groups, which can then be frozen, trained with different learning rates, or more generally handled differently by an optimizer.

One area that we have found particularly sensitive in transfer learning is the handling of batch-normalization layers (Ioffe and Szegedy 2015). We tried a wide variety of approaches to training and updating the moving average statistics of those layers, and different configurations could often change the error rate by as much as 300%. There was only one approach that consistently worked well across all datasets that we tried, which is to never freeze batch-normalization layers, and never turn off the updating of their moving average statistics. Therefore, by default, Learner will bypass batch-normalization layers when a user asks to freeze some parameter groups. Users often report that this one minor tweak dramatically improves their model accuracy and is not something that is found in any other libraries that we are aware of.

DataLoaders and Learner also work together to ensure that model weights and input data are all on the same device. This makes working with GPUs significantly more straightforward and makes it easy to switch from CPU to GPU as needed.

Two-way callbacks

In fastai version 0.7, we repeatedly modified the training loop in Learner to support many different tweaks and customizations. Over time, however, this became unwieldy. We noticed that there was a core subset of functionality that appeared in every one of these tweaks, and that all the other changes that were required could be refactored into a specific set of customization points. In other words, a wide variety of training methods can be represented using a single, universal training system. Once we extracted those common pieces, we were left with the basic fastai training loop, and the customisation points that we call two-way callbacks.

The Learner class’s novel two-way callback system allows gradients, data, losses, control flow, and anything else to be read and changed at any point during training. There is a rich history of using callbacks to allow for customisation of numeric software, and today nearly all modern deep learning libraries provide this functionality. However, fastai’s callback system is the first that we are aware of that supports the design principles necessary for complete two-way callbacks:

  • A callback should be available at every single point that code can be run during training, so that a user can customise every single detail of the training method;
  • Every callback should be able to access every piece of information available at that stage in the training loop, including hyper-parameters, losses, gradients, input and target data, and so forth.

This is the way callbacks are usually designed, but in addition, there is a key design principle:

  • Every callback should be able to modify all these pieces of information, at any time before they are used, and be able to skip a batch, epoch, training or validation section, or cancel the whole training loop.

This is why we call these two-way callbacks: the information flows not only from the training loop to the callbacks, but in the other direction as well. For instance, here is the code for training a single batch b in fastai:

try:
  self._split(b);                                 self.cb('begin_batch')
  self.pred = self.model(*self.x);                self.cb('after_pred')
  if len(self.y) == 0: return
  self.loss = self.loss_func(self.pred, *self.y); self.cb('after_loss')
  if not self.training: return
  self.loss.backward();                           self.cb('after_back')
  self.opt.step();                                self.cb('after_step')
except CancelBatchException:                      self.cb('after_cancel')
finally:                                          self.cb('after_batch')

This example shows how every step of the process is associated with a callback (the calls to self.cb()), and how exceptions are used as a flexible control flow mechanism for them.

In fastai v0.7, we did not follow these three design principles. As a result, we had to frequently change the training loop to support additional functionality, and new research papers. On the other hand, with this new callback system we have not had to change the training loop at all, and have used callbacks to implement mixup augmentation, generative adversarial networks, optimized mixed precision training, PyTorch hooks, the learning rate finder, and many more. Most importantly, we have not yet come across any cases where mixing and matching these callbacks has caused any problems. Therefore, users can use all the training features that they want, and can easily do ablation studies, adding, removing, and modifying techniques as needed.
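The two-way idea can be demonstrated with a toy loop in plain Python. The sketch below mirrors the event names used in the batch code above, but MiniLoop, ScaleInput, and SkipLargeLoss are hypothetical classes written for this example, and the "model" is just a multiplication. One callback writes to the loop's state, and another alters control flow by raising an exception.

```python
class CancelBatchException(Exception):
    "Raised by a callback to skip the rest of the current batch."

class Callback:
    "Base class: a callback implements only the events it cares about."
    def __call__(self, event, loop):
        handler = getattr(self, event, None)
        if handler is not None: handler(loop)

class MiniLoop:
    "Toy training loop over numbers; the 'model' just doubles its input."
    def __init__(self, cbs): self.cbs, self.losses = cbs, []
    def cb(self, event):
        for c in self.cbs: c(event, self)
    def one_batch(self, b):
        try:
            self.x, self.y = b;                  self.cb('begin_batch')
            self.pred = 2 * self.x;              self.cb('after_pred')
            self.loss = abs(self.pred - self.y); self.cb('after_loss')
            self.losses.append(self.loss)
        except CancelBatchException:             self.cb('after_cancel')
        finally:                                 self.cb('after_batch')

class ScaleInput(Callback):
    "Writes to loop state: modifies the input before the model sees it."
    def begin_batch(self, loop): loop.x = loop.x / 10

class SkipLargeLoss(Callback):
    "Changes control flow: cancels any batch whose loss is too large."
    def __init__(self): self.skipped = 0
    def after_loss(self, loop):
        if loop.loss > 5: raise CancelBatchException
    def after_cancel(self, loop): self.skipped += 1

skipper = SkipLargeLoss()
loop = MiniLoop([ScaleInput(), skipper])
for batch in [(10, 2), (20, 100), (30, 6)]: loop.one_batch(batch)
```

Neither callback required any change to one_batch, and the two can be combined or removed independently.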

Case study: generative adversarial network training using callbacks

A good example of a fastai callback is GANTrainer, which implements training of generative adversarial networks (Goodfellow et al. 2014). To do so, it must complete the following tasks:

  • Freeze the generator and train the critic for one (or more) steps by:
    • getting one batch of "real" images;
    • generating one batch of "fake" images;
    • having the critic evaluate each batch and computing a loss function from that, which rewards the detection of real images and penalizes the fake ones;
    • updating the weights of the critic with the gradients of this loss.
  • Freeze the critic and train the generator for one (or more) steps by:
    • generating one batch of "fake" images;
    • evaluating the critic on it;
    • returning a loss that rewards the critic thinking those are real images;
    • updating the weights of the generator with the gradients of this loss.

To do so, it relies on a GANModule, which contains the generator and the critic and delegates the input to the proper model depending on the value of a gen_mode flag, and on a GANLoss, which also has generator and critic behaviours and handles the evaluation mentioned earlier. It then defines the following callback methods:

  • begin_fit: Initialises the generator, critic, loss functions, and internal storage
  • begin_epoch: Sets the critic or generator to training mode
  • begin_validate: Switches to generator mode for showing results
  • begin_batch: Sets the appropriate target depending on whether it is in generator or critic mode
  • after_batch: Records losses to the generator or critic log
  • after_epoch: Optionally shows a sample image

This callback is then customized with another callback, which defines at what point to switch from critic to generator and vice versa. fastai includes several possibilities for this purpose, such as an AdaptiveGANSwitcher, which automatically switches between generator and critic training based on reaching certain thresholds in their respective losses. This approach to training can allow models to be trained significantly faster and more easily than with standard fixed schedule approaches.

Generic optimizer

fastai provides a new generic optimizer foundation that allows recent optimization techniques to be implemented in a handful of lines of code, by refactoring out the common functionality of modern optimizers into two basic pieces:

  • stats, which track and aggregate statistics such as gradient moving averages;
  • steppers, which combine stats and hyper-parameters to update the weights using some function.

This has allowed us to implement every optimizer that we have attempted in fastai, without needing to extend or change this foundation. This has been very beneficial, both for research and development. As an example of a development improvement, here is the entire change needed to support decoupled weight decay (also known as AdamW (Loshchilov and Hutter 2017)):

steppers = [weight_decay] if decouple_wd else [l2_reg]

On the other hand, the implementation in the PyTorch library required creating an entirely new class, with over 50 lines of code. The benefit for research comes about because it is easy to rapidly implement new papers as they come out, recognise similarities and differences across techniques, and try out variants and combinations of these underlying differences, many of which have not yet been published. The resulting code tends to look a lot like the maths shown in the paper. For instance, here is the code in fastai, and the algorithm from the paper, for the LAMB optimizer (You et al. 2019):

The LAMB algorithm and implementation.

The only differences between the code and the figure are:

  • the means that update m_t and v_t don’t appear, as this is done in a separate stat function;
  • the authors do not provide the full definition of the ϕ function they use (it depends on undefined parameters), so the code shown is based on the official TensorFlow implementation.

In order to support modern optimizers such as LARS, fastai allows the user to choose whether to aggregate stats at the model, layer, or per-activation level.
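The split into stats and steppers described in this section can be sketched for a single scalar weight in plain Python. ToyOptimizer and the three functions below are hypothetical and written only for this example; they show the division of labour, not fastai's Optimizer class.

```python
def average_grad(state, grad, mom=0.9, **kwargs):
    "Stat: track an exponential moving average of the gradients."
    state['grad_avg'] = mom * state.get('grad_avg', 0.0) + (1 - mom) * grad
    return state

def weight_decay(p, lr, wd, **kwargs):
    "Stepper: decoupled weight decay, shrinking the weight directly."
    return p * (1 - lr * wd)

def momentum_step(p, lr, grad_avg, **kwargs):
    "Stepper: move the weight against the averaged gradient."
    return p - lr * grad_avg

class ToyOptimizer:
    "Applies every stat, then every stepper, to a single scalar weight."
    def __init__(self, stats, steppers, **hypers):
        self.stats, self.steppers, self.hypers, self.state = stats, steppers, hypers, {}
    def step(self, p, grad):
        for stat in self.stats: self.state = stat(self.state, grad, **self.hypers)
        for stepper in self.steppers: p = stepper(p, **self.state, **self.hypers)
        return p

opt = ToyOptimizer([average_grad], [weight_decay, momentum_step], lr=0.1, wd=0.01, mom=0.9)
w = opt.step(1.0, grad=2.0)
```

A new optimizer variant is then a matter of changing the two lists passed to the constructor rather than writing a new class.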

Generalized metric API

Nearly all machine learning and deep learning libraries provide some support for metrics. These are generally defined as simple functions which take the mean, or in some cases a custom reduction function, across some measurement which is logged during training. However, some metrics cannot be correctly defined using this framework. For instance, the dice coefficient, which is widely used for measuring segmentation accuracy, cannot be directly expressed using a simple reduction.

In order to provide a more flexible foundation to support metrics like this, fastai provides a Metric abstract class which defines three methods: reset, accumulate, and value (which is a property). Reset is called at the start of training, accumulate is called after each batch, and then finally value is called to calculate the final result. Whenever possible, we can thus avoid recording and storing all predictions in memory. For instance, here is the definition of the dice coefficient:

class Dice(Metric):
    def __init__(self, axis=1): self.axis = axis
    def reset(self): self.inter,self.union = 0,0
    def accumulate(self, learn):
        pred,targ = flatten_check(learn.pred.argmax(self.axis), learn.y)
        self.inter += (pred*targ).float().sum().item()
        self.union += (pred+targ).float().sum().item()
    @property
    def value(self): return 2. * self.inter/self.union if self.union>0 else None
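A small worked example shows why the dice coefficient needs accumulation rather than a per-batch mean. The classes below are a plain-Python stand-in for the reset / accumulate / value interface. ToyMetric and ToyDice are hypothetical names, and accumulate takes predictions and targets directly, as flat lists of 0s and 1s, rather than a learner.

```python
class ToyMetric:
    "Minimal stand-in for the reset / accumulate / value interface."
    def reset(self): raise NotImplementedError
    def accumulate(self, pred, targ): raise NotImplementedError
    @property
    def value(self): raise NotImplementedError

class ToyDice(ToyMetric):
    "Dice over binary masks given as flat lists of 0s and 1s."
    def reset(self): self.inter, self.union = 0, 0
    def accumulate(self, pred, targ):
        self.inter += sum(p * t for p, t in zip(pred, targ))
        self.union += sum(p + t for p, t in zip(pred, targ))
    @property
    def value(self): return 2. * self.inter / self.union if self.union > 0 else None

batches = [([1, 1, 1, 1], [1, 1, 1, 1]),   # large mask, perfectly predicted
           ([1, 0, 0, 0], [0, 1, 0, 0])]   # small mask, completely missed

metric = ToyDice()
metric.reset()
per_batch = []
for pred, targ in batches:
    metric.accumulate(pred, targ)
    one = ToyDice(); one.reset(); one.accumulate(pred, targ)
    per_batch.append(one.value)

accumulated = metric.value                          # 2*4 / 10 = 0.8
mean_of_batches = sum(per_batch) / len(per_batch)   # (1.0 + 0.0) / 2 = 0.5
```

The two numbers disagree because the mean weights both batches equally, whereas the accumulated version weights them by the number of mask pixels involved. Only two running totals are stored, never the predictions themselves.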

The Scikit-learn library (Pedregosa et al. 2011) already provides a wide variety of useful metrics, so instead of reinventing them, fastai provides a simple wrapper function, skm_to_fastai, which allows them to be used in fastai, and can automatically add pre-processing steps such as sigmoid, argmax, and thresholding.


Many libraries have recently started integrating access to external datasets directly into their APIs. fastai builds on this trend, by curating and collecting a number of datasets (hosted by the AWS Public Dataset Program²) in a single place and making them available through the fastai.data.external module. fastai automatically downloads, extracts, and caches these datasets when they are first used. This is similar to functionality provided by Torchvision, TensorFlow datasets, and similar libraries, with the addition of closer integration into the fastai ecosystem. For instance, fastai provides cut-down “sample” versions of many of its datasets, which are small enough that they can be downloaded and used directly in documentation, continuous integration testing, and so forth. These datasets are also used in the documentation, along with examples showing users what they can expect when training models with these datasets. Because the documentation is written in interactive notebooks (as discussed in a later section), users can also directly experiment with these datasets and models by simply running and modifying the documentation notebooks.

funcs_kwargs and DataLoader

Once a user has their data available, they need to get it into a form that can be fed to a PyTorch model. The most common class used to feed models directly is the DataLoader class in PyTorch. This class provides fast and reliable multi-threaded data-processing execution, with several points allowing customisation. However, we have found that it is not flexible enough to conveniently do some of the tasks that we have required, such as building a DataLoader for an NLP language model. Therefore, fastai includes a new DataLoader class on top of the internal classes that PyTorch uses. This combines the benefits of the fast and reliable infrastructure provided by PyTorch with a more flexible and expressive front-end for the user.

DataLoader provides 15 extension points via customizable methods, which can be replaced by the user as required. These customizable methods represent the 15 stages of data loading that we have identified, and which fit into three broad stages: sample creation, item creation, and batch creation. In contrast, in the standard PyTorch DataLoader class only a small subset of these stages is explicitly made available for customization by the user. Unless a user’s requirements are met by this subset, the user is forced to implement their own solution from scratch. The impact of this additional customizability can be quite significant. For instance, the fastai language model DataLoader went from 90 lines of code to 30 lines of code after adopting this approach.

What makes this flexibility possible is a Python decorator that is called funcs_kwargs. This decorator creates a class in which any method can be replaced by passing a new function to the constructor, or by replacing it through subclassing. This allows users to replace any part of the logic in the DataLoader class. In order to maximise the power of this, nearly every part of the fastai DataLoader is a method with a single line of code. Therefore, virtually every design choice can be adjusted by users.
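The mechanism can be approximated with a short class decorator. The sketch below is an assumption-laden illustration, not the library's funcs_kwargs: the decorator name replaceable_methods, the _methods list, and the choice to bind replacements as methods (so that they receive self) are all choices made for this example.

```python
import types

def replaceable_methods(cls):
    "Class decorator: methods named in `_methods` can be swapped via keyword arguments."
    old_init = cls.__init__
    def __init__(self, *args, **kwargs):
        for name in cls._methods:
            func = kwargs.pop(name, None)
            if func is not None:
                # Bind the replacement so it receives `self` like a normal method
                setattr(self, name, types.MethodType(func, self))
        old_init(self, *args, **kwargs)
    cls.__init__ = __init__
    return cls

@replaceable_methods
class ToyLoader:
    _methods = ['create_item', 'create_batch']
    def __init__(self, items, bs=2): self.items, self.bs = items, bs
    def create_item(self, i): return self.items[i]
    def create_batch(self, samples): return list(samples)
    def __iter__(self):
        for start in range(0, len(self.items), self.bs):
            idxs = range(start, min(start + self.bs, len(self.items)))
            yield self.create_batch(self.create_item(i) for i in idxs)

default = list(ToyLoader([1, 2, 3]))
# Replace one stage through the constructor...
squared = list(ToyLoader([1, 2, 3], create_item=lambda self, i: self.items[i] ** 2))
# ...or through subclassing
class SumLoader(ToyLoader):
    def create_batch(self, samples): return sum(samples)
summed = list(SumLoader([1, 2, 3]))
```

Keeping each stage to a one-line method is what makes both forms of replacement practical: there is little surrounding logic that a replacement has to reproduce.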

fastai also provides a transformed DataLoader called TfmdDL, which subclasses DataLoader. In TfmdDL the callbacks and customization points execute Pipelines of Transforms. Both mechanisms are described in 5.2; only a brief overview is given here. A Transform is simply a Python function, which can also include its inverse function – that is, the function which “undoes” the transform. Transforms can be composed using the Pipeline class, which then allows the entire function composition to be inverted as well. We refer to these two directions, the forward and inverse directions of the functions, as the Transforms’ encodes and decodes methods.

TfmdDL provides the foundations for the visualisation support discussed in the application section, having the basic template for showing a batch of data. In order to do this, it needs to decode any transforms in the pipeline, which it does automatically. For instance, an integer representing a level of a category will be converted back into the string that the integer represents.
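The encode/decode round trip, including the category example just mentioned, can be sketched in plain Python. The class names below are hypothetical stand-ins prefixed with Toy; they illustrate reversible composition and are not the library's Transform or Pipeline classes.

```python
class ToyTransform:
    "A forward function (`encodes`) paired with its inverse (`decodes`)."
    def encodes(self, x): return x
    def decodes(self, x): return x

class ToyCategorize(ToyTransform):
    "Map category strings to integer codes and back."
    def __init__(self, vocab): self.vocab = list(vocab)
    def encodes(self, x): return self.vocab.index(x)
    def decodes(self, x): return self.vocab[x]

class ToyOneHot(ToyTransform):
    "Map an integer code to a one-hot list and back."
    def __init__(self, n): self.n = n
    def encodes(self, x): return [int(i == x) for i in range(self.n)]
    def decodes(self, x): return x.index(1)

class ToyPipeline:
    "Compose transforms; decoding applies the inverses in reverse order."
    def __init__(self, tfms): self.tfms = tfms
    def encode(self, x):
        for t in self.tfms: x = t.encodes(x)
        return x
    def decode(self, x):
        for t in reversed(self.tfms): x = t.decodes(x)
        return x

pipe = ToyPipeline([ToyCategorize(['cat', 'dog', 'fish']), ToyOneHot(3)])
encoded = pipe.encode('dog')
decoded = pipe.decode(encoded)
```

Decoding in reverse order is what lets a display routine recover a human-readable value from whatever the model was actually fed.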


When users need to create a new kind of block for the data blocks API, or need a level of customization that even the data blocks API doesn’t support, they can use the mid-level components that the data block API is built on. These are a small number of simple classes which combine the transform pipeline functionality of fastai with Python’s collections interface.

The most basic class is transformed list, or TfmdLists, which lazily applies a transform pipeline to a collection, whilst providing a standard Python collection interface. This is an important foundational functionality for deep learning, such as the ability to index into a collection of filenames, and on demand read an image file then apply any processing, such as data augmentation and normalization, necessary for a model. TfmdLists also provides subset functionality, which allows the user to define subsets of the data, such as those representing training and validation sets. Information about what subset an item belongs to is passed down to transforms, so that they can ensure that they do the appropriate processing – for instance, data augmentation processing would be generally skipped for a validation set, unless doing test time augmentation.
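The lazy, subset-aware behaviour described above can be sketched as follows. LazyTfmdList and load_and_augment are hypothetical names for this example, and string manipulation stands in for reading and augmenting an image.

```python
class LazyTfmdList:
    "Apply `tfm` to an item only when it is indexed; supports named subsets."
    def __init__(self, items, tfm, splits):
        self.items, self.tfm, self.splits = items, tfm, splits
    def __len__(self): return len(self.items)
    def __getitem__(self, i):
        # The transform is told which subset the item belongs to
        return self.tfm(self.items[i], is_valid=i in self.splits['valid'])
    def subset(self, name): return [self[i] for i in self.splits[name]]

calls = []
def load_and_augment(fname, is_valid):
    "Stand-in for reading a file; 'augments' only training items."
    calls.append(fname)
    return fname.upper() if is_valid else fname.upper() + '+aug'

tl = LazyTfmdList(['a.png', 'b.png', 'c.png'], load_and_augment,
                  splits={'train': [0, 1], 'valid': [2]})
n_calls_before = len(calls)   # nothing has been processed yet
first = tl[0]
valid = tl.subset('valid')
```

Only the items that were actually indexed have been processed, which is what makes this pattern workable for collections too large to load eagerly.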

Another important data class at this layer of the API is Datasets, which applies multiple transform pipelines in parallel to a single collection. Like TfmdLists, it provides a standard Python collection interface. Indexing into a Datasets object returns a tuple containing the result of each transform pipeline on the input item. This is the class used by the data blocks API to return, for instance, a tuple of an image tensor, and a label for that image, both derived from the same input filename.
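A minimal sketch of this "several pipelines, one collection" pattern is shown below. ToyDatasets, fake_open, and parent_name are hypothetical, and each pipeline is represented simply as a list of functions applied in order.

```python
class ToyDatasets:
    "Apply several pipelines to the same item; indexing returns one result per pipeline."
    def __init__(self, items, pipelines): self.items, self.pipelines = items, pipelines
    def __len__(self): return len(self.items)
    def __getitem__(self, i):
        return tuple(self._run(p, self.items[i]) for p in self.pipelines)
    @staticmethod
    def _run(pipeline, x):
        for f in pipeline: x = f(x)
        return x

def fake_open(fname): return f'<image {fname}>'       # stand-in for reading an image
def parent_name(fname): return fname.split('/')[-2]   # label from the folder name

vocab = ['cat', 'dog']
dsets = ToyDatasets(['pets/cat/1.jpg', 'pets/dog/2.jpg'],
                    pipelines=[[fake_open], [parent_name, vocab.index]])
x, y = dsets[1]
```

Both elements of the returned tuple are derived from the same filename, so inputs and targets cannot drift out of alignment.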

Layers and architectures

PyTorch (like many other libraries) provides a basic “sequential” layer object, which can be combined in sequence to form a component of a network. This represents simple composition of functions, where each layer’s output is the next layer’s input. However, many components in modern network architectures cannot be represented in this way. For instance, ResNet blocks (He et al. 2015), and any other block which requires a skip connection, are not compatible with sequential layers. The normal workaround for this in PyTorch is to write a custom forward function, effectively relying on the full flexibility of Python to escape the limits of composing these sequence layers.

However, there is a significant downside: the model is now no longer amenable to easy analysis and modification, such as removing the final few layers in order to do transfer learning. This also makes it harder to support automatically drawing graphs representing a model, printing a model summary, accurately reporting on model computation requirements by layer, and so forth.

Therefore, fastai attempts to provide the basic foundations to allow modern neural network architectures to be built by stacking a small number of predefined building blocks. The first piece of this system is the SequentialEx layer. This layer has the same basic API as PyTorch’s nn.Sequential, with one key difference: the original input value to the function is available to every layer in the block. Therefore, the user can, for instance, include a layer which adds the current value of the sequential block to the input value of the sequential block (such as is done in a ResNet).

To take full advantage of this capability, fastai also provides a MergeLayer class. This allows the user to pass any function, which will in turn be provided with the layer block input value, and the current value of the sequential block. For instance, if you pass in a simple add function, then MergeLayer provides the functionality of an identity connection in a standard ResNet block. Or, if the user passes in a concatenation function, then it provides the basic functionality of a concatenating connection in a DenseNet block (Huang, Liu, and Weinberger 2016). In this way, fastai provides primitives which allow representing modern network architectures out of predefined building blocks, without falling back to Python code in the forward function.
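The idea can be sketched without any deep learning library. In the following toy version, SequentialExSketch and MergeSketch are hypothetical names, plain Python lists stand in for tensors, and the original input is passed to the merge layer explicitly, which is a simplification rather than a description of how fastai stores it.

```python
class MergeSketch:
    """Combines the block's current value with its original input using `fn`."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, current, orig):
        return self.fn(current, orig)


class SequentialExSketch:
    """Applies layers in order, keeping the block's original input available."""
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        out = x
        for layer in self.layers:
            # Merge layers also receive the original input; ordinary layers do not
            out = layer(out, x) if isinstance(layer, MergeSketch) else layer(out)
        return out


def double(values):
    # Stand-in for a learned layer such as a convolution
    return [2 * v for v in values]

def add(current, orig):
    return [c + o for c, o in zip(current, orig)]

def concat(current, orig):
    return current + orig

res_block = SequentialExSketch(double, MergeSketch(add))       # identity (skip) connection
dense_block = SequentialExSketch(double, MergeSketch(concat))  # concatenating connection

print(res_block([1.0, 2.0]))    # [3.0, 6.0]
print(dense_block([1.0, 2.0]))  # [2.0, 4.0, 1.0, 2.0]
```

Because the block remains an ordered list of layers, it can still be inspected, truncated, or summarised, which is the point of avoiding a hand-written forward function.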

fastai also provides a general-purpose class for combining these layers into a wide range of modern convolutional neural network architectures. These are largely based on the underlying foundations from ResNet (He et al. 2015), and therefore this class is called XResNet. By providing parameters to this class, the user can customise it to create architectures that include squeeze and excitation blocks (Hu, Shen, and Sun 2017), grouped convolutions such as in ResNext (Xie et al. 2017), depth-wise convolutions such as in the Xception architecture (Chollet 2016), widening factors such as in Wide ResNets (Zagoruyko and Komodakis 2016), self-attention and symmetric self-attention functionality, custom activation functions, and more. By using this generic refactoring of these clusters of modern neural network architectures, we have been able to design and experiment with novel combinations very easily. It is also clearer to users exactly what is going on in their models, because the various specific architectures are clearly represented as changes to input parameters.

One set of techniques that is extremely useful in practice is the collection of tweaks to the ResNet architecture described by He et al. (2018). These approaches are used by default in XResNet. Another architecture tweak which has worked well in many situations is the recently developed Mish activation function (Misra 2019). fastai includes an implementation of Mish which is optimised using PyTorch’s just-in-time compiler (JIT).
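For reference, Mish is defined as x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The scalar version below is only a sketch of that formula using the standard library; it is not fastai's JIT-compiled implementation, and the function names are chosen for this example.

```python
import math

def softplus(x):
    # ln(1 + exp(x)), rearranged to avoid overflow for large positive x
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    return x * math.tanh(softplus(x))

for value in (-1.0, 0.0, 1.0, 20.0):
    print(value, mish(value))
```

For large positive inputs the function approaches the identity, and for negative inputs it dips slightly below zero before returning towards it, which is what distinguishes it from ReLU.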

A similar approach has been used to refactor the U-Net architecture (Ronneberger, Fischer, and Brox 2015). Through looking at a range of competition winning and state-of-the-art papers in segmentation, we curated a set of approaches that work well together in practice. These are made available by default in fastai’s U-Net implementation, which also dynamically creates the U-Net cross connections for any given input size.

Low-level APIs

The layered approach of the fastai library has a specific meaning at the lower levels of its stack. Rather than treating Python (Python Core Team 2019) itself as the base layer of the computation, which the middle layer relies on, those layers rely on a set of basic abstractions provided by the lower layer. The middle layer is programmed in that set of abstractions. The low level of the fastai stack provides a set of abstractions for:

  • Pipelines of transforms: Partially reversible composed functions mapped and dispatched over elements of tuples
  • Type-dispatch based on the needs of data processing pipelines
  • Attaching semantics to tensor objects, and ensuring that these semantics are maintained throughout a Pipeline
  • GPU-optimized computer vision operations
  • Convenience functionality, such as a decorator to make patching existing objects easier, and a general collection class with a NumPy-like API.

The rest of this section will explain how the transform pipeline system is built on top of the foundations provided by PyTorch, type dispatch, and semantic tensors, providing the flexible infrastructure needed for the rest of fastai.

PyTorch foundations

The main foundation for fastai is the PyTorch (Paszke et al. 2017) library. PyTorch provides a GPU optimised tensor class, a library of useful model layers, classes for optimizing models, and a flexible programming model which integrates these elements. fastai uses building blocks from all parts of the PyTorch library, including directly patching its tensor class, entirely replacing its library of optimizers, providing simplified mechanisms for using its hooks, and so forth. In earlier prototypes of fastai we used TensorFlow (Abadi et al. 2015) as our platform (and before that used Theano (Theano Development Team 2016)), but switched to PyTorch because we found that it had a fast core, a simple and well curated API, and rapidly growing popularity in the research community. At this point most papers at the top deep learning conferences are implemented using PyTorch.

fastai builds on many other open source libraries. For CPU image processing fastai uses and extends the Python imaging library (PIL) (Clark and Contributors, n.d.), for reading and processing tabular data it uses pandas, for most of its metrics it uses Scikit-Learn (Pedregosa et al. 2011), and for plotting it uses Matplotlib (Hunter 2007). These are the most widely used libraries in the Python open source data science community and provide the features necessary for the fastai library.

Transforms and Pipelines

One key motivation is the need to often be able to undo some subset of the transformations that are applied to create the data used for modelling. For instance, strings that represent categories cannot be used in models directly and are turned into integers using some vocabulary, and pixel values for images are generally normalized. Neither of these can be directly visualized, and therefore at inference time we need to apply the inverse of these functions to get data that is understandable. Therefore, fastai introduces a Transform class, which provides callable objects, along with a decode method. The decode method is designed to invert the function provided by a transform; it needs to be implemented manually by the user, and it is similar to the inverse_transform you can provide in Scikit-Learn (Pedregosa et al. 2011) pipelines and transformers. By providing both the encode and decode methods in a single place, the user ends up with a single object which they can compose into pipelines, serialize, and so forth.
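The two examples mentioned above can be written as small callable objects that carry their own inverse. This is a minimal sketch of the encode/decode pairing only; CategorizeSketch and NormalizeSketch are hypothetical names and do not reproduce fastai's Transform class.

```python
class CategorizeSketch:
    """Maps category strings to integers, and back again with decode."""
    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.o2i = {v: i for i, v in enumerate(self.vocab)}

    def __call__(self, label):
        return self.o2i[label]

    def decode(self, idx):
        return self.vocab[idx]


class NormalizeSketch:
    """Standardises a value, and undoes the standardisation with decode."""
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, x):
        return (x - self.mean) / self.std

    def decode(self, x):
        return x * self.std + self.mean


categorize = CategorizeSketch(["cat", "dog"])
normalize = NormalizeSketch(mean=0.5, std=0.25)

print(categorize("dog"), categorize.decode(1))  # 1 dog
print(normalize(1.0), normalize.decode(2.0))    # 2.0 1.0
```

Keeping both directions on one object means the vocabulary or statistics used to encode are guaranteed to be the ones used to decode.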

Another motivation for this part of the API is the insight that PyTorch data loaders provide tuples, and PyTorch models expect tuples as inputs. Sometimes these tuples should behave in a connected and dependent way, such as in a segmentation model, where data augmentation must be applied to both the independent and dependent variables in the same basic way. Sometimes, however, different implementations must be used for different types; for instance, when doing affine transformations to a segmentation mask nearest-neighbor interpolation is needed, but for an image generally a smoother interpolation function would be used.

In addition, sometimes transforms need to be able to opt out of processing altogether, depending on context. For instance, except when doing test time augmentation, data augmentation methods should not be applied to the validation set. Therefore, fastai automatically passes the current subset index to transforms, allowing them to modify their behaviour based on subset (for instance, training versus validation). This is largely hidden from the user, because base classes are provided which automatically do this context-dependent skipping. However, advanced users requiring complete customization can use this functionality directly.
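A simplified picture of this context-dependent skipping is sketched below. The base class checks a subset index before doing any work, so subclasses only implement the transformation itself. The class names are hypothetical, the index is passed at call time purely to keep the example short, and the convention that 0 means training and 1 means validation is an assumption of this sketch.

```python
class SubsetAwareTransform:
    """Base class that skips processing when called for the wrong subset."""
    split_idx = None  # None: always apply; 0: training only; 1: validation only

    def __call__(self, x, split_idx=None):
        if self.split_idx is not None and split_idx != self.split_idx:
            return x  # opt out: leave the item untouched
        return self.encodes(x)

    def encodes(self, x):
        return x


class FlipSketch(SubsetAwareTransform):
    """Deterministic stand-in for an augmentation: reverses a sequence."""
    split_idx = 0

    def encodes(self, x):
        return x[::-1]


flip = FlipSketch()
print(flip([1, 2, 3], split_idx=0))  # [3, 2, 1]  (training: augmented)
print(flip([1, 2, 3], split_idx=1))  # [1, 2, 3]  (validation: skipped)
```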

Transforms in deep learning pipelines often require state, which can be dependent on the input data. For example, normalization statistics could be based on a sample of data batches, a categorization transform could get its vocabulary directly from the dependent variable, or an NLP numericalization transform could get its vocabulary from the tokens used in the input corpus. Therefore, fastai transforms and pipelines support a setup method, which can be used to create this state when setting up a Pipeline. When pipelines are set up, all previous transforms in the pipeline are run first, so that the transform being set up receives the same structure of data that it will when being called.

This is closely connected to the implementation of TfmdLists. Because a TfmdLists object lazily applies a pipeline to a collection, fastai can automatically call the Pipeline setup method as soon as it is connected to the collection in a TfmdLists.
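The ordering of setup can be illustrated with a small text example. In this sketch (all class names are hypothetical), the numericalization step builds its vocabulary from tokens rather than raw strings, because the pipeline runs each earlier transform over the items before setting up the next one.

```python
class TokenizeSketch:
    def setup(self, items):
        pass  # stateless: nothing to learn from the data

    def __call__(self, text):
        return text.lower().split()


class NumericalizeSketch:
    def setup(self, items):
        # `items` are token lists here, because earlier transforms have already run
        self.vocab = sorted({tok for toks in items for tok in toks})
        self.o2i = {tok: i for i, tok in enumerate(self.vocab)}

    def __call__(self, toks):
        return [self.o2i[t] for t in toks]

    def decode(self, ids):
        return [self.vocab[i] for i in ids]


class PipelineSketch:
    def __init__(self, tfms):
        self.tfms = tfms

    def setup(self, items):
        for tfm in self.tfms:
            tfm.setup(items)
            # Feed this transform's output to the setup of the next one
            items = [tfm(it) for it in items]

    def __call__(self, x):
        for tfm in self.tfms:
            x = tfm(x)
        return x


numericalize = NumericalizeSketch()
pipe = PipelineSketch([TokenizeSketch(), numericalize])
pipe.setup(["The cat sat", "the dog sat"])

print(numericalize.vocab)  # ['cat', 'dog', 'sat', 'the']
print(pipe("The dog"))     # [3, 1]
```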

Type dispatch

The fastai type dispatch system is like the functools.singledispatch system provided in the Python standard library while supporting multiple dispatch over two parameters. Dispatch over two parameters is necessary for any situation where the user wants to be able to customize behavior based on both the input and target of a model. For instance, fastai uses this for the show_batch and show_results methods. As shown in the application section, these methods automatically provide an appropriate visualisation of the input, target, and results of a model, which requires responding to the types of both parameters. In one example the input was an image, and the target was a segmentation mask, and the show_results method automatically used a colour-coded overlay for the mask. On the other hand, for an image classification problem, the input would be shown as an image, the prediction and target would be shown as text labels, and colour-coded based on whether they were correct.

It also provides a more expressive and yet concise syntax for registering additional dispatched functions or methods, taking advantage of Python’s recently introduced type annotations syntax. Here is an example of creating two different methods which dispatch based on parameter types:

@typedispatch
def f_td_test(x:numbers.Integral, y): return x+1
@typedispatch
def f_td_test(x:int, y:float): return x+y

Here f_td_test has a generic implementation for x of numeric types and all ys, then a specialized implementation when x is an int and y is a float.
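To make the selection rule concrete, here is a small standard-library sketch of dispatch over two parameters. TypeDispatchSketch is a hypothetical class written for this illustration; it reads the annotations of the first two parameters and picks the most specific matching implementation, and it makes no claim about how fastai resolves ties or caches lookups.

```python
import inspect
import numbers


class TypeDispatchSketch:
    """Picks an implementation based on the types of the first two arguments."""
    def __init__(self):
        self.impls = []  # list of ((type_x, type_y), function)

    def register(self, fn):
        params = list(inspect.signature(fn).parameters.values())[:2]
        # Unannotated parameters match anything
        types = tuple(object if p.annotation is inspect.Parameter.empty else p.annotation
                      for p in params)
        self.impls.append((types, fn))
        return fn

    def __call__(self, x, y):
        matches = [(ts, fn) for ts, fn in self.impls
                   if isinstance(x, ts[0]) and isinstance(y, ts[1])]
        for ts, fn in matches:
            # The most specific match is a subtype of every other match in both positions
            if all(issubclass(ts[0], o[0]) and issubclass(ts[1], o[1]) for o, _ in matches):
                return fn(x, y)
        raise TypeError(f"no unambiguous implementation for "
                        f"{type(x).__name__}, {type(y).__name__}")


f_td_sketch = TypeDispatchSketch()

@f_td_sketch.register
def _generic(x: numbers.Integral, y):
    return x + 1

@f_td_sketch.register
def _int_float(x: int, y: float):
    return x + y

print(f_td_sketch(3, 2.0))  # 5.0, from the (int, float) implementation
print(f_td_sketch(3, "a"))  # 4, from the generic implementation
```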

Object-oriented semantic tensors

By using fastai’s transform pipeline functionality, which depends heavily on types, the mid and high-level APIs can provide a lot of power, conciseness, and expressivity for users. However, this does not work well with the types provided by PyTorch, since the basic tensor type does not have any subclasses which can be used for type dispatch. Furthermore, subclassing PyTorch tensors is challenging, because the basic functionality for instantiating the subclasses is not provided and doing any kind of tensor operation will strip away the subclass information.

Therefore, fastai provides a new tensor base class, which can be easily instantiated and subclassed. fastai also patches PyTorch’s tensor class to attempt to maintain subclass information through operations wherever possible. Unfortunately, it is not possible to always perfectly maintain this information throughout every possible operation, and therefore all fastai Transforms automatically maintain subclass types appropriately.

fastai also provides the same functionality for Python imaging library classes, along with some basic type hierarchies for Python built-in collection types, NumPy arrays, and so forth.
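The type-retention idea can be sketched with list subclasses standing in for tensors, which keeps the example free of dependencies. The names below are hypothetical; the sketch only shows the general pattern of casting a result back to the semantic type of its input after an operation has stripped it.

```python
class TensorBaseSketch(list):
    """Toy stand-in for a tensor base class."""

class ImageSketch(TensorBaseSketch):
    pass

class MaskSketch(TensorBaseSketch):
    pass


def retain_type_sketch(new, old):
    """Cast `new` back to the type of `old` if an operation lost it."""
    return new if isinstance(new, type(old)) else type(old)(new)

def typed(fn):
    """Wrap a function so that its output keeps the semantic type of its input."""
    def wrapper(x):
        return retain_type_sketch(fn(x), x)
    return wrapper

def brighten(x):
    # A list comprehension returns a bare list, so the subclass is lost here
    return [min(v + 10, 255) for v in x]


img = ImageSketch([0, 100, 250])
print(type(brighten(img)).__name__)         # list
print(type(typed(brighten)(img)).__name__)  # ImageSketch
```

With the type preserved, later steps can still dispatch on whether they are handling an image or a mask.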

GPU-accelerated augmentation

The fastai library provides most data augmentation in computer vision on the GPU at the batch level. Historically, the processing pipeline in computer vision has always been to open the images and apply data augmentation on the CPU, using a dedicated library such as PIL (Clark and Contributors, n.d.) or OpenCV (Bradski 2000), then batch the results before transferring them to the GPU and using them to train the model. On modern GPUs however, architectures like a standard ResNet-50 are often CPU-bound. Therefore fastai implements most common functions on the GPU, using PyTorch’s implementation of grid_sample (which does the interpolation from the coordinate map and the original image).

Most data augmentations are random affine transforms (rotation, zoom, translation, etc), functions on a coordinates map (perspective warping) or easy functions applied to the pixels (contrast or brightness changes), all of which can easily be parallelized and applied to a batch of images. In fastai, we combine all affine and coordinate transforms in one step to only apply one interpolation, which results in a smoother result. Most other vision libraries do not do this and lose a lot of detail of the original image when applying several transformations in a row.

A rotation and a zoom applied to an image with one interpolation only (right) or two interpolations (left). The latter results in more texture loss.
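The reason a single interpolation suffices is that affine transforms compose by matrix multiplication, so the combined coordinate map can be computed before any pixel is resampled. The sketch below shows only this coordinate arithmetic with 3×3 homogeneous matrices in plain Python; the helper names are assumptions of this example, and the interpolation step itself (for which fastai uses PyTorch's grid_sample) is not shown.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(degrees):
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def zoom(scale):
    return [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]

def apply(m, point):
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

point = (1.0, 0.0)
combined = matmul(zoom(2.0), rotation(90))                # rotate first, then zoom
one_step = apply(combined, point)                         # one coordinate map
two_steps = apply(zoom(2.0), apply(rotation(90), point))  # two separate maps

print(one_step, two_steps)  # both are approximately (0.0, 2.0)
```

The two routes agree on where each coordinate lands; the difference in practice is that resampling an image twice blurs it twice, whereas the combined matrix requires only one resampling.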

The type-dispatch system helps apply appropriate transforms to images, segmentation masks, key-points or bounding boxes (and users can add support for other types by writing their own functions).

Convenience functionality

fastai has a few more additions designed to make Python easier to use, including a NumPy-like API for lists called L, and some decorators to make delegation or patching easier.

Delegation is used when one function will call another and send it a bunch of keyword arguments with defaults. To avoid repeating those, they are often grouped into **kwargs. The problem is that they then disappear from the signature of the function that delegates, and you can’t use the tools from modern IDEs to get tab-completion for those delegated arguments or see them in its signature. To solve this, fastai provides a decorator called @delegates that will analyze the signature of the delegated function to change the signature of the original function. For instance the initialization of Learner has 11 keyword-arguments, so any function that creates a Learner uses this decorator to avoid mentioning them all. As an example, the function tabular_learner is defined like this:

@delegates(Learner.__init__)
def tabular_learner(dls, layers, emb_szs=None, config=None, **kwargs):

but when you look at its signature, you will see the 11 additional arguments of Learner.__init__ with their defaults.
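The mechanism can be approximated with the standard inspect module. The decorator below is a simplified sketch, not fastai's implementation: delegates_sketch, make_learner, and make_tabular_learner are hypothetical names, and the borrowed arguments are exposed as keyword-only parameters to keep the resulting signature valid.

```python
import inspect


def delegates_sketch(to):
    """Replace **kwargs in a function's signature with `to`'s keyword arguments."""
    def decorator(f):
        sig = inspect.signature(f)
        # Keep every parameter except the catch-all **kwargs
        own = {n: p for n, p in sig.parameters.items()
               if p.kind is not inspect.Parameter.VAR_KEYWORD}
        # Borrow the defaulted parameters of the target as keyword-only arguments
        borrowed = [p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
                    for n, p in inspect.signature(to).parameters.items()
                    if p.default is not inspect.Parameter.empty and n not in own]
        f.__signature__ = sig.replace(parameters=list(own.values()) + borrowed)
        return f
    return decorator


def make_learner(dls, model, lr=0.001, wd=None):
    # Hypothetical stand-in for a constructor with many keyword arguments
    return dict(dls=dls, model=model, lr=lr, wd=wd)

@delegates_sketch(make_learner)
def make_tabular_learner(dls, layers, emb_szs=None, **kwargs):
    return make_learner(dls, layers, **kwargs)

print(inspect.signature(make_tabular_learner))
print(make_tabular_learner("data", [10, 5], lr=0.01))
```

The printed signature now lists lr and wd with their defaults in place of **kwargs, so tools that rely on inspect.signature can offer them for completion.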

Monkey-patching is an important functionality of the Python language when you want to add functionality to existing objects. fastai makes it easier and more concise with a @patch decorator, using Python’s type-annotation system. For instance, here is how fastai adds the write() method to the pathlib.Path class:

@patch
def write(self:Path, txt, encoding='utf8'):
    with self.open('w', encoding=encoding) as f: f.write(txt)
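A decorator of this kind needs very little machinery. The sketch below (patch_sketch and Greeter are hypothetical names) reads the annotation of the first parameter to find the target class and attaches the function to it. It is a minimal illustration of the idea, not the fastai implementation.

```python
def patch_sketch(f):
    """Attach `f` as a method of the class annotated on its first parameter."""
    first_param = f.__code__.co_varnames[0]
    cls = f.__annotations__[first_param]
    setattr(cls, f.__name__, f)
    return f


class Greeter:
    def __init__(self, name):
        self.name = name


@patch_sketch
def greet(self: Greeter, punctuation="!"):
    return f"Hello, {self.name}{punctuation}"


print(Greeter("fastai").greet())  # Hello, fastai!
```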

Lastly, inspired by the NumPy (Oliphant, n.d.) library, fastai provides a collection type, called L, that supports fancy indexing and has a lot of methods that allow users to write simple expressive code. For example, the code below takes a list of pairs, selects the second item of each pair, takes its absolute value, filters items greater than 4, and adds them up:

d = dict(a=1,b=-5,d=6,e=9).items()
L(d).itemgot(1).map(abs).filter(gt(4)).sum() # 20

L uses context-dependent functionality to simplify user code. For instance, the sorted method can take any of the following as a key: a callable (sorts based on the value of calling the key with the item), a string (used as an attribute name), or an int (used as an index).
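The key handling described here can be imitated with the standard library. The function below is a sketch of the behaviour only, with sorted_sketch and Point being names chosen for this example rather than anything taken from L itself.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter


def sorted_sketch(items, key=None, reverse=False):
    """Sort with a key that may be a callable, an attribute name, or an index."""
    if isinstance(key, str):
        k = attrgetter(key)   # string: look up an attribute
    elif isinstance(key, int):
        k = itemgetter(key)   # int: index into each item
    else:
        k = key               # callable (or None): use as-is
    return sorted(items, key=k, reverse=reverse)


Point = namedtuple("Point", ["x", "y"])
pts = [Point(3, 1), Point(1, 2), Point(2, 0)]

by_attr = sorted_sketch(pts, key="y")
by_index = sorted_sketch(pts, key=0)
by_call = sorted_sketch(pts, key=lambda p: p.x + p.y)

print(by_attr)   # ordered by y: (2, 0), (3, 1), (1, 2)
print(by_index)  # ordered by x: (1, 2), (2, 0), (3, 1)
print(by_call)   # ordered by x + y: (2, 0), (1, 2), (3, 1)
```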


In order to assist in developing this library, we built a programming environment called nbdev, which allows users to create complete Python packages, including tests and a rich documentation system, all in Jupyter Notebooks (Kluyver et al. 2016). nbdev is a system for exploratory programming. Exploratory programming is based on the observation that most developers spend most of their time as coders exploring and experimenting. Exploration is easiest developing on the prompt (or REPL), or using a notebook-oriented development system like Jupyter Notebooks. But these systems are not as strong for the “programming” part, since they’re missing features provided by IDEs and editors like good documentation lookup, good syntax highlighting, integration with unit tests, and (most importantly) the ability to produce final, distributable source code files.

nbdev is built on top of Jupyter Notebook and adds the following critically important tools for software development:

  • Python modules are automatically created, following best practices such as automatically defining __all__ with exported functions, classes, and variables
  • Navigate and edit code in a standard text editor or IDE, and export any changes automatically back into your notebooks
  • Automatically create searchable, hyperlinked documentation from your code (as seen in the nbdev figure): any word surrounded by backticks will be hyperlinked to the appropriate documentation, a sidebar is created in the documentation site with links to each module, and more
  • Pip installers (uploaded to pypi automatically)
  • Testing (defined directly in notebooks, and run in parallel)
  • Continuous integration
  • Version control conflict handling

We plan to provide more information about the features, benefits, and history behind nbdev in a future paper.

There has been a long history of high-level APIs for deep learning in Python, and this history has been a significant influence on the development of fastai. The first example of a Python library for deep learning that we have found is Calysto/conx, which implemented back propagation in Python in 2001. Since that time there have been dozens of approaches to high level APIs with perhaps the most significant, in chronological order, being Lasagne (Dieleman et al. 2015) (begun in 2013), Fuel/Blocks (begun in 2014), and Keras (Chollet and others 2015) (begun in 2015). There have been other directions as well, such as the configuration-based approach popularized by Caffe (Jia et al. 2014), and lower-level libraries such as Theano (Theano Development Team 2016), TensorFlow (Abadi et al. 2015) and PyTorch (Paszke et al. 2017).

APIs from general machine learning libraries have also been an important influence on fastai. SPSS and SAS provided many utilities for data processing and analysis since the early days of statistical computing. The development of the S language was a very significant advance, which led directly to projects such as R (R Core Team 2017), SPLUS, and xlisp-stat (Luke Tierney 1989). The direction taken by R for both data processing (largely focused on the “Tidyverse” (Wickham et al. 2019)) and model building (built on top of R’s rich “formula” system) shows how a very different set of design choices can result in a very different (and effective) user experience. Scikit-learn (Pedregosa et al. 2011), Torchvision (Massa and Chintala, n.d.), and pandas (McKinney 2010) are examples of libraries which provide a function composition abstraction that (like fastai’s Pipeline) is designed to help users process their data into the format they need (Scikit-Learn also being able to perform learning and predictions on that processed data). There are also projects such as MLxtend (Raschka 2018) that provide a variety of utilities building on the functionality of their underlying programming languages (Python, in the case of MLxtend).

The most important influence on fastai is, of course, PyTorch (Paszke et al. 2017); fastai would not have been possible without it. The PyTorch API is extensible and flexible, and the implementation is efficient. fastai makes heavy use of torch.Tensor and torch.nn (including torch.nn.functional). On the other hand, fastai does not make much use of PyTorch’s higher level APIs, such as torch.optim and annealing, instead independently creating overlapping functionality based on the design approaches and goals described above.

Results and conclusion

Early results from using fastai are very positive. We have used the fastai library to rewrite the entire fast.ai course “Practical Deep Learning for Coders”, which contains 14 hours of material, across seven modules, and covers all the applications described in this paper (and some more). We found that we were able to replicate or improve on all the results in previous versions of the material and were able to create the data pipelines and models needed much more quickly and easily than we could before. We have also heard from early adopters of pre-release versions of the library that they have been able to more quickly and easily write deep learning code and build models than with previous versions. fastai has already been selected as part of the official PyTorch Ecosystem. According to the 2019 Kaggle ML & DS Survey, 10% of data scientists in the Kaggle community are already using fastai. Many researchers are using fastai to support their work (e.g. Revay and Teschke 2019; Koné and Boulmane 2018; Elkins, Freitas, and Sanz 2019; Anand et al. 2019).

Based on our experience with fastai, we believe that using a layered API in deep learning has very significant benefits for researchers, practitioners, and students. Researchers can see links across different areas more easily, rapidly combine and restructure ideas, and run experiments on top of strong baselines. Practitioners can quickly build prototypes, and then build on and optimize those prototypes by leveraging fastai’s PyTorch foundations, without rewriting code. Students can experiment with models and try out variations, without being overwhelmed by boilerplate code when first learning ideas.

The basic ideas expressed in fastai are not limited to use in PyTorch, or even Python. There is already a partial port of fastai to Swift, called SwiftAI (Jeremy Howard, Sylvain Gugger, and contributors 2019), and we hope to see similar projects for more languages and libraries in the future.


We would like to express our deep appreciation to Alexis Gallagher, who was instrumental throughout the paper-writing process, and who inspired the functional-style data blocks API. Many thanks also to Sebastian Raschka for commissioning this paper and acting as the editor for the special edition of Information that it appears in, to the Facebook PyTorch team for all their support throughout fastai's development, to the global fast.ai community who through forums.fast.ai have contributed many ideas and pull requests that have been invaluable to the development of fastai, to Chris Lattner and the Swift for TensorFlow team who helped develop the Swift courses at course.fast.ai and SwiftAI, to Andrew Shaw for contributing to early prototypes of showdoc in nbdev, to Stas Bekman for contributing to early prototypes of the git hooks in nbdev and to packaging and utilities, and to the developers of the Python programming language, which provides such a strong foundation for fastai's features.


Abadi, Martı́n, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et al. 2015. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.” https://www.tensorflow.org/.

Anand, Sarthak, Debanjan Mahata, Kartik Aggarwal, Laiba Mehnaz, Simra Shahid, Haimin Zhang, Yaman Kumar, Rajiv Ratn Shah, and Karan Uppal. 2019. “Suggestion Mining from Online Reviews Using ULMFiT.”

Bradski, G. 2000. “The OpenCV Library.” Dr. Dobb’s Journal of Software Tools.

Brébisson, Alexandre de, Étienne Simon, Alex Auvolat, Pascal Vincent, and Yoshua Bengio. 2015. “Artificial Neural Networks Applied to Taxi Destination Prediction.” CoRR abs/1508.00021. http://arxiv.org/abs/1508.00021.

Brostow, Gabriel J., Jamie Shotton, Julien Fauqueur, and Roberto Cipolla. 2008. “Segmentation and Recognition Using Structure from Motion Point Clouds.” In ECCV (1), 44–57.

Chollet, François. 2016. “Xception: Deep Learning with Depthwise Separable Convolutions.” CoRR abs/1610.02357. http://arxiv.org/abs/1610.02357.

Chollet, François, and others. 2015. “Keras.” https://keras.io.

Clark, Alex, and Contributors. n.d. “Python Imaging Library (Pillow Fork).” https://github.com/python-pillow/Pillow.

Coleman, Cody A., Daniel Kang, Deepak Narayanan, and Matei Zaharia. 2017. “DAWNBench: An End-to-End Deep Learning Benchmark and Competition.” NIPS ML Systems Workshop, 2017.

Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. “ImageNet: A Large-Scale Hierarchical Image Database.” In CVPR09.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” CoRR abs/1810.04805. http://arxiv.org/abs/1810.04805.

Dieleman, Sander, Jan Schlüter, Colin Raffel, Eben Olson, Søren Kaae Sønderby, Daniel Nouri, Daniel Maturana, et al. 2015. “Lasagne: First Release.” https://doi.org/10.5281/zenodo.27878.

Eisenschlos, Julian, Sebastian Ruder, Piotr Czapla, Marcin Kadras, Sylvain Gugger, and Jeremy Howard. 2019. “MultiFiT: Efficient Multi-Lingual Language Model Fine-Tuning.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). https://doi.org/10.18653/v1/d19-1572.

Elkins, Andrew, Felipe F. Freitas, and Veronica Sanz. 2019. “Developing an App to Interpret Chest X-Rays to Support the Diagnosis of Respiratory Pathology with Artificial Intelligence.”

Girshick, Ross, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. 2018. “Detectron.” https://github.com/facebookresearch/detectron.

Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2672–80. Curran Associates, Inc. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.

Guo, Cheng, and Felix Berkhahn. 2016. “Entity Embeddings of Categorical Variables.” CoRR abs/1604.06737. http://arxiv.org/abs/1604.06737.

Harper, F. Maxwell, and Joseph A. Konstan. 2015. “The Movielens Datasets: History and Context.” ACM Trans. Interact. Intell. Syst. 5 (4): 19:1–19:19. https://doi.org/10.1145/2827872.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.

He, Tong, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. 2018. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” CoRR abs/1812.01187. http://arxiv.org/abs/1812.01187.

Howard, Jeremy, and Sylvain Gugger. 2020. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. 1st ed. O’Reilly Media, Inc.

Howard, Jeremy, and Sebastian Ruder. 2018. “Fine-Tuned Language Models for Text Classification.” CoRR abs/1801.06146. http://arxiv.org/abs/1801.06146.

Hu, Jie, Li Shen, and Gang Sun. 2017. “Squeeze-and-Excitation Networks.” CoRR abs/1709.01507. http://arxiv.org/abs/1709.01507.

Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. 2016. “Densely Connected Convolutional Networks.” CoRR abs/1608.06993. http://arxiv.org/abs/1608.06993.

Hunter, John D. 2007. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9 (3): 90.

Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” CoRR abs/1502.03167. http://arxiv.org/abs/1502.03167.

Jeremy Howard, Sylvain Gugger, and contributors. 2019. SwiftAI. fast.ai, Inc. https://github.com/fastai/swiftai.

Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. “Caffe: Convolutional Architecture for Fast Feature Embedding.” arXiv Preprint arXiv:1408.5093.

Kluyver, Thomas, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, et al. 2016. “Jupyter Notebooks – a Publishing Format for Reproducible Computational Workflows.” Edited by F. Loizides and B. Schmidt. IOS Press.

Koné, Ismaël, and Lahsen Boulmane. 2018. “Hierarchical Resnext Models for Breast Cancer Histology Image Classification.” CoRR abs/1810.09025. http://arxiv.org/abs/1810.09025.

Kudo, Taku. 2018. “Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates.” CoRR abs/1804.10959. http://arxiv.org/abs/1804.10959.

Kudo, Taku, and John Richardson. 2018. “SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing.” CoRR abs/1808.06226. http://arxiv.org/abs/1808.06226.

LeCun, Yann, Corinna Cortes, and CJ Burges. 2010. “MNIST Handwritten Digit Database.” AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2: 18.

Lin, Tsung-Yi, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. “Microsoft COCO: Common Objects in Context.” CoRR abs/1405.0312. http://arxiv.org/abs/1405.0312.

Loshchilov, Ilya, and Frank Hutter. 2017. “Fixing Weight Decay Regularization in Adam.” CoRR abs/1711.05101. http://arxiv.org/abs/1711.05101.

Luke Tierney. 1989. “XLISP-STAT: A Statistical Environment Based on the XLISP Language (Version 2.0).” 28. School of Statistics, University of Minnesota.

Maas, Andrew L., Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. “Learning Word Vectors for Sentiment Analysis.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–50. Portland, Oregon, USA: Association for Computational Linguistics. http://www.aclweb.org/anthology/P11-1015.

Massa, Francisco, and Soumith Chintala. n.d. “Torchvision.” https://github.com/pytorch/vision/tree/master/torchvision.

McKinney, Wes. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman, 51–56.

Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. 2017. “Regularizing and Optimizing LSTM Language Models.” CoRR abs/1708.02182. http://arxiv.org/abs/1708.02182.

Micikevicius, Paulius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, et al. 2017. “Mixed Precision Training.” http://arxiv.org/abs/1710.03740.

Misra, Diganta. 2019. “Mish: A Self Regularized Non-Monotonic Neural Activation Function.” http://arxiv.org/abs/1908.08681.

Mnih, Andriy, and Ruslan R Salakhutdinov. 2008. “Probabilistic Matrix Factorization.” In Advances in Neural Information Processing Systems, 1257–64.

Oliphant, Travis. n.d. “NumPy: A Guide to NumPy.” USA: Trelgol Publishing. http://www.numpy.org/.

Ott, Myle, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. “Fairseq: A Fast, Extensible Toolkit for Sequence Modeling.” In Proceedings of NAACL-HLT 2019: Demonstrations.

Parkhi, Omkar M., Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. 2012. “Cats and Dogs.” In IEEE Conference on Computer Vision and Pattern Recognition.

Paszke, Adam, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. “Automatic Differentiation in PyTorch.” In NIPS Autodiff Workshop.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

Python Core Team. 2019. Python: A dynamic, open source programming language. Python Software Foundation. https://www.python.org/.

Radford, Alec, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.”

Raschka, Sebastian. 2018. “MLxtend: Providing Machine Learning and Data Science Utilities and Extensions to Python’s Scientific Computing Stack.” The Journal of Open Source Software 3 (24). https://doi.org/10.21105/joss.00638.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Revay, Shauna, and Matthew Teschke. 2019. “Multiclass Language Identification Using Deep Learning on Spectral Images of Audio Signals.”

Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” CoRR abs/1505.04597. http://arxiv.org/abs/1505.04597.

Smith, Leslie N. 2015. “No More Pesky Learning Rate Guessing Games.” CoRR abs/1506.01186. http://arxiv.org/abs/1506.01186.

Smith, Leslie N. 2018. “A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 - Learning Rate, Batch Size, Momentum, and Weight Decay.” CoRR abs/1803.09820. http://arxiv.org/abs/1803.09820.

Theano Development Team. 2016. “Theano: A Python framework for fast computation of mathematical expressions.” arXiv E-Prints abs/1605.02688 (May). http://arxiv.org/abs/1605.02688.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wolf, Thomas, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, et al. 2019. “HuggingFace’s Transformers: State-of-the-Art Natural Language Processing.” CoRR abs/1910.03771. http://arxiv.org/abs/1910.03771.

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” CoRR abs/1609.08144. http://arxiv.org/abs/1609.08144.

Xie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. 2017. “Aggregated Residual Transformations for Deep Neural Networks.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July. https://doi.org/10.1109/cvpr.2017.634.

You, Yang, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. 2019. “Reducing BERT Pre-Training Time from 3 Days to 76 Minutes.” CoRR abs/1904.00962. http://arxiv.org/abs/1904.00962.

Zagoruyko, Sergey, and Nikos Komodakis. 2016. “Wide Residual Networks.” CoRR abs/1605.07146. http://arxiv.org/abs/1605.07146.

Zhang, Hongyi, Moustapha Cissé, Yann N. Dauphin, and David Lopez-Paz. 2017. “Mixup: Beyond Empirical Risk Minimization.” CoRR abs/1710.09412. http://arxiv.org/abs/1710.09412.


  1. https://github.com/pytorch/examples/blob/master/mnist/main.py
  2. https://aws.amazon.com/opendata/public-datasets/
  3. https://pytorch.org/ecosystem/
  4. https://www.kaggle.com/c/kaggle-survey-2019