
New Opportunities For New Deep Learning Practitioners

Dawit Haile fought against the odds when he decided to study computer science in Eritrea, East Africa, despite having no internet connectivity. His perseverance paid off: he first landed a job with the Eritrean government's department of education, and later worked as an engineer in Lithuania. Today, Dawit is a data scientist in the San Francisco Bay Area, and he credits this new job to the knowledge and experience he gained from fast.ai. On the side, he’s building an algorithm to translate between English and his native language of Tigrinya.

Dawit is just one of many impressive fast.ai fellows who participated in our deep learning course this fall. It is entirely thanks to the sponsors who answered our call that these fellowships were possible: Amazon Web Services, Menlo Ventures, Domino Data Labs, Facebook, Natalia Baryshnikova, Lucien Carrol (with an employer match from Cisco), and the continued support of the University of San Francisco Data Institute.

We had over 70 incredibly qualified applicants for the diversity scholarships, including senior software engineers, several start-up founders, a researcher who had published in Nature, and many who are active in teaching, volunteer, and community organizations. It was a delight to be able to offer as many scholarships as we could with the support of our sponsors. Here are the stories of some of our fellows.

Adriana Fuentes is co-founder and technical lead at a stealth startup and president of the Society of Hispanic Professional Engineers at Silicon Valley (SHPE). She is applying knowledge gained from fast.ai to building a small autonomous vehicle, which she will use to engage students from low socioeconomic backgrounds with the field of AI as part of her volunteer work with SHPE. Previously, she built large-scale distributed systems and databases at Hewlett Packard and was an engineer for hybrid vehicles, navigation systems, and infotainment at Ford Motor Company.

Sarada Lee was an accountant with no programming experience when she first encountered machine learning at a hackathon in 2016 and came away fascinated. She taught herself to code and founded the Perth Machine Learning Group in Perth, Australia. What began as a small group of friends meeting in Sarada’s living room grew to a community of 280 members within a year. The group worked through the online fast.ai course, won hackathons, attracted corporate sponsors, and has hosted a number of speakers. Members have used image classification techniques on a utility project to potentially save millions of dollars. Sarada is now working on a new algorithm to read and understand large corpora of documents, as well as developing new initiatives to help increase diversity in AI.

Tiffany Liu, a bioinformatics scientist researching brain tumor treatment, told us that the course provided hands-on help in her work building a multi-task neural network that simultaneously predicts both the tumor region and its associated clinical information.

Nahid Alam, founder of litehouse.io (a voice-first user experience for home automation), is now using AI in her work as a senior software engineer at Cisco, mentoring with Backstage Capital, and is listed as one of the top women in AI to follow. Nahid told us “everyone talks about concepts, [but] resource for coders/engineers are rare.” Fast.ai is filling this gap. She wrote Automate the Boring Task: Chatbots in Enterprise Software about her work with chatbot frameworks, conversational AI products, and bot analytics products.

We’re in awe of Dawit, Adriana, Tiffany, Sarada, Nahid, and all our diversity fellows, and we are so grateful to our sponsors for making this possible. While many are bemoaning a supposed “talent shortage” in AI, it is encouraging to see these companies and individuals take concrete action.

Five Trends to Avoid When Founding a Startup

This post was inspired by a round-table discussion I led on the topics of founding start-ups and personal branding at the Women in Machine Learning Workshop, co-located with the deep learning conference NIPS. I covered personal branding in a previous post.

This post has been translated into Korean here.

When I first moved to San Francisco in 2012, I was thrilled by how many startups there are here; the culture seemed so creative! Then I realized that most of the startups were indistinguishable from one another: nearly everyone was following the same destructive trends which are bad for employees and bad for products.

If you are working on a startup, I want you to know that there are options in how to do things. After working at several startups and watching friends found start-ups, I took the leap and started fast.ai, together with Jeremy Howard. We are unusual in many ways: we have no interest in growing our tiny team; we are allergic to traditional venture capital; and we don’t plan to hire any deep learning PhDs. Yet we are still having a big impact!

fast.ai founders Jeremy and Rachel speaking at the Nikkei Innovation Forum

If you are going to avoid making the same mistakes that so many entrepreneurs have made, the first step is to be able to recognize them. I’ve identified 5 dominant narratives in Bay Area tech start-ups that not only harm employees, but lead to weaker companies and worse products. This post offers a high-level overview, and I’ll dig into the trends in greater detail in future posts (adding links as I do so):

  1. Venture Capital often pushes what could’ve been a successful small business to over-expand and ultimately fail; prevents companies from focusing their priorities; distracts from finding a monetization plan; causes conflict due to the misalignment of incentives between VCs and founders; and is full of far too many unethical bullies and thugs.
  2. Hypergrowth is nearly impossible to manage and leads to communication failures, redundant work, burnout, and high employee attrition.
  3. Trying to be “like a family” severely limits your pool of potential employees, leaves you unprepared for conflict or HR incidents, and sets employees up to feel betrayed.
  4. Attempting to productionize a PhD thesis is rarely a good business plan. The priorities and values of academia and business are drastically different.
  5. Hiring a bunch of academic researchers will not improve your product and harms your company by diverting so many resources (unless your goal is an acqui-hire).

I recognize that there are many startups following these trends that have high valuations on paper. However, that does not mean that these companies will succeed in the long term (we’ve already seen many highly valued, high-profile startups fail in recent years).

Negative trend 1: Venture Capital

Imagine you were to create a business where you could profitably support yourself and 10 employees selling a product your customers liked, and after running it for 10 years you sold it for $10 million, of which half ended up in your pocket and half with your employees. Most VCs would consider that an abject failure. They are looking for at least 100x returns, because all of their profits come from the one or two best performers in their portfolio.

Therefore, VCs often push companies to grow too quickly, before they’ve nailed down product-market fit and monetization. Growing at a slow, sustainable rate helps keep your priorities in order. Funding yourself (through part-time consulting, saving up money in advance, and/or getting a simple product to market quickly) will force you to stay smaller and grow more slowly than VC funded businesses, but this is good. Staying small keeps you focused on a small number of high-impact features.

I have seen a lot of deeply unethical, bullying, and downright illegal behavior by venture capitalists against close friends of mine. This is not just a few bad actors: the behavior is wide-spread, including by many well-known and ultra-wealthy investors (although founders often don’t speak out about it because of fear of professional repercussions).

Negative trend 2: Hypergrowth

Hypergrowth typically involves chaos, inefficiency, and severe burnout (none of which is good for your business). I’ve worked at several companies that have doubled in size in just a year. It was always painful and chaotic. Communication broke down. There was duplicate and redundant work. Company politics became increasingly destructive. Burnout was endemic and many people quit. In all cases, the quality of the product suffered.

Management is hard, and management of hypergrowth is an order of magnitude harder. So many start-ups work their employees into the ground for the sake of short-term growth. Burnout is a very real and expensive problem in the tech industry, and hypergrowth routinely leads to burnout.

Negative trend 3: “Our startup is like a family”

Many startups claim that they’re creating a family-like culture amongst their employees: they don’t just work together, they go out after work, share the same hobbies, and are best friends. Doing this severely limits your pool of potential employees. Employees with health problems, long commutes, families, outside hobbies, outside friendships, or from under-represented groups may all struggle to thrive in such a culture.

Secondly, you are making a promise you can’t keep, which sets people up for feeling betrayed. You’re not actually a family; you are a company. You will need to make hard decisions for the sake of the business. You can’t actually offer people anything remotely close to lifelong loyalty or security, and it’s dishonest to implicitly do so.

Negative trend 4 (AI specific): Productionizing your PhD thesis

The best approach to starting a start-up is to address a problem that people in the business world have. Your PhD thesis is not doing this, and it is highly unlikely that it will give you a competitive edge. You and your adviser picked your thesis topic because it’s an interesting technical problem with good opportunities to publish, not because it has a large opportunity for impact in an underserved market with few barriers to entry.

In the business world, products are not evaluated on underlying theoretical novelty, but on implementation, ease-of-use, effectiveness, and how they relate to revenues.

Negative trend 5 (AI specific): Hiring a bunch of PhDs

You almost certainly do not need a bunch of PhDs. There are so many things that go into a successful product beyond the algorithm: the product-market fit, software engineering that productionizes and deploys it, the act of selling it, supporting your users, etc. And even for highly technical aspects like deep learning, fast.ai has shown that people with one year of coding experience can become world-class deep learning practitioners; you don’t need to hire Stanford PhDs. By diverting valuable resources into academic research at your startup, you are hurting the product.

My journey to fast.ai

Whilst avoiding these trends, fast.ai has accomplished far more than I ever expected in our first year and a half: over 100,000 people have started our Practical Deep Learning for Coders course and fast.ai students have landed new jobs, launched companies, had their work shown on HBO, been featured in Forbes, won hackathons, and been accepted to the Google Brain AI Residency. Fast.ai has been mentioned in the Harvard Business Review and the New York Times.

Fast.ai is solving a problem that I experienced first-hand: how hard it can be to break into deep learning and gain practical AI knowledge if you don’t have the “right” background and didn’t train with the academic stars of the field. I have seen and experienced some of the obstacles facing outsiders: inequality, discrimination, and lack of access.

I grew up in Texas (not in a major city) and attended a poor, predominantly Black public high school that was later ranked in the bottom 2% of Texas schools. We had far fewer resources and opportunities compared to the wealthier, predominantly White schools around us. In graduate school, the sexism and harassment I experienced led me to abandon my dreams of becoming a math professor, although I then experienced similar problems in the tech industry. When I first became interested in deep learning in 2013, I found that experts weren’t writing down the practical methods they used to actually get their research to work, instead just publishing the theory. I believe deep learning will have a huge impact across all industries, and I want the creators of this technology to be a more diverse and less exclusive group.

With fast.ai, I’m finally able to do work completely in line with my values, on a tiny team characterized by trust and respect. Having a small team forces us to prioritize ruthlessly, and to focus only on what we value most or think will be highest impact. Something that has surprised me with fast.ai is how much I’ve been able to invest in my own career and own skills, in ways that I never could in previous jobs. Jeremy and I are committed to fast.ai for the long term, so neither of us has any interest in burning out. We believe you can have an impact with your work, without destroying your health and relationships.

I’d love to see more small companies building useful products in a healthy and sustainable way.

Deep Learning Diversity Fellowship Applications Now Open

Cutting Edge Deep Learning for Coders (Part 2) will be taught this spring at the USF Data Institute in downtown San Francisco, on Monday evenings from March 19 to April 30. This course builds on our updated Practical Deep Learning for Coders (Part 1). The preview version of Part 1 is available here and the final version will be released in the next week.

We are now accepting applications from women, people of Color, LGBTQ people, and veterans for diversity scholarships for the deep learning part 2 course. The prerequisites are:

  • Familiarity with Python, git, and bash
  • Familiarity with the content covered in Deep Learning Part 1, version 2, including the fastai library, a high-level wrapper for PyTorch (it’s OK to start studying this material now, as long as you complete it by the start of the course)
  • Available on Monday evenings to attend the in-person course in SOMA, from March 19 to April 30
  • Able to commit 10 hours a week of study to the course.

You can fulfill the requirement to be familiar with deep learning, the fastai library, and PyTorch by doing any 1 of the following:

  • You took the updated, in-person deep learning part 1 course during fall 2017
  • You have watched the first 2 videos of the online course before you apply, and commit to working through all 7 lessons before the start of the course. We estimate that each lesson takes approximately 10 hours of study (so you would need to study for the 7 weeks prior to the course starting on March 19, for 10 hours each week).
  • You have previously taken the older version of the course (released last year) AND have watched the first 4 lessons of the new course to get familiar with the fastai library and PyTorch.

Preference will be given to students that have already completed some machine learning or deep learning education, including any fast.ai courses, the Coursera machine learning course, deeplearning.ai, or Stanford’s CS231n.

Deep Learning Part 1 covers the use of deep learning for image recognition, recommendation systems, sentiment analysis, and time-series prediction. Part 2 will take this further by teaching you how to read and implement cutting edge research papers, generative models and other advanced architectures, and more in-depth natural language processing. As with all fast.ai courses, it will be practical, state-of-the-art, and geared towards coders.

Increasing diversity in AI is a core part of our mission at fast.ai to make deep learning more accessible. We want to get deep learning into the hands of as many people as possible, from as many diverse backgrounds as possible. People with different backgrounds have different problems they’re interested in solving. We are horrified by unethical uses of AI, widespread bias, and how overwhelmingly white and male most deep learning teams are. Increasing diversity won’t solve the problem of ethics and bias alone, but it is a necessary step.

How to Apply

Women, people of Color, LGBTQ people, people with disabilities, and veterans in the Bay Area, if you have at least one year of coding experience, can fulfill the deep learning pre-requisite (described above), and can commit 8 hours a week to working on the course, we encourage you to apply for a diversity scholarship. The number of scholarships we are able to offer depends on how much funding we receive (if your organization may be able to sponsor one or more places, please let us know).

To apply for the fellowship, you will need to submit a resume and statement of purpose. The statement of purpose will include the following:

  • 1 paragraph describing one or more problems you’d like to apply deep learning to
  • 1 paragraph describing previous machine learning education (e.g. fast.ai courses, Coursera, deeplearning.ai,…)
  • Confirm that you fulfill the deep learning part 1 pre-requisite (or that you have already completed the first 2 lessons and plan to complete the rest before the course starts)
  • Confirm that you are available to attend the course on Monday evenings in SOMA (for 7 weeks, beginning March 19), and that you can commit 8 hours a week to working on the course
  • Which under-indexed group(s) you are a part of (gender, race, sexual identity, veteran)

Diversity Fellowship applications should be submitted here: https://gradapply.usfca.edu/register/di_certificates

If you have any questions, please email datainstitute@usfca.edu.

The deadline to apply is January 24, 2018.


I’m not eligible for the diversity scholarship, but I’m still interested. Can I take the course? Absolutely! You can register here.

I don’t live in the San Francisco Bay Area; can I participate remotely? Yes! Once again, we will be offering remote international fellowships. Stay tuned for details to be released in a blog post in the next few weeks.

Will this course be made available online later? Yes, this course will be made freely available online afterwards. Benefits of taking the in-person course include earlier access, community and in-person interaction, and more structure (for those that struggle with motivation when taking online courses).

Is fast.ai able to sponsor visas or provide stipends for living expenses? No, we are not able to sponsor visas or to cover living expenses.

How will this course differ from the fast.ai Deep Learning part 2 course taught in spring 2017? Our goal at fast.ai is to push the state of the art. Each year, we want to make deep learning increasingly intuitive to use while giving better results. With our fastai library, we are beating our own state-of-the-art results from last year. Also, last year’s course was taught primarily in TensorFlow, while this year’s course will be taught in PyTorch.

Making Peace with Personal Branding

As a child, I was nerdy and shy. At my elementary and middle schools, we had to present our science projects to judges in the school science fair each year, and I noticed that students who were outgoing and good at presenting were more likely to win. I remember feeling indignant– shouldn’t we just be judged on scientific merit? Why should things like being able to smile, make eye contact, and show enthusiasm (all things I didn’t do) with the judges have any impact?

But it turns out that those other skills are actually useful! Personal branding is similar– we may want our professional work to stand on its own merit, but how we present and share it is important. And so, two weeks ago I found myself mentoring on the cringe-inducing topic of personal branding at the Women in Machine Learning Workshop, co-located with the deep learning conference NIPS. Part of me felt embarrassed to be talking about something as seemingly shallow as personal branding, while just a few tables away deep learning star Yoshua Bengio mentored on the more serious topic of deep learning. However, I’ve worked hard to make peace with the concept and wanted to share what I’ve discovered.

What is “personal branding” and why is it useful?

Over the past two years, I’ve consistently put time into twitter and blog posts. Here are a few ways this has been helpful to me:

  • Being invited to attend the TensorFlow Dev Summit
  • Being invited to keynote JupyterCon
  • Being interviewed and quoted in Wired (twice)
  • Being able to raise money for 18 AI diversity scholarships and $250,000 of AWS credits to give to fast.ai students
Me onstage at JupyterCon

I talked to a grad student who was giving an oral presentation at NIPS, and she noted how a classmate of hers with a much larger twitter following got significantly more retweets and attendees for his talk. This struck her as unfair since a larger twitter following doesn’t equate with better research, but it also convinced her that building a personal brand would be useful.

I think of personal branding as anything that helps people find out about you and your work. This includes blogging, using twitter, and public speaking. Personal branding is a bit like a web: your blog post may lead to a job interview; you may get a speaking engagement from someone who follows you on twitter; and your conference talk may lead some of the audience members to read your blog or follow you on twitter, continuing the cycle.

Personal branding is no substitute for doing high-quality technical work; it’s just the means by which you can share this work with a broader audience.

Making peace with personal branding

Here are a few things that helped me get okay with the idea of personal branding:

  • Personal branding sounds icky if you think of it as a shallow popularity contest, or as trying to trick people into clicking on links they don’t really want to click on. However, I now think about it as wanting people to know about high-quality work that I’m proud of and care about.

  • Realizing that social skills and communication skills are things I could get better at with practice (I consider personal branding to be a subset of communication skills). As a kid, I felt resentful because I thought that people were either born outgoing or not, and that there wasn’t anything I could do about it; but as I started working on those skills and saw them improve, I was encouraged.

  • People have a bias towards thinking the most valuable skills are the ones we are already good at and have already put a lot of time into (whether that’s a particular academic subject, programming language, or sport). I still catch myself feeling a particular affinity towards other mathematicians (I know firsthand that getting a math PhD was hard!) But a lot of other things are hard and valuable too.

Learn by observation

I recommend finding people who are doing personal branding well, and observing what they do and what works. What is it about that conference talk that made it so good? Why do you enjoy following X on twitter? What keeps you returning to Y’s blog?


In defense of twitter

Twitter seems really weird at first (I was a twitter skeptic for years, not starting to actively use it until 2014), but it’s actually really useful. I’ve met new people through twitter. I know people who have gotten jobs through twitter. There are some really interesting conversations that I see on twitter that I don’t see elsewhere, such as this discussion about what it means to do “more rigorous” deep learning experiments, or here where several genomics researchers responded to my question about whether Google’s DeepVariant is overhyped.

Apart from the “personal branding” aspects, twitter helps me practice being more concise. It’s been good for my writing skills. I also use it as a way of bookmarking blog posts I like and highlights from talks and conferences I attend, so sometimes I refer back to it as a reference.

A tweet about a talk by Sandya Sankarram that I really enjoyed.

Twitter for beginners

Your enjoyment of twitter will vary greatly depending on who you follow. It will take some experimenting to get this right. Feel free to unfollow people if you realize you’re not getting anything out of their tweets. Whenever I read an article I like or hear a talk I like, I always look up the author/speaker on twitter and see if I find their tweets interesting. If so, I follow them. Also, you may love some people’s writing/talks/other work but not really enjoy their tweets. You don’t have to follow them. Twitter is its own distinct medium, and being good at something else doesn’t necessarily translate. If you are particularly looking for deep learning tweets, you can check out Jeremy Howard’s likes, and follow some of the accounts shared there.

People use twitter in a variety of ways: as a social network, for political activism, for self-expression, and more. I use twitter primarily as a professional tool (I think of it as a more dynamic version of LinkedIn), so I try to keep most of my tweets related to data science. If your goal is personal branding or finding a job, I recommend keeping your tweets mostly focused on your field. Some people deal with this by having separate personal and professional twitter accounts (for instance, Data Science Renee does this).

Above: Sharing one of my own blog posts on Twitter.

Feel free to mute topics you don’t want to hear about (you can mute particular words), and mute people who bring you down. You are allowed to use twitter however you like, and you aren’t required to argue with anyone you don’t want to.

Twitter can be a low time commitment. You don’t need to check it every day. It’s fine to just tweet once a week. When I started, I primarily used it as a way to bookmark blog posts or articles I liked. Building up followers can be a long, slow process. Be patient.

Observe successful twitter accounts of people who aren’t “famous” (famous people will have a ton of followers regardless of the quality of their tweets) to see what works. A few accounts you might want to check out for inspiration are: Mariya Yao, Julia Evans, Data Science Renee, and Stephanie Hurlburt. They have each built up over 20k followers by providing thoughtful and interesting tweets and generously promoting the work of others.

Speaking at Meetups or Conferences

Most people (including experts with tons of experience) are terrified and intimidated by public speaking, yet it is such a great way to share your work that it’s worth it.

Two years ago I decided I wanted to do more public speaking after not having done much for many years (my previous experience was primarily academic and from before I switched into the tech industry). I was nervous and also uncertain if I had anything of value to say. I started small, giving a 5-minute lightning talk at a PyLadies meetup to a particularly supportive audience, gradually working up through events with 50-100 people, to eventually presenting to 700 people at JupyterCon.

I prepare a ton for talks, since it both helps me feel less anxious and results in stronger talks. I prepare for short talks and small audiences, as well as big talks, because I want to be respectful of the audience. I think it’s particularly important to go through your timing to make sure that you’ll be able to cover what you plan (I’ve seen some talks get cut off before the speaker even reached their main point).

Nothing is more irritating to me as an audience member than having to sit through an infomercial. It’s important to offer useful information to your audience, and not just advertise your product or company. My goal with all my talks is to have some information that will be useful or thought-provoking, even if the listeners never take a fast.ai course.

For every talk I give, I ask if the venue will be able to do a video recording (here are professional recordings of me speaking at an ML meetup at AWS and at PyBay). If not, I will often do my own recording. I use the software Camtasia to capture my screen and video, and have my own microphone that plugs into my computer via USB. For instance, this is how I created my tutorial on Word Embeddings. About 80 people attended the live workshop and now 2,400 have watched the recording online! Getting or making recordings allows you to reach a broader audience, and it will make it easier for you to get future speaking engagements as you build up a portfolio of your past talks.

If my talk involves code, I try to create a demo on github (like this or this) that has enough documentation to stand alone as a tutorial or guide. Even if I don’t plan to cover all the set-up or background in my talk, I want to give people a resource that they can use later. You don’t need to create a recording or a demo to give a talk (particularly if it will stress you out), but it’s worth considering.

Public Speaking Resources

Technically Speaking was an excellent newsletter sharing links to blog posts and videos with public speaking advice for those in tech, created by Cate Huston and Chiu-Ki Chan, senior developers with a ton of speaking experience. Although it is no longer active, you can still check out the archives here.

If you are a woman or non-binary person living in Atlanta, NYC, SF, Chicago, or LA, I highly recommend Write Speak Code meetups or workshops as a great place to practice technical talks and receive constructive feedback.

I was scheduled to speak to an audience of 1,000 people at TEDx San Francisco in October (unfortunately, I ended up in the ICU with a life-threatening illness at the last minute and couldn’t attend, but I’d already gone through months of preparation and was completely ready). I was terrified, so I started working with a public speaking coach in preparation, and it was super helpful. I searched for coaches on yelp, and met with a few to find one that I particularly liked. From asking around, a lot of excellent and famous speakers have worked with speech coaches. They can help with anything– from your voice and body language, to crafting engaging intros and conclusions. In hindsight, I probably should’ve met with a speech coach even earlier in my public speaking journey; you certainly don’t need to be preparing for an audience of 1,000 to hire one.

Many years ago I participated in a chapter of Toastmasters, and I enjoyed that. When I asked about speech coaches on twitter, several people told me that training in improv, theater, or singing had been helpful to them in the realm of public speaking.


I’ve already written an entire post on blogging. Here are a few highlights:

  • It’s like a resume, only better. I know of a few people who have had blog posts lead to job offers!
  • Helps you learn. Organizing knowledge always helps me synthesize my own ideas. One of the tests of whether you understand something is whether you can explain it to someone else. A blog post is a great way to do that.
  • I’ve gotten invitations to conferences and invitations to speak from my blog posts. I was invited to the TensorFlow Dev Summit (which was awesome!) for writing a blog post about how I don’t like TensorFlow.
  • Meet new people. I’ve met several people who have responded to blog posts I wrote.
  • Saves time. Any time you answer a question multiple times through email, you should turn it into a blog post, which makes it easier for you to share the next time someone asks.

It can be intimidating to start blogging, but remember that your target audience is you-6-months-ago, not Geoffrey Hinton. What would have been most helpful to your slightly younger self? You are best positioned to help people one step behind you. The material is still fresh in your mind. Many experts have forgotten what it was like to be a beginner (or an intermediate). The context of your particular background, your particular style, and your knowledge level will give a different twist to what you’re writing about.

And as inspiration, here are links to a few blogs that I consistently enjoy:

Go Forth and Personal Brand

Sharing high quality work (both your own and that of others) will help you develop a platform to further your goals. Observe people who are doing this well: who are writing blog posts you enjoy, producing tweets you like to follow, or giving engaging talks. While we may want technical expertise to stand on its own, communication skills are vital in helping your work reach an audience. Starting to work on any new skillset is often intimidating, but with small steps and practice you will improve. As a meta-exercise, I did some personal branding in this post by linking to my own work. It felt uncomfortable, but I followed my own advice and did it anyway!

What you need to do deep learning

This post has been translated into Chinese here.

I want to answer some questions that I’m commonly asked: What kind of computer do I need to do deep learning? Why does fast.ai recommend Nvidia GPUs? What deep learning library do you recommend for beginners? How do you put deep learning into production? I think these questions all fall under a general theme of What do you need (in terms of hardware, software, background, and data) to do deep learning? This post is geared towards those new to the field and curious about getting started.

The hardware you need

We are indebted to the gaming industry

The video game industry is larger (in terms of revenue) than the film and music industries combined. In the last 20 years, the video game industry has driven huge advances in GPUs (graphics processing units), which are used to do the matrix math needed for rendering graphics. Fortunately, these are exactly the type of computations needed for deep learning. These advances in GPU technology are a key part of why neural networks are so much more powerful now than they were a few decades ago. Training a deep learning model without a GPU would be painfully slow in most cases.
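To see why graphics hardware carries over so well, note that a fully connected neural network layer is essentially a matrix multiplication plus a bias. The pure-Python sketch below (the function name is our own, for illustration only) computes one layer's forward pass for a small batch and counts the multiply-add operations. Every one of those multiply-adds within a batch is independent of the others, which is exactly the kind of work a GPU can spread across thousands of cores.

```python
def dense_forward(inputs, weights, bias):
    """Forward pass of a fully connected layer: outputs = inputs @ weights + bias.

    inputs:  batch_size x n_in  (list of rows)
    weights: n_in x n_out
    bias:    n_out
    Returns (outputs, number of multiply-add operations performed).
    """
    n_in = len(weights)
    n_out = len(weights[0])
    outputs = []
    mult_adds = 0
    for row in inputs:
        assert len(row) == n_in, "input width must match the number of weight rows"
        out_row = []
        for j in range(n_out):
            total = bias[j]
            for k in range(n_in):
                total += row[k] * weights[k][j]
                mult_adds += 1
            out_row.append(total)
        outputs.append(out_row)
    return outputs, mult_adds


# A batch of 2 examples with 3 features each, mapped to 2 outputs.
x = [[1.0, 2.0, 3.0],
     [0.5, 0.0, -1.0]]
w = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
b = [0.0, 1.0]

y, ops = dense_forward(x, w, b)
print(y)
print(ops)  # 2 examples * 3 inputs * 2 outputs = 12
```

Real layers have thousands of inputs and outputs and batches of dozens of examples, so the operation count grows into the millions per layer, which is why libraries hand this work to the GPU.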

Not all GPUs are the same

Most deep learning practitioners are not programming GPUs directly; we are using software libraries (such as PyTorch or TensorFlow) that handle this. However, to effectively use these libraries, you need access to the right type of GPU. In almost all cases, this means having access to a GPU from the company Nvidia.

CUDA and OpenCL are the two main ways of programming GPUs. CUDA is by far the most developed, has the most extensive ecosystem, and is the most robustly supported by deep learning libraries. CUDA is a proprietary platform created by Nvidia, so it can’t be used by GPUs from other companies. When fast.ai recommends Nvidia GPUs, it is not out of any special affinity or loyalty to Nvidia on our part, but because this is by far the best option for deep learning.
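In practice, the deep learning library is what decides whether your code can run on an Nvidia GPU through CUDA. Here is a minimal sketch in Python, assuming PyTorch may or may not be installed. `torch.cuda.is_available()` is PyTorch's real check for a usable CUDA device, while the `pick_device` helper is our own illustrative name. It falls back to the CPU when either PyTorch or a CUDA-capable GPU is missing.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA-capable (Nvidia) GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch isn't installed, so there is no CUDA backend to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

With PyTorch installed, you would then move a model and its inputs to that device with `.to(device)`, for example `model.to(device)`. On a machine with a non-Nvidia GPU, this check will typically report "cpu".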

Nvidia dominates the market for GPUs, with the next closest competitor being AMD. This summer, AMD announced the release of a platform called ROCm to provide more support for deep learning. Support for ROCm in major deep learning libraries such as PyTorch, TensorFlow, MxNet, and CNTK is still under development. While I would love to see an open source alternative succeed, I have to admit that I find the documentation for ROCm hard to understand. I just read the Overview, Getting Started, and Deep Learning pages of the ROCm website and still can’t explain what ROCm is in my own words, although I want to include it here for completeness. (Admittedly, I don’t have a background in hardware, but I think that data scientists like me should be part of the intended audience for this project.)

If you don’t have a GPU…

If your computer doesn’t have a GPU or has a non-Nvidia GPU, you have several great options:

  • Use Crestle, through your browser: Crestle is a service (developed by fast.ai student Anurag Goel) that gives you a ready-to-use cloud environment, with all the popular scientific and deep learning frameworks pre-installed and configured to run on a GPU. It is easily accessed through your browser. New users get 10 hours and 1 GB of storage for free. After this, GPU usage is 59 cents per hour. I recommend this option to those who are new to AWS or new to using the console.

  • Set up an AWS cloud instance through your console: You can create an AWS instance (which remotely provides you with Nvidia GPUs) by following the steps in this fast.ai setup lesson. AWS charges 90 cents per hour for this. Although our setup materials are about AWS (and you’ll find the most forum support for AWS), one fast.ai student created a guide for Setting up an Azure Virtual Machine for Deep learning. And I’m happy to share and add a link if anyone writes a blog post about doing this with Google Compute Engine.

  • Build your own box: Here’s a lengthy thread from our fast.ai forums where people ask questions, share what components they are using, and post other useful links and tips. The cheapest new Nvidia GPUs are around $300; some students have found even cheaper used ones on eBay or Craigslist, while others pay more for more powerful GPUs. A few of our students have also written blog posts documenting how they built their machines.
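To compare the two cloud options above, here is a back-of-the-envelope cost calculation using the prices quoted in this post (10 free hours, then 59 cents per hour on Crestle; 90 cents per hour on AWS). Prices change often, so treat these constants as assumptions to update. Amounts are kept in integer cents to avoid floating-point rounding, and storage and other fees are ignored.

```python
# Prices as quoted in this post; update them for current rates.
CRESTLE_FREE_HOURS = 10
CRESTLE_CENTS_PER_HOUR = 59
AWS_CENTS_PER_HOUR = 90


def crestle_cost_cents(hours: int) -> int:
    """GPU cost on Crestle once the free hours are used up (storage not included)."""
    billable = max(0, hours - CRESTLE_FREE_HOURS)
    return billable * CRESTLE_CENTS_PER_HOUR


def aws_cost_cents(hours: int) -> int:
    """GPU cost for an AWS instance at the hourly rate quoted above."""
    return hours * AWS_CENTS_PER_HOUR


for hours in (10, 50, 100):
    c = crestle_cost_cents(hours)
    a = aws_cost_cents(hours)
    print(f"{hours:>3} h  Crestle ${c / 100:.2f}  AWS ${a / 100:.2f}")
```

At these rates, 100 hours of GPU time comes to $53.10 on Crestle and $90.00 on AWS, which is why the cost of a few hundred dollars for your own GPU can pay off if you train a lot.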

The software you need

Deep learning is a relatively young field, and the libraries and tools are changing quickly. For instance, Theano, which we chose for part 1 of our course in 2016, was just retired. PyTorch, which we are using currently, was only released earlier this year (2017). As Jeremy wrote previously, you should assume that whatever specific libraries and software you learn today will be obsolete in a year or two. The most important thing is to understand the underlying concepts, and toward that end, we are creating our own library on top of PyTorch that we believe makes deep learning concepts clearer and encodes best practices as defaults.

Python is by far the most commonly used language for deep learning. There are a number of deep learning libraries available, with almost every major tech company backing a different library, although employees at those companies often use a mix of tools. Deep learning libraries include TensorFlow (Google), PyTorch (Facebook), MxNet (University of Washington, adapted by Amazon), CNTK (Microsoft), DeepLearning4j (Skymind), Caffe2 (also Facebook), Nnabla (Sony), PaddlePaddle (Baidu), and Keras (a high-level API that runs on top of several libraries in this list). All of these have Python options available.

Dynamic vs. Static Graph Computation

At fast.ai, we prioritize the speed at which programmers can experiment and iterate (through easier debugging and more intuitive design) over theoretical performance speed-ups. This is the reason we use PyTorch, a flexible deep learning library with dynamic computation.

One distinction among deep learning libraries is whether they use dynamic or static computation (some libraries, such as MxNet and now TensorFlow, allow for both). Dynamic computation means that the program is executed in the order you wrote it. This typically makes debugging easier and makes it more straightforward to translate ideas from your head into code. Static computation means that you build a structure for your neural network in advance, and then execute operations on it. Theoretically, this allows the compiler to do greater optimizations, although it also means there may be more of a disconnect between what you intended your program to be and what the compiler executes. It also means that bugs can seem more removed from the code that caused them (for instance, if there is an error in how you constructed your graph, you may not realize it until you perform an operation on it later). Even though there are theoretical arguments that libraries with static computation graphs are capable of better performance than libraries with dynamic computation, we often find that is not the case for us in practice.
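The difference is easier to see in code. Below is a toy, pure-Python illustration; it is not the API of TensorFlow, PyTorch, or any other library, and the `Node` class and `run` function are hypothetical names made up for this sketch. In the eager (dynamic) style, each line computes a value immediately, so a bad operation fails on the line that caused it. In the deferred (static) style, we first describe a graph and only execute it later, so the same kind of mistake surfaces at run time, away from where the graph was built.

```python
# --- Eager / dynamic: operations run in the order written. ---
def eager_example(a, b):
    c = a * b          # computed right now; you could print or inspect c here
    d = c + 1
    return d


# --- Deferred / static: build a graph first, execute it later. ---
class Node:
    """A graph node that records an operation and its inputs without computing it."""

    def __init__(self, op, *inputs):
        self.op = op          # "input", "mul", or "add"
        self.inputs = inputs  # for "input" nodes, holds the placeholder name


def run(node, feed):
    """Evaluate a graph node recursively, given values for the input placeholders."""
    if node.op == "input":
        return feed[node.inputs[0]]
    args = [run(child, feed) if isinstance(child, Node) else child
            for child in node.inputs]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "add":
        return args[0] + args[1]
    raise ValueError(f"unknown op {node.op!r}")


# Building the graph performs no arithmetic at all.
a = Node("input", "a")
b = Node("input", "b")
d = Node("add", Node("mul", a, b), 1)

print(eager_example(3, 4))        # 13
print(run(d, {"a": 3, "b": 4}))   # 13

# A mistake in graph construction (a typo'd op name) is accepted silently...
bad = Node("mull", a, b)
# ...and only raises when the graph is executed, far from where it was built.
try:
    run(bad, {"a": 3, "b": 4})
except ValueError as err:
    print("error at run time:", err)
```

Real static-graph libraries do much more (shape checking, optimization passes), but the basic split between building and executing is what makes errors feel removed from their cause.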

Google’s TensorFlow mostly uses a static computation graph, whereas Facebook’s PyTorch uses dynamic computation. (Note: TensorFlow announced a dynamic computation option, Eager Execution, just two weeks ago, although it is still quite early and most TensorFlow documentation and projects use the static option). In September, fast.ai announced that we had chosen PyTorch over TensorFlow to use in our course this year and to use for the development of our own library (a higher-level wrapper for PyTorch that encodes best practices). Briefly, here are a few of our reasons for choosing PyTorch (explained in much greater detail here):

  • easier to debug
  • dynamic computation is much better suited for natural language processing
  • traditional Object Oriented Programming style (which feels more natural to us)
  • TensorFlow’s unusual conventions, like scopes and sessions, can be confusing and add more to learn
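One reason dynamic computation suits natural language processing is that sentences have different lengths, and ordinary Python control flow can simply loop over however many tokens each one has. The sketch below uses a made-up, scalar "recurrent cell" (the weights and the `tiny_rnn` name are illustrative assumptions, not a real model) to show that the number of steps is decided by the data at run time.

```python
import math

# Hypothetical scalar weights for a toy recurrent cell (not trained values).
W_IN, W_HIDDEN = 0.5, 0.9


def tiny_rnn(token_ids):
    """Apply a toy recurrent cell once per token; the loop length follows the input."""
    hidden = 0.0
    for tok in token_ids:          # plain Python loop: no fixed sequence length
        hidden = math.tanh(W_IN * tok + W_HIDDEN * hidden)
    return hidden


sentences = [
    [4, 8, 15],              # 3 tokens
    [16, 23],                # 2 tokens
    [42, 7, 7, 1, 3, 9],     # 6 tokens
]
for s in sentences:
    print(len(s), round(tiny_rnn(s), 4))
```

In a static graph, variable lengths typically have to be handled with padding to a fixed length or with dedicated graph-level loop operations, whereas here the loop is just a loop.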

Google has put far more resources into marketing TensorFlow than anyone else, and I think this is one of the reasons that TensorFlow is so well known (for many people outside deep learning, TensorFlow is the only DL framework that they’ve heard of). As mentioned above, TensorFlow released a dynamic computation option a few weeks ago, which addresses some of the above issues. Many people have asked fast.ai if we are going to switch back to TensorFlow. The dynamic option is still quite new and far less developed, so we will happily continue with PyTorch for now. However, the TensorFlow team has been very receptive to our ideas, and we would love to see our fastai library ported to TensorFlow.

Note: The in-person version of our updated course, which uses PyTorch as well as our own fastai library, is happening currently. It will be released online for free after the course ends (estimated release: January).

What you need for production: not a GPU

Many people overcomplicate the idea of using deep learning in production and believe that they need much more complex systems than they actually do. You can use deep learning in production with a CPU and the webserver of your choice, and in fact, this is what we recommend for most use cases. Here are a few key points:

  • It is incredibly rare to need to train in production. Even if you want to update your model weights daily, you don’t need to train in production. Good news! This means that you are just doing inference (a forward pass through your model) in production, which is much quicker and easier than training.
  • You can use whatever webserver you like (e.g. Flask) and set up inference as a simple API call.
  • GPUs only provide a speed-up if you are effectively able to batch your data. Even if you are getting 32 requests per second, using a GPU would most likely slow you down, because you’d have to wait a second from when the 1st arrived to collect all 32, then perform the computation, and then return the results. We recommend using a CPU in production, and you can always add more CPUs (easier than using multiple GPUs) as needed.
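Here is a minimal sketch of serving inference as a simple API call. To keep it self-contained, it uses Python's standard-library `http.server` instead of Flask, and a hypothetical hand-written linear model in place of a real network; the `/predict` route, the JSON shape, the `--serve` flag, and `WEIGHTS` are all illustrative assumptions. The point is the structure: the trained weights are loaded once at startup, and each request only runs a forward pass on the CPU.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "trained" weights, loaded once at startup. In practice you would
# load a saved model file here; training happens elsewhere, not in production.
WEIGHTS = [0.4, -0.2, 0.1]
BIAS = 0.05


def predict(features):
    """Inference only: a single forward pass through a toy linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            result = {"prediction": predict(body["features"])}
            status = 200
        except (ValueError, KeyError, TypeError) as err:
            # Malformed JSON, missing "features", or the wrong number of features.
            result = {"error": str(err)}
            status = 400
        payload = json.dumps(result).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__" and "--serve" in sys.argv:
    # Start with: python server.py --serve
    # Then try: curl -X POST localhost:8000/predict -d '{"features": [1, 2, 3]}'
    HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

Swapping in Flask or another webserver changes only the request-handling boilerplate; the load-once, forward-pass-per-request pattern stays the same.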

For big companies, it may make sense to use GPUs in production for serving; however, it will be clear when you reach this size. Prematurely trying to scale before it’s needed will only add needless complexity and slow you down.
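To make the batching trade-off from the list above concrete, here is a back-of-the-envelope latency model. All of the timings are made-up assumptions chosen for illustration, not benchmarks: requests arrive evenly at 32 per second, a CPU handles each request as soon as it arrives, and a GPU waits to fill a batch of 32 before running.

```python
import math


def gpu_worst_latency_ms(batch_size: int, requests_per_sec: float,
                         gpu_ms_per_batch: float) -> float:
    """Latency for the first request in a GPU batch: it waits for the batch to
    fill up, then for the whole batch to be computed."""
    # With evenly spaced arrivals, the last request of the batch arrives
    # (batch_size - 1) intervals after the first one.
    fill_wait_ms = (batch_size - 1) * 1000.0 / requests_per_sec
    return fill_wait_ms + gpu_ms_per_batch


def cpu_cores_needed(requests_per_sec: float, cpu_ms_per_request: float) -> int:
    """Cores needed so one second of requests takes at most one second of CPU
    time (a lower bound that ignores queueing delays)."""
    return math.ceil(requests_per_sec * cpu_ms_per_request / 1000.0)


# Assumed, illustrative timings (not measurements).
REQUESTS_PER_SEC = 32
CPU_MS_PER_REQUEST = 50.0   # forward pass for one request on one CPU core
GPU_MS_PER_BATCH = 10.0     # forward pass for a batch of 32 on a GPU

print("CPU latency per request (ms):", CPU_MS_PER_REQUEST)
print("CPU cores needed:", cpu_cores_needed(REQUESTS_PER_SEC, CPU_MS_PER_REQUEST))  # 2
print("GPU latency, first request in batch (ms):",
      gpu_worst_latency_ms(32, REQUESTS_PER_SEC, GPU_MS_PER_BATCH))               # 978.75
```

Under these assumptions, each request on the CPU path returns in about 50 ms (given two cores to keep up), while the first request in a GPU batch waits nearly a second, even though the GPU computation itself is faster. The GPU only comes out ahead once traffic is high enough to fill batches quickly.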

The background you need: 1 year of coding

One of the frustrations that inspired Jeremy and me to create Practical Deep Learning for Coders was (is) that most deep learning materials fall into one of two categories:

  • so shallow and high-level as to not give you the information or skills needed to actually use deep learning in the workplace or create state-of-the-art models. This is fine if you just want a high-level overview, but disappointing if you want to become a working practitioner.
  • so theoretical that they assume a graduate-level math background. This is a prohibitive barrier for many folks, and even as someone who has a math PhD, I found that the theory wasn’t particularly useful in learning how to code practical solutions. It’s not surprising that many materials have this slant. Until quite recently, deep learning was almost entirely an academic discipline and largely driven by questions of what would publish in top academic journals.

Our free course Practical Deep Learning for Coders is unique in that the only prerequisite is 1 year of programming experience, yet it still teaches you how to create state-of-the-art models. Your background can be in any language, although you might want to learn some Python before starting the course, since that is what we use. We introduce math concepts as needed, and we don’t recommend that you try to front-load studying math theory in advance.

If you don’t know how to code, I highly recommend learning, and Python is a great language to start with if you are interested in data science.

The data you need: far less than you think

Although many have claimed that you need Google-size datasets to do deep learning, this is false. The power of transfer learning (combined with techniques like data augmentation) makes it possible to apply pre-trained models to much smaller datasets. As we’ve talked about elsewhere, at the medical start-up Enlitic, Jeremy Howard led a team that used just 1,000 examples of lung CT scans with cancer to build an algorithm that was more accurate at diagnosing lung cancer than a panel of 4 expert radiologists. The C++ library Dlib has an example in which a face detector is accurately trained using only 4 images, containing just 18 faces!
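The core idea of transfer learning is to reuse a model already trained on a large dataset as a fixed feature extractor, and to train only a small new layer (the "head") on your own handful of examples. The pure-Python sketch below is a toy stand-in: `pretrained_features` is a hypothetical frozen function playing the role of a pretrained network's body, and we fit a logistic-regression head with gradient descent on just six labelled examples. With a real library, you would instead load actual pretrained weights (for example from torchvision) and freeze them.

```python
import math


def pretrained_features(point):
    """Hypothetical frozen 'pretrained' feature extractor (its weights never change).

    Stands in for the body of a network trained on a large dataset. Here it maps a
    raw 2-D point to a single feature: its squared distance from the origin.
    """
    x, y = point
    return x * x + y * y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(points, labels, lr=0.5, epochs=5000):
    """Fit only a new logistic-regression head on top of the frozen features."""
    feats = [pretrained_features(p) for p in points]  # computed once, never updated
    w, b = 0.0, 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for f, t in zip(feats, labels):
            err = sigmoid(w * f + b) - t   # gradient of log loss w.r.t. the logit
            grad_w += err * f
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(point, w, b):
    return int(sigmoid(w * pretrained_features(point) + b) > 0.5)


# Only six labelled examples: 0 = near the origin, 1 = far from it.
points = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (1.5, 0.2), (-1.2, -1.1), (0.3, 1.6)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_head(points, labels)
print([predict(p, w, b) for p in points])
print(predict((0.0, 0.0), w, b), predict((2.0, 2.0), w, b))
```

The classes are not linearly separable in the raw coordinates, but they are in the frozen features, so a two-parameter head learns them from six examples. That is the same reason a good pretrained network lets small datasets go a long way.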

Face Recognition with Dlib

A note about access

For the vast majority of people I talk with, the barriers to entry for deep learning are far lower than they expected and the costs are well within their budgets. However, I realize this is not the case universally. I’m periodically contacted by students who want to take our online course but can’t afford the costs of AWS. Unfortunately, I don’t have a solution. There are other barriers as well. Bruno Sánchez-A Nuño has written about the challenges of doing data science in places that don’t have reliable internet access, and fast.ai international fellow Tahsin Mayeesha describes hidden barriers to MOOC access in countries such as Bangladesh. I care about these issues of access, and it is dissatisfying to not have solutions.