About the Practical Deep Learning for Coders course:

Practical Deep Learning Part 2 - Integrating Recent Advances and Classic Machine Learning

With part 2 of our in-person SF course starting in 6 weeks, and applications having just opened, we figured we’d better tell you a bit about what to expect! So here’s an overview of what we’re planning to cover.

Overall approach

The main theme of this part of the course will be tackling more complex problems that require integrating a number of techniques. This includes both integrating multiple deep learning techniques (such as combining RNNs and CNNs for attentional models) and combining classic machine learning techniques with deep learning (such as using clustering and nearest neighbors for semi-supervised and zero-shot learning). As always, we’ll be introducing all methods in the context of solving end-to-end real world modeling problems, using Kaggle datasets where possible (so that we have a clear best-practice goal to aim for).

Since we have no prerequisites for the course other than a year of coding experience and completion of part 1 of the course, we’ll be fully explaining all the classic ML techniques we’ll use as well.

In addition, we’ll be covering some more sophisticated extensions of the DL methods we’ve seen, such as adding memory to RNNs (e.g. for building question answering systems / “chat bots”), and multi-object segmentation and detection methods.

Some of the methods we’ll examine will be very recent research directions, including unpublished research we’ve done at fast.ai. So we’ll be looking at journal articles much more frequently in this part of the course—a key teaching goal for us is that you come away from the course feeling much more comfortable reading, understanding, and implementing research papers. We’ll be sharing some simple tricks that make it much easier to quickly scan and get the key insights from a paper.

Python 3 and TensorFlow

This part of the course will use Python 3 and TensorFlow, instead of Python 2 and Theano as used in part 1. We’ll explain our reasoning in more detail in a future post; we hope that you will come away from the course feeling confident in both of these tools, and able to identify the strengths and weaknesses of both, to help you decide what to use in your own projects.

We’ve found using Python 3 to develop the course materials quite a bit more pleasant than Python 2. Whilst version 3 of the language has provided some incremental improvements for many years, until recently the lack of support for Python 3 in scientific computing libraries made it very frustrating to use. The good news is that that’s all changed now, and furthermore recent developments in Python 3.4 and 3.5 have greatly improved the productivity of the language.

Our view of TensorFlow is that there’s a very nice piece of software buried away in a rather verbose and complex API. We’ll be showing how to write custom GPU accelerated algorithms from scratch in TensorFlow, staying within a small subset of the TensorFlow API where things stay simple and elegant.

Structured data, time series analysis, and clustering

One area where deep learning has been almost entirely ignored is structured data analysis (i.e. analyzing data where each column represents a distinct feature, such as from a database table). We had wondered whether this is because deep learning is simply less well suited to this task than the very popular decision tree ensembles (such as random forests and XGBoost, which we’re big fans of), but we’ve recently done some research that has shown that deep learning can be both simpler and more effective than these techniques. But getting it to work well requires getting a lot of little details right. To the best of our knowledge, these details have never been fully understood or documented elsewhere.

We’ll be showing how to get state of the art results in structured data analysis, including showing how to use the wonderful XGBoost, and comparing these techniques. We’ll also take a brief detour into looking at R, where structured data analysis is still quite a bit more straightforward than in Python.

Most of the structured data sets we’ll investigate will have a significant time series component, so we’ll also be discussing the best ways to deal with this kind of data. Time series pop up everywhere, such as fraud and credit models (using time series of transactions), maintenance and operations (using time series of sensor readings), finance (technical indicators), medicine (medical sensors and EMR data), and so forth.

We will also begin our investigation of cluster analysis, showing how it can be combined with a softmax layer to create more accurate models. We will show how to implement this analysis from scratch in TensorFlow, creating a novel GPU accelerated algorithm.
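The post does not say which clustering algorithm the course will use, so the following is only a rough illustration of what cluster analysis involves: a minimal k-means sketch in plain Python. It is not the GPU accelerated algorithm mentioned above and does not include the softmax layer; the function name `kmeans` and the toy data are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative sketch only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Toy data: two well-separated blobs of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, assignments = kmeans(points, k=2)
```

The two alternating steps (assign, then update) are all data-parallel arithmetic, which is one reason this family of algorithms is a natural candidate for a GPU implementation.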

Deep dive into computer vision

We will continue our investigation into computer vision applications from part 1, getting into some new techniques and new problem areas. We’ll study ResNet and Inception architectures in more detail, with a focus on how these architectures can be used for transfer learning. We’ll also look at more data augmentation techniques, such as test time augmentation and occlusion.
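To make test time augmentation concrete, here is a minimal sketch in plain Python: the same model is run on an image and on augmented copies of it, and the predicted probabilities are averaged. The `toy_predict` function is a hypothetical stand-in for a trained CNN, and the helper names are assumptions made for this illustration.

```python
def hflip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(predict, image, augmentations):
    """Average class probabilities over the original image and its augmented copies."""
    variants = [image] + [aug(image) for aug in augmentations]
    all_probs = [predict(v) for v in variants]
    n = len(all_probs)
    return [sum(p[i] for p in all_probs) / n for i in range(len(all_probs[0]))]


def toy_predict(image):
    """Hypothetical model: 'class 1' probability is the share of brightness in the left half."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    total = sum(sum(row) for row in image) or 1  # avoid dividing by zero on a blank image
    p1 = left / total
    return [1 - p1, p1]


image = [[1, 0], [1, 0]]
probs = predict_with_tta(toy_predict, image, [hflip])
```

In this toy case the model is certain of class 1 on the original and certain of class 0 on the mirrored copy, so the averaged prediction is evenly split. With a real model the augmented predictions usually agree far more, and averaging them smooths out small errors.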

We’ll learn about the K nearest neighbors algorithm, and use it in conjunction with CNNs to get state of the art results on multi-frame image sequence analysis (such as videos or photo sequences). From there, we will look at other ways of grouping objects using deep learning, such as siamese and triplet networks, which we will use to get state of the art results for image comparisons.
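As a rough sketch of the nearest neighbors idea, the snippet below labels a new feature vector by majority vote among its k closest stored vectors. The two-dimensional vectors are hypothetical stand-ins for features taken from a CNN layer, and `knn_predict` is a name chosen for this illustration.

```python
import math
from collections import Counter


def knn_predict(query, features, labels, k=3):
    """Label a query vector by majority vote among its k nearest stored vectors."""
    # Rank stored vectors by Euclidean distance to the query.
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical CNN feature vectors with known labels.
features = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05), (0.1, 0.9), (0.2, 0.8)]
labels = ["cat", "cat", "cat", "dog", "dog"]

first = knn_predict((0.7, 0.3), features, labels)
second = knn_predict((0.15, 0.85), features, labels)
```

There is no training step here: all the work happens at prediction time, so the quality of the result depends almost entirely on how well the feature space separates the classes.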

Unsupervised and semi-supervised learning, and productionizing models

In part 1 we studied pseudo-labeling and knowledge distillation for semi-supervised learning. In part 2 we’ll learn more techniques, including Bayesian-inspired techniques such as variational autoencoders and variational ladder networks. We will also look at the role of generative models in semi-supervised learning.

We will show how to use unsupervised learning to build a useful photo fixing tool, which we’ll then turn into a simple web app in order to show how you can put deep learning models into production.

Zero-shot learning will be a particular focus, especially the recently developed problem of generalized zero-shot learning. Solving this problem allows us to build models on a subset of the full dataset, and apply those models to whole new classes that we haven’t seen before. This is important for real-world applications, where things can change and new types of data can appear any time, and where labeling can be expensive, slow, and/or hard to come by.
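The post does not specify a zero-shot method, so the following sketch shows only one common formulation, as an assumption: a model predicts an embedding for an input, and the input is assigned to whichever class vector is most similar, including classes that had no training examples. The three-dimensional class vectors and the predicted embedding are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(embedding, class_vectors):
    """Return the class whose vector is most similar to the predicted embedding."""
    return max(class_vectors, key=lambda name: cosine(embedding, class_vectors[name]))


# Hypothetical class vectors (e.g. word embeddings of the class names).
# Assume "zebra" had no training images at all.
class_vectors = {
    "horse": (0.9, 0.1, 0.0),
    "tiger": (0.1, 0.9, 0.3),
    "zebra": (0.7, 0.1, 0.7),
}

# Hypothetical embedding predicted by a model for a new image.
predicted = (0.6, 0.2, 0.6)
label = zero_shot_classify(predicted, class_vectors)
```

Because classification is a lookup against class vectors rather than a fixed output layer, adding a new class only requires adding a new vector, not retraining the model.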

And don’t worry, we haven’t forgotten NLP! NLP is a great area to apply unsupervised and semi-supervised learning, and we will look at a number of interesting problems and techniques in this space, including how to use siamese and triplet networks for text analysis.

Segmentation, detection, and handling large datasets

Handling large datasets requires careful management of resources, and doing it in a reasonable time frame requires being thoughtful about the full modeling process. We will show how to build models on the well-known Imagenet dataset, and will show that analysing such a large dataset can readily be done on a single machine fairly quickly. We will discuss how to use your GPU, CPUs, RAM, SSD, and HDD together, taking advantage of each part most effectively.

Whereas most of our focus on computer vision so far has been classification, we’ll now move our focus to localization—that is, finding the objects in an image (or in NLP, finding the relevant parts of a document). We have looked at some simple heatmap and bounding box approaches in part 1 already; in part 2 we build on that to look at more complete segmentation systems, and methods for finding multiple objects in an image. We will look at the results of the recent COCO competition to understand the best approaches to these problems.

Neural machine translation

As recently covered by the New York Times, Google has totally revamped their Translate tool using deep learning. We will learn about what’s behind this system, and similar state of the art systems—including some more recent advances that haven’t yet found their way into Google’s tool.

We’ll start with looking at the original encoder-decoder model that neural machine translation is based on, and will discuss the various potential applications of this kind of sequence to sequence algorithm. We’ll then look at attentional models, including applications in computer vision (where they are useful for large and complex images). In addition, we will investigate stacking layers, both in the form of bidirectional layers, and deep RNN architectures.
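As a minimal sketch of what an attentional model computes at one decoding step, the code below scores each encoder state against the current decoder state, turns the scores into weights with a softmax, and returns the weighted average of the encoder states. It assumes dot-product scoring, which is only one of several scoring functions in use; the vectors are toy values and the function names are chosen for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return attention weights and the context vector for one decoding step."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(q * h for q, h in zip(decoder_state, hs)) for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted average of the encoder states.
    context = [
        sum(w * hs[d] for w, hs in zip(weights, encoder_states)) for d in range(dim)
    ]
    return weights, context


# Toy encoder states for a three-token input, and one decoder state.
encoder_states = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
decoder_state = (0.0, 3.0)
weights, context = attend(decoder_state, encoder_states)
```

Here the second encoder state lines up best with the decoder state, so it receives the largest weight and dominates the context vector. In a plain encoder-decoder model, by contrast, the decoder sees only a single fixed summary of the whole input.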

Question answering and multi-modal models

Recently there has been a lot of hype about chatbots. Although in our opinion they’re not quite ready for prime time (which is why pretty much all production chatbots still have a large human element), it’s instructive to see how they’re built. In general, question answering systems are built using architectures that have an explicit memory; we will look at ways of representing that in a neural network, and see the impact it has on learning.

We will also look at building visual Q&A systems, where you allow the user to ask questions about an image. This will build on top of the work we did earlier on zero-shot learning.

Reinforcement learning

Reinforcement learning has become very popular recently, with Google showing promising results in training robots to complete complex grasping actions, and DeepMind showing impressive results in playing computer games. We will survey the reinforcement learning field and attempt to identify the most promising application areas, including looking beyond the main academic areas of study (robots and games) to opportunities for reinforcement learning of more general use.

We hope to see you at the course! Part 1 was full, and part 2 is likely to be even more popular, so get your application in soon!

Big deep learning news: Google TensorFlow chooses Keras

Buried in a Reddit comment, Francois Chollet, author of Keras and AI researcher at Google, made an exciting announcement: Keras will be the first high-level library added to core TensorFlow at Google, which will effectively make it TensorFlow’s default API. This is excellent news for a number of reasons!

As background, Keras is a high-level Python neural networks library that runs on top of either TensorFlow or Theano. There are other high level Python neural networks libraries that can be used on top of TensorFlow, such as TF-Slim, although these are less developed and not part of core TensorFlow.

Using TensorFlow makes me feel like I’m not smart enough to use TensorFlow; whereas using Keras makes me feel like neural networks are easier than I realized. This is because TensorFlow’s API is verbose and confusing, and because Keras has the most thoughtfully designed, expressive API I’ve ever experienced. I was too embarrassed to publicly criticize TensorFlow after my first few frustrating interactions with it. It felt so clunky and unnatural, but surely this was my failing. However, Keras and Theano confirm my suspicions that tensors and neural networks don’t have to be so painful. (In addition, in part 2 of our deep learning course Jeremy will be showing some tricks to make it easier to write custom code in TensorFlow.)

For a college assignment, I once used a hardware description language to code division by adding and shifting bits in the CPU’s registers. It was an interesting exercise, but I certainly wouldn’t want to code a neural network this way. There are a number of advantages to using a higher level language: quicker coding, fewer bugs, and less pain. The benefits of Keras go beyond this: it is so well-suited to the concepts of neural networks that it has improved how Jeremy and I think about neural networks and facilitated new discoveries. Keras makes me better at neural networks, because the language abstractions match up so well with neural network concepts.

Writing programs in the same conceptual language that I’m thinking in allows me to focus my attention on the problems I’m trying to solve, and not on artifacts of the programming language. When most of my mental energy is spent converting between the abstractions in my head and the abstractions of the language, my thinking becomes slower and fuzzier. TensorFlow affects my productivity in much the same way that having to code in Assembly would.

As Chollet wrote, “If you want a high-level object-oriented TF API to use for the long term, Keras is the way to go.” And I am thrilled about this news.

Note: For our Practical Deep Learning for Coders course, we used Keras and Theano. For Practical Deep Learning for Coders Part 2, we plan to use Keras and TensorFlow. We prefer Theano over TensorFlow, because Theano is more elegant and doesn’t make scope super annoying. Unfortunately, only TensorFlow supports some of the things we want to teach in part 2.

UPDATE: I drafted this post last week. After publishing, I saw on Twitter that Francois Chollet had announced the integration of Keras into TensorFlow a few hours earlier.

Where is AI/ML actually adding value at your company?

An interesting thread came up over at Hacker News: “Ask HN: Where is AI/ML actually adding value at your company?” The folks at High Scalability were good enough to summarize the answers. Their summary was somewhat buried in a lengthy blog post, so we wanted to highlight it here. Without further ado, here is the list:

  • Predicting if a part scanned with an acoustic microscope has internal defects
  • Find duplicate entries in a large, unclean data set
  • Product recommendations
  • Course recommendations
  • Topic detection
  • Pattern clustering
  • Understand the 3D spaces scanned by customers
  • Dynamic selection of throttle threshold
  • EEG interpretation
  • Predict which end users are likely to churn for our customers
  • Automatic data extraction from web pages
  • Model complex interactions in electrical grids in order to make decisions that improve grid efficiency
  • Sentiment classification
  • Detecting fraud
  • Credit risk modeling
  • Spend prediction
  • Loss prediction
  • Fraud and AML detection
  • Intrusion detection
  • Email routing
  • Bandit testing
  • Optimizing planning/task scheduling
  • Customer segmentation
  • Face and document detection
  • Search/analytics
  • Chat bots
  • Topic analysis
  • Churn detection
  • Phenotype adjudication in electronic health records
  • Asset replacement modeling
  • Lead scoring
  • Semantic segmentation to identify objects in the user’s environment to build better recommendation systems and to identify planes (floor, wall, ceiling) to give us better localization of the camera pose for height estimates
  • Classify bittorrent filenames into media categories
  • Predict how effective a given CRISPR target site will be
  • Check volume, average ticket $, credit score and things of that nature to determine the quality and lifetime of a new merchant account
  • Anomaly detection
  • Identify available space in kit from images
  • Optimize email marketing campaigns
  • Investigate & correlate events, initially for security logs
  • Moderate comments
  • Building models of human behavior to provide interactive intelligent agents with a conversational interface
  • Automatically grading kids’ essays
  • Predict probability of car accidents based on the sensors of your smartphone
  • Predict how long JIRA tickets are going to take to resolve
  • Voice keyword recognition
  • Produce digital documents in legal proceedings
  • PCB autorouting

The Deep Learning MOOC is now available!

We’re very excited and proud to announce the launch of the fast.ai Deep Learning MOOC. It contains all the lessons from the in-person course we’ve been discussing here over the last few months, along with extra online material to help students understand the content and complete the assignments. All MOOC participants are invited to participate in the fast.ai deep learning community, including through the forums and the wiki.

For me, the most gratifying part of putting the course online was going through all the wonderful testimonials we’ve received from our students. Thank you all for your inspiring words!

(Update - problem resolved!) Azure and AWS's 'GPU general availability' lies


Huge thanks to Boyd Mcgeachie from AWS for reaching out to us and organizing a (nearly) frictionless AWS onboarding experience for our MOOC participants. He couldn’t have been more gracious in accepting the criticisms and concerns laid out below, and explained that AWS is aware of them and working hard to fix them for all customers. I’m thrilled that we have a solution to this that allows our students to use AWS, since it’s a great service and we invested a lot of time in automating and simplifying the management of AWS instances.

Original post:

Both Microsoft and AWS have, with great fanfare, recently announced the general availability of their deep learning capable GPU instances. Unfortunately, they are far less “available” than they claim, and they have not even bothered to tell their own support teams about these limitations, let alone telling their potential customers.

The problem is that for both companies, the so-called “available” GPUs cannot actually be purchased by new users. This is not mentioned anywhere, and in the case of AWS they let you go through the entire onboarding process before giving a totally obscure error (“You have requested more instances (1) than your current instance limit of 0 allows for the specified instance type”). Azure at least are a little better (they grey out the GPU instance types and write “not available” over the top of them).

We have a major deep learning MOOC launching tomorrow, and we think it may be pretty popular (it’s the first course that shows how to create state of the art models using a code-centric approach). Many students will be learning how to use cloud-based machines for the first time. But, as it stands, there is nowhere they can pay for the privilege of renting a GPU-based machine, unless they have an existing established account with Azure or AWS. Trying to resolve this with Azure and AWS has been a rather bemusing experience, as I have to repeat myself again and again to explain this limitation to support staff who have not been briefed on it. I’ve had to explain that no, it’s not user error (our 100 students of the in-person course that the MOOC is based on are not likely to have all made the exact same error!), and yes we are using the correct region, and no we’re not trying to use spot instances, etc, etc, etc…

To be clear, I understand that for capacity planning reasons it may be necessary to limit access to new instance types. I also understand that there are fraudsters around and that companies want to protect themselves. But none of this excuses or explains:

  • Not telling your customers about the limitation
  • Not telling your own support staff about the limitation
  • Allowing customers to complete the entire onboarding process, including selecting a GPU instance
  • Making a PR fanfare about your product being available, but in practice (and in secret) only making it available to your established customers (indeed, why make a marketing splash about something that those people that see the marketing can’t actually use?!?)
  • The totally bizarre responses that some requests received. For instance, the request from my co-instructor (who included a link to the course and her LinkedIn profile, and who has a Duke math PhD, worked as a quant, and was a data scientist at Uber) was denied, whereas some students who provided no justification were accepted, on the same day!
  • Why some of our students, who were fully paid-up, suddenly found their access cut off in the middle of the course

I should also say that the support and capacity planning folks at both AWS and Azure have been tenacious in trying to find a way to solve this problem. Although neither company responded to my tweets informing them about the issue, both companies did respond to support tickets (although in both cases it required me to educate them about their own system’s limitations). They’re looking for a solution as I post this. Hopefully with broader awareness of this issue, and of the impact it has on those looking to get into deep learning for the first time, they will get the resources they need to fix it.

A plea: If you are from Amazon or Microsoft, or know anyone in a position of power there, could you please pass this on to them and ask them to help us? We’re looking for a way that our students can pay them money for GPU access! Our email address is info@fast.ai