01 Mar 2017 Rachel Thomas
This post has been translated into Chinese here.
I sometimes receive emails asking for guidance related to data science, which I answer here as a data science advice column. If you have a data science related quandary, email me at firstname.lastname@example.org. Note that questions are edited for clarity and brevity. Other installments of the data science advice column include:
Q: This question is a composite of a few emails I’ve received from people with some limited programming skills, living outside the Bay Area, and interested in becoming data scientists, such as the following:
Q1. I’m a financial analyst at a major bank. I’m in the process of pivoting to tech as a software engineer and I’m interested in machine learning, which led me to your post on The Diversity Crisis in AI. Do I need a masters or PhD to work in AI?
Q2. I am in the 6th year of my PhD in pure mathematics and am going to graduate soon. I am really interested in data science and I want to know if I want to get a job in this area, what can I do or what should I prepare myself so that I can have the skill sets that the companies need? I’m now reading books and trying to find some side projects that I can do. Do you have any ideas where I can find these projects that will interest employers?
Q3. I have a graduate STEM degree and have worked as both a researcher and a teacher. I am currently in the midst of a career transition and looking for industry roles that might need both analytical and instructional skills from their employees. My knowledge is more on the science side though rather than in software. The Internet can get to be a pretty overwhelming place to pull information from without the actual sharing. Do you have recommendations for programming courses and workshops that are also friendly to a teacher’s budget? And what coding languages or skills would you say would be most helpful to focus on developing?
A: I think of myself as having a somewhat non-traditional background. At first glance it may seem like I have a classic data science education: I took 2 years of C++ in high school, minored in computer science in college (with a math major), did a PhD related to probability, and worked as a quant. However, my computer science coursework was mostly theoretical, my math thesis was entirely theoretical (no computations at all!), and over the years I used less and less C/C++ and more and more MATLAB (why oh why did I do that to myself?!? Somehow I found myself even writing web scrapers in MATLAB…) My college education taught me how to prove if an algorithm was NP-complete or Turing computable, but nothing about testing, version control, web apps, or how the internet works. The company where I was a quant primarily used proprietary software/languages that aren’t used in the tech industry.
After 2 years working as a quant in energy trading, I realized that my favorite parts of the job were programming and working with data. I was frustrated with the bureaucracy of working at a large company, and of dealing with outdated and proprietary software tools. I wanted something different, and decided to attend the data science conference Strata in February 2012 to learn more about the world of Bay Area data science. I was totally, absolutely, blown away. The huge enthusiasm for data, all the tools I was most excited about (and several more I’d never heard of before), the stories of people who had quit their previous lives in academia or established companies to work on their passion at a startup… it was so different from what I was used to, and so refreshing. After Strata, I spent a few extra days in San Francisco, interviewing at startups and having coffee with some distant acquaintances that I found had moved to SF - everyone was very helpful and also apparently addicted to Four Barrel Coffee (almost everyone I spoke with suggested meeting there!)
I was star-struck, but actually switching into tech made me feel totally out of it… For my first many conversations and interviews with people in tech, I often felt like they were speaking another language. I grew up in Texas, spent my 20s in Pennsylvania and North Carolina, and didn’t know anyone who worked in tech. I’d never taken a statistics class and just thought of probability as real analysis on a space of measure 1. I didn’t know anything about how start-ups and tech companies worked. The first time I interviewed at a start-up, one interviewer boasted about how the company had briefly achieved profitability before embarking on rapid expansion/hiring. “You mean this company isn’t profitable!?!?” I responded in horror (Yes, I actually said that out loud, in a shocked tone of voice). I now cringe in embarrassment at the memory. In another interview, I was so confused by the concept of an “impression” (when an internet ad is displayed) that it took me a while to even get to the logic of the question.
I’ve been here five years now, and here are some things I wish I’d known when I was starting my career move. I’m aware that I’m white, a US citizen, had a generous fellowship in grad school and no student debt, and was single and childless at the time I decided to switch careers, and someone without these privileges will face a much tougher path. While my anecdotes should be taken with a grain of salt, I hope that some of these suggestions turn out to be helpful to you:
Becoming ready for a move to data science
Most importantly: find ways to work whatever you want to learn into your current job. Find a project that involves more coding/data analysis and that would be helpful to your employer. Take any boring task you do and try to automate it. Even if the process of automation makes it take 5x as long (and even if you only do the task once!), you are learning by doing this.
Analyze any data you have: research for an upcoming purchase (e.g. deciding which microwave to buy), data from a personal fitness tracker, nutrition data from recipes you’re cooking, pre-schools you’re looking at for your child. Turn it into a mini data analysis project and write it up in a blog post. For example, if you are a graduate student, you could analyze grade data from the students you are teaching.
Learn the most important data science software tools: Python’s data science stack (pandas/numpy/scipy) is the #1 most useful technology to learn (read this book!), followed closely by SQL. I would focus on getting very comfortable with Python and SQL before learning other languages. Python is widely used and flexible. You will be well-positioned if you decide to switch to more software development work or to go full-steam into machine learning.
Use Kaggle. Do the tutorials, participate in the forums, enter a competition (don’t worry about where you place - just focus on doing a little better every day). It’s the best way to learn practical machine learning skills.
Search for data science and tech meetups in your area. With the explosion of data science in the last few years, there are now meetups in countries all over the world and in a wide variety of cities. For instance, Google recently held a TensorFlow Dev Summit in Mountain View, CA, but there were viewing parties around the world that watched the livestream together (including in Abuja, Nigeria, Coimbatore, India, and Rabat, Morocco).
Online courses are an amazing resource. You can learn from the world’s best data scientists in the comfort of your own home. Often the assignments are where most of the learning occurs, so don’t skip them! Here’s a few of my favorites:
As one of the questioners highlighted above, the amount of information, tutorials, and courses available online can be overwhelming. One of the biggest risks is jumping from thing to thing, without ever completing one or sticking with a topic long enough to learn it. It’s important to find a course or project that is “good enough”, and then stick with it. Something that can be helpful with this is finding or starting a meet-up group to work through an online course together.
Online courses are very useful for the knowledge you gain (and it’s so important that you do all the assignments, as that is how you learn). However, I have not yet seen any benefit to getting the certificates from them (I know this is a newer area of growth). This is based on my experience interviewing a ton of job applicants when hiring data scientists, and interviewing for many positions myself.
- Twitter can be a surprisingly helpful way to find interesting articles and opportunities. For instance, my collaborator Jeremy Howard has provided over 1,000 links to his favorite machine learning papers and blog posts (NB: you’ll need to be signed in to Twitter to read this link). It will take some time to figure out who to follow (and may involve some following and unfollowing and searching along the way), although one short cut is to look at who wrote the tweets you like in the link above, and follow them directly. Look up data scientists at companies that interest you. Look up the authors of libraries and tools that you use, or are interested in. When you find a tutorial or blog post you like, look up the author. And then look up who these people retweet. If you are unsure what or how to tweet, I think it can be helpful to think of Twitter as a way to (publicly) bookmark links that you like. I try to tweet any article or tutorial that I think I may want to reference back to in a few months time.
- The machine learning subreddit is a great source of recent news. You may find a lot of it inaccessible at first, but after a couple of months you’ll start seeing more and more that you recognize.
- It’s helpful to sign up for newsletters such as the Import AI newsletter and WildML news.
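To make the earlier suggestions concrete (turn everyday data into a mini analysis project, and get comfortable with Python and SQL together), here is a minimal sketch using only Python's standard library. The fitness-tracker numbers, the `activity` table, and the 8,000-step threshold are all made up for illustration; a real project would more likely load an export file and use pandas.

```python
import sqlite3
import statistics

# Hypothetical week of fitness-tracker data: (day, steps, hours_slept).
ROWS = [
    ("Mon", 4200, 6.5),
    ("Tue", 8100, 7.0),
    ("Wed", 10400, 8.0),
    ("Thu", 3900, 6.0),
    ("Fri", 9600, 7.5),
    ("Sat", 12500, 8.5),
    ("Sun", 7000, 7.0),
]


def load(rows):
    """Create an in-memory SQLite database holding the tracker data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (day TEXT, steps INTEGER, hours_slept REAL)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    return conn


def summarize(conn, step_goal=8000):
    """Use SQL for filtering/aggregation and Python for the median."""
    active_days, avg_sleep = conn.execute(
        "SELECT COUNT(*), AVG(hours_slept) FROM activity WHERE steps >= ?",
        (step_goal,),
    ).fetchone()
    steps = [row[0] for row in conn.execute("SELECT steps FROM activity")]
    return {
        "active_days": active_days,
        "avg_sleep_on_active_days": avg_sleep,
        "median_steps": statistics.median(steps),
    }


conn = load(ROWS)
summary = summarize(conn)
conn.close()
print(summary)
```

Even a toy analysis like this raises questions worth writing up in a blog post (is the threshold sensible? is a week of data enough?), which is most of the point of the exercise.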
Moving to the Bay Area
Do whatever you can to move to the Bay Area! I realize that this won’t be possible for many people (particularly if you have children or for a variety of visa/legal residency issues). There are so many data science meet-ups, study groups, conferences, and workshops here. There is also an amazing community of other bright, ambitious, hungry-to-learn data scientists. I had trouble even figuring out which were the most useful things for me to learn from afar. Although I’d started studying machine learning on my own before moving here, coming to SF rapidly accelerated my learning.
My first year in San Francisco was a period of intense learning for me: I attended tons of meetups, completed several online courses, participated in numerous workshops and conferences, learned a lot by working at a data-focused start-up, and most importantly met scores of people who I was able to ask questions of. I completely under-estimated how amazing it is to be able to interact regularly with the people who are building the tools and technology that excite me most. I’m surrounded by people who love learning and are pushing the cutting edge of what is possible. That TensorFlow Dev Summit I mentioned above that people watched from around the world? I was lucky enough to be able to attend it live, and my favorite part was the people I met.
One good approach to moving here is taking a “not-your-dream-job”; i.e. try to get to a place where you’re surrounded by people you can learn from, even if it’s not a role you’d otherwise be interested in. I decided to switch careers in early 2012, before Insight or other data science bootcamps existed. I applied to a few of what were my “dream jobs” at the time and was rejected. In hindsight, I think this was a mix of my lacking some needed skills, not knowing how to market myself properly, and doing a fairly brief job search. In March 2012, I accepted an analyst position at a start-up I was excited about, with the hope and an informal agreement that I could move into an official data science/modeling role later. Overall, it was a good choice. It let me move to San Francisco rapidly, the company I joined was great in a number of ways (including having a weekly reading group that was working through Bishop’s Pattern Recognition and including me on a field trip to meet Trevor Hastie and Jerome Friedman), and my manager was supportive of me doing more engineering-intensive projects than the role was officially scoped for. One year later, I landed what on paper was my dream job: a combined data scientist/software engineer role with a startup that had fascinating datasets.
There are also some good bootcamps in the area, which generally also provide many opportunities to connect with interesting people and companies in the data science space.
- Insight Data Science is a 7-week, free, intensive bootcamp for graduates with PhDs in STEM fields. Potential downsides: Since it is only 7 weeks, part of which is focused on networking and job search, I believe it’s mostly for people who already have most of the skills they need. Also, it’s very competitive to get into.
- Data Science boot camps such as Galvanize or Metis. Positives: These are 12-week immersive experiences, that provide structure and networking opportunities. Downsides: These are rather expensive. Some factors to consider: How close is your background to what you need? That is, if you have little programming experience, it may be necessary to do something like this, but if you are transitioning from a closely related field, this may be overkill. Also, how motivated are you with independent self-learning? If you struggle with it, the accountability and structure of a bootcamp could be helpful.
There are many factors in deciding whether to do a bootcamp. A big one is how much structure or external motivation you need. There are a lot of amazing resources available online. How much discipline do you have? Note that it’s important to accept what you need to best learn. I find the motivation of online courses and having assignments really works for me, and I used to feel embarrassed that this was easier for me than having a completely independent side project. Now, I’ve accepted this and try to work with it. Other questions to ask: how much do you need to learn, and how quickly can you learn on your own? If it’s a lot, a bootcamp may really speed that up. One area where I think bootcamps can particularly shine is teaching you how to put a bunch of different tools/techniques together.
You can also move here without a job. This requires a number of things, including: ample savings, US legal residency status, and not having children, so it won’t be an option for many people. However, if you are able to do it (e.g. you are a US permanent resident coming from finance), it can be a good option. Searching for a job in tech can be a full-time job, as data science and engineering interviews require a lot of studying to prepare and many require time-intensive take-home challenges. In hindsight, I’ve often done rushed job searches when I was working full-time and job-searching, and that has led me to a few sub-optimal decisions. You will certainly find plenty of ways to fill your time with studying for interviews, coding side projects, and attending workshops and study groups. Also, two things that surprised me when I switched to tech are how frequently people switch jobs, and how normal it is to take time off between jobs for learning new things or traveling (so there is less reason to worry about gaps in your resume, as long as you have good answers about what you were learning during that time).
The huge caveat: when I made my move 5 years ago, I was unaware of how sexist, racist, ageist, transphobic, and morally bankrupt Bay Area tech is (despite its grandiose claims to be creating a better future). A few years later, I became so discouraged that I considered leaving the tech industry altogether. Tales of betrayal, callousness, and cruelty abound: for instance, someone I’m close to had his family medical emergencies exploited for profit by his coworkers, and many of my friends and loved ones have had similarly awful experiences. However, the community of passionate, fascinating people and access to cutting edge technology keeps me here, and given the choice, I’d choose to move here all over again. I currently feel very lucky with fast.ai to be working on the problems that I find most interesting and believe will have the greatest impact.
Do I need a masters or PhD to work in AI? I firmly believe the answer is NO, and I’m working to make that even more of a reality than it already is. In fact, AI PhDs are often not that well-positioned to tackle practical, relevant business problems, because that is not what their training is for. Academia is focused on advancing the theoretical bounds of the field, and is motivated by what can be published in top journals (very different from what will create a viable business!). Read more about fast.ai’s educational philosophy here and check out our free, online course Practical Deep Learning for Coders.
Should I learn Ruby after learning Python? There’s no reason for an aspiring data scientist to learn Ruby. It’s similar enough to Python that it won’t teach you new concepts (the way learning a functional language or lower-level language would), and there is not much of a data science eco-system.
Where can I find side projects that will interest employers? I think I can find random data sets online, but I guess employers do want to see how I handle a real situation? Don’t feel like your side project needs to be completely unique, or involve a unique or unusual data set. It’s fine to use a dataset you get from Kaggle for a side project. It’s fine if your project isn’t ground-breaking. When creating side projects, blog posts, or tutorials, think of your audience as the person who is one-step behind you, as they are the one who you are best positioned to help. You may be worried that a project or post wouldn’t be interesting to someone senior in the field, or that someone else may have done something similar. That’s fine! It’s good just to get your work out there.
27 Feb 2017 Rachel Thomas
Recent American news events range from horrifying to dystopian, but reading the applications of our fast.ai international fellows brought me joy and optimism. I was blown away by how many bright, creative, resourceful folks from all over the world are applying deep learning to tackle a variety of meaningful and interesting problems. Their passions range from ending illegal logging and diagnosing malaria in rural Uganda to translating Japanese manga, reducing farmer suicides in India via better loans, making Nigerian fashion recommendations, and monitoring patients with Parkinson’s disease. Our mission at fast.ai is to make deep learning accessible to people from varied backgrounds outside of elite institutions, who are tackling problems in meaningful but low-resource areas, far from mainstream deep learning research.
Our group of selected fellows for Deep Learning Part 2 includes people from Nigeria, Ivory Coast, South Africa, Pakistan, Bangladesh, India, Singapore, Israel, Canada, Spain, Germany, France, Poland, Russia, and Turkey. We wanted to introduce just a few of our international fellows to you today.
Tahsin Mayeesha is a Bangladeshi student who created a network visualization project analyzing data from a prominent Bangladeshi newspaper to explore the media coverage of violence against women. She wrote up her methodology and findings here, and hopes that the project can increase knowledge and empathy. In working on her Udacity Machine Learning Nano-degree, she overcame the challenges of a broken generator and intermittent electricity during Ramadan to successfully complete her projects. Mayeesha is a fan of Naruto, a popular Japanese manga series, and would like to use deep learning to translate it into English. Naruto characters use different stylized hand signs for different fight moves, and she is interested in trying to recognize these with a CNN. On a broader scale, Mayeesha wants to explore the question of how ML will impact places like Bangladesh with a semi-broken infrastructure.
Karthik Mahadevan, an industrial designer in Amsterdam, previously created an interactive medical toy for children with cancer. More recently, he helped develop smart diagnosis solutions for rural health centres in Uganda. His team developed a smartphone-based device that captures magnified images of blood smear of malaria patients. The images are processed through an AI-based software that highlights potential parasites in the image for lab technicians to check. The long-term aim, however, is to create a fully automated diagnosis system to compensate for the shortage of lab technicians in rural Uganda (84% of the population of Uganda lives in rural areas.)
After being selected as our first international fellow, and completing part 1 of our course, language researcher Samar Haider of Pakistan collected the largest dataset ever of his native language of Urdu. He says he was inspired by Lesson 5 of Part 1 to acquire, clean, and segment into sentences an Urdu corpus with over 150 million tokens. Haider trained a model to learn vector representations of the words, which captured useful semantic relationships and lexical variations. Haider writes “this marks the first time such word representations have been trained for Urdu, and, while they are themselves an incredibly valuable resource, it is exciting to think of ways in which they can be used to advance the state of natural language processing for Urdu in applications ranging from text classification to sentiment analysis to machine translation.” Haider will be joining us again for Part 2 and says, “In the long run, I hope to use deep learning techniques to bridge gaps in human communication (an especially important duty in these polarizing times) by helping computers better process and understand regional languages and helping unlock a world of information for people who don’t speak English or other popular languages.”
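To illustrate what it means for word vectors to capture "semantic relationships", here is a minimal sketch of the usual comparison, cosine similarity. The three-dimensional vectors and the English placeholder words are invented for this example; they are not taken from Haider's Urdu model, whose vectors would be learned from the corpus and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy, hand-written vectors (hypothetical; real ones are learned from text).
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "mango": [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector points most nearly the same way."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )


print(most_similar("king", VECTORS))
```

With trained vectors, the same nearest-neighbor lookup is a common quick check of whether related words end up close together.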
Xinxin Li previously developed carbon management technologies as an environmental research engineer, and built a Python app to diagnose plant diseases through photos of leaves. She is now working with a wearable technology company to develop a system for Parkinson’s patient therapy management, the core of which is a machine learning model to be trained with clinical trial data. This new system would enable a doctor to gauge patients’ symptoms, such as tremors and dyskinesia, via sensor data collected out of clinic, rather than relying on written diaries or interviews with patient caregivers.
Sahil Singla works at a social impact startup in India, using deep learning on satellite imagery to help the Indian government identify which villages have problems of landlessness or crop failure. Singla plans to use deep learning to build better crop insurance and agriculture lending models, thus reducing farmer suicides (exorbitant interest rates on predatory loans contribute to the high suicide rate of Indian farmers).
Amy Xiao, an undergraduate at the University of Toronto, plans to create tools such as browser extensions to help people distinguish between facts and fiction in online information. Her goal is to rate the legitimacy of online content via a deep learning model by integrating sentiment analysis of the comments, legitimacy of news source, and the content itself, trained on labeled articles with a “predetermined” score. She is also interested in exploring how to discern legitimate vs. fake reviews from online sites.
Prabu Ravindran is developing a deep learning system for automated wood identification in the Center for Wood Anatomy Research, Forest Products Laboratory, and the U of Wisconsin Botany Department. This system will be deployed to combat illegal logging and identify wood products. Orlando Adeyemi, a Nigerian currently working in Malaysia, has already begun scraping Nigerian fashion websites for data that he plans to apply deep learning to. Previously, he created an iOS app for Malaysian cinema times, and won an award for his arduino wheelchair stabilizer.
Gurumoorthy C is excited about the “Clean India” initiative launched by Prime Minister Modi. Together with a group of friends, Gurumoorthy plans to create a small robot to pick up trash in the street, and correctly identify waste. Karthik Kannan is currently working on an idea that incorporates deep learning and wearable cameras to help the visually impaired navigate closed spaces in India.
Alexis Fortin-Cote is a PhD student in robotics at U Laval from French-speaking Quebec. He plans to create a model capable of inferring the level of fun players are experiencing from video games, using bio sensor information and self-reported emotional state. Together with a team from the school of psychology, he has already collected over 400 total hours of data from 200 players.
We welcome the above fellows, along with the rest of our international fellows and our in-person participants to Deep Learning Part 2. We are very excited about this community, and what we can build together!
24 Feb 2017 Jeremy Howard & Rachel Thomas
We’ve been so excited to watch the thousands of people working their way through part 1 of Practical Deep Learning For Coders, and the buzzing community that has formed around the course’s Deep Learning Discussion Forums. But we have heard from those for whom English is not their first language that a major impediment to understanding the content is the lack of written transcript or course notes. As a student of Chinese I very much empathize - I find it far easier to understand Chinese videos when they have subtitles, especially when it’s more technical material.
So I’m very happy to announce that the course now has complete transcripts (available directly as captions within Youtube) and course notes for every lesson.
This is thanks to the hard work of our intern, Brad Kenstler (course notes), and part 1 international fellow, Lin Crampton (transcripts). We are very grateful to both of them for their wonderful contributions, and we expect that they will significantly progress our mission to make the power of deep learning accessible to all.
28 Jan 2017 Rachel Thomas
Applications are now open for Deep Learning Part 2, to be offered at the University of San Francisco Data Institute on Monday evenings, Feb 27-April 10. The course will cover integrating multiple cutting-edge deep learning techniques, as well as combining classic machine learning techniques with deep learning.
In part 1, we worked hard to curate a diverse group of participants, because we’d observed that artificial intelligence is missing out because of its lack of diversity. A study of 366 companies found that ethnically diverse companies are 35% more likely to perform well financially, and teams with more women perform better on collective intelligence tests. Scientific papers written by diverse teams receive more citations and have higher impact factors.
Everyone benefited from having a class full of curious coders from a variety of backgrounds. We had a number of students interested in using deep learning for social good, including Sara Hooker, founder and executive director of Delta Analytics, which partners non-profits with data scientists. She is now working on a project to use audio data streamed from recycled cell phones in endangered forests to track harmful human activity. Another student was a former Literature PhD student interested in analyzing gender and language in Github commits. Several students connected over their shared interest in Alzheimer’s research. International fellow Samar Haider is a researcher applying natural language processing to his native language of Urdu, one of the 70 different spoken languages in Pakistan, many of which have not been well-studied and are in need of the additional resources deep learning can provide. Another international fellow said he never expected to be using so many command line tools (we provide scripts and guidance to walk you through the setup) and he ended up creating an Amazon Machine Image which saved memory to share with the rest of the class.
One of the ways we achieved this great outcome was by, together with USF Data Institute, sponsoring diversity fellowships and international fellowships. It was such a success that we’ve decided to do it again.
I am saddened and angered that President Trump is banning immigrants from certain countries from entering the US, even when they have visas and green cards. The deep learning community is suffering from its lack of diversity already, and we are trying to fight that. We can’t change government policy at fast.ai, but we can do our little bit: we will again offer free remote international fellowships for those selected outside San Francisco to attend classes virtually, have access to all the same online resources, and be a part of our community. People of all religions and from all countries, including Iran, Iraq, Libya, Somalia, Syria, Sudan, and Yemen, are welcome and encouraged to apply.
Diversity fellowships are full or partial tuition waivers to attend the in-person course in San Francisco for women, people of color, LGBTQ people, or veterans. We are looking for applicants who have shown the ability to follow through on projects and a significant level of intellectual curiosity.
International fellowships allow those who cannot get to San Francisco to attend virtual classes for free during the same time period as the in-person class, and provide access to all the same online resources. (Note that international fellowships do not provide an official completion certificate through USF). Our international fellows from part 1 contributed greatly to the community.
Both fellowships require completion of part 1. When applying, please let us know about any way that you have contributed to the student community (such as forum posts, pull requests, or open source projects). To apply, email your resume to email@example.com and firstname.lastname@example.org, along with a note of whether you are interested in the diversity or international fellowships and a brief paragraph on how you want to use deep learning. Note that to be eligible, you must have completed Deep Learning Part 1, either in person, or through our MOOC. Deep Learning Part 1 involves approximately 70 hours of work, so if you haven’t finished yet, you should get studying. The deadline to apply is 2/13.
17 Jan 2017 Jeremy Howard & Rachel Thomas
With part 2 of our in-person SF course starting in 6 weeks, and applications having just opened, we figured we’d better tell you a bit about what to expect! So here’s an overview of what we’re planning to cover.
The main theme of this part of the course will be tackling more complex problems, that require integrating a number of techniques. This includes both integrating multiple deep learning techniques (such as combining RNNs and CNNs for attentional models), as well as combining classic machine learning techniques with deep learning (such as using clustering and nearest neighbors for semi-supervised and zero-shot learning). As always, we’ll be introducing all methods in the context of solving end-to-end real world modeling problems, using Kaggle datasets where possible (so that we have a clear best-practice goal to aim for).
Since we have no pre-requisites for the course other than a year of coding experience and completion of part 1 of the course, we’ll be fully explaining all the classic ML techniques we’ll use as well.
In addition, we’ll be covering some more sophisticated extensions of the DL methods we’ve seen, such as adding memory to RNNs (e.g. for building question answering systems / “chat bots”), and multi-object segmentation and detection methods.
Some of the methods we’ll examine will be very recent research directions, including unpublished research we’ve done at fast.ai. So we’ll be looking at journal articles much more frequently in this part of the course—a key teaching goal for us is that you come away from the course feeling much more comfortable reading, understanding, and implementing research papers. We’ll be sharing some simple tricks that make it much easier to quickly scan and get the key insights from a paper.
Python 3 and Tensorflow
This part of the course will use Python 3 and Tensorflow, instead of Python 2 and Theano as used in part 1. We’ll explain our reasoning in more detail in a future post; we hope that you will come away from the course feeling confident in both of these tools, and able to identify the strengths and weaknesses of both, to help you decide what to use in your own projects.
We’ve found using Python 3 to develop the course materials quite a bit more pleasant than Python 2. Although version 3 of the language has offered incremental improvements for many years, until recently the lack of Python 3 support in scientific computing libraries made it very frustrating to use. The good news is that this has all changed now, and recent developments in Python 3.4 and 3.5 have greatly improved the productivity of the language.
Our view of Tensorflow is that there’s a very nice piece of software buried inside a rather verbose and complex API. We’ll be showing how to write custom GPU accelerated algorithms from scratch in Tensorflow, staying within a small subset of the Tensorflow API where things stay simple and elegant.
Structured data, time series analysis, and clustering
One area where deep learning has been almost entirely ignored is structured data analysis (i.e. analyzing data where each column represents a distinct feature, such as from a database table). We had wondered whether this is because deep learning is simply less well suited to this task than the very popular decision tree ensembles (such as random forests and XGBoost, which we’re big fans of), but we’ve recently done some research that has shown that deep learning can be both simpler and more effective than these techniques. However, getting it to work well requires getting a lot of little details right, and to the best of our knowledge these details have never been fully understood or documented elsewhere.
We’ll be showing how to get state of the art results in structured data analysis, including showing how to use the wonderful XGBoost, and comparing these techniques. We’ll also take a brief detour into looking at R, where structured data analysis is still quite a bit more straightforward than in Python.
Most of the structured data sets we’ll investigate will have a significant time series component, so we’ll also be discussing the best ways to deal with this kind of data. Time series pop up everywhere, such as fraud and credit models (using time series of transactions), maintenance and operations (using time series of sensor readings), finance (technical indicators), medicine (medical sensors and EMR data), and so forth.
We will also begin our investigation of cluster analysis, showing how it can be combined with a softmax layer to create more accurate models. We will show how to implement this analysis from scratch in Tensorflow, creating a novel GPU accelerated algorithm.
Deep dive into computer vision
We will continue our investigation into computer vision applications from part 1, getting into some new techniques and new problem areas. We’ll study ResNet and Inception architectures in more detail, with a focus on how these architectures can be used for transfer learning. We’ll also look at more data augmentation techniques, such as test-time augmentation and occlusion.
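To make test-time augmentation concrete, here is a minimal sketch in plain Python. It is only an illustration, not the course's implementation: `toy_predict` is a hypothetical stand-in for a trained model, and a horizontal flip is assumed as the only augmentation. The idea is simply to predict on several views of the same image and average the class probabilities.

```python
def hflip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(predict, image, augmentations):
    """Average class probabilities over the image and its augmented copies."""
    views = [image] + [augment(image) for augment in augmentations]
    all_probs = [predict(view) for view in views]
    n_classes = len(all_probs[0])
    return [
        sum(probs[c] for probs in all_probs) / len(all_probs)
        for c in range(n_classes)
    ]


def toy_predict(image):
    """Hypothetical model: P(class 1) is the mean brightness of the left column."""
    left = sum(row[0] for row in image) / len(image)
    return [1.0 - left, left]


image = [[1.0, 0.0],
         [1.0, 0.0]]
tta_probs = predict_with_tta(toy_predict, image, [hflip])
```

The toy model is deliberately sensitive to flipping, so its two predictions disagree completely and the average lands in the middle. With a real model, the averaged prediction is usually a little more stable than any single view.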
We’ll learn about the K nearest neighbors algorithm, and use it in conjunction with CNNs to get state of the art results on multi-frame image sequence analysis (such as videos or photo sequences). From there, we will look at other ways of grouping objects using deep learning, such as siamese and triplet networks, which we will use to get state of the art results for image comparisons.
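As a rough illustration of the nearest neighbors idea, the sketch below classifies a query vector by majority vote among its closest neighbors. The two-dimensional feature vectors and labels are made up for the example; in practice they would be activations taken from a CNN layer, which is an assumption about usage rather than a description of the course code.

```python
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(query, features, labels, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    by_distance = sorted(
        (euclidean(query, feature), label)
        for feature, label in zip(features, labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up "CNN features" for five labeled images.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1]]
labels = ["cat", "cat", "cat", "dog", "dog"]

near_dogs = knn_predict([0.95, 1.0], features, labels)
near_cats = knn_predict([0.1, 0.1], features, labels)
```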
Unsupervised and semi-supervised learning, and productionizing models
In part 1 we studied pseudo-labeling and knowledge distillation for semi-supervised learning. In part 2 we’ll learn more techniques, including Bayesian-inspired techniques such as variational autoencoders and variational ladder networks. We will also look at the role of generative models in semi-supervised learning.
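For readers who want a reminder of what pseudo-labeling looks like in code, here is a small sketch of one common variant: keep only the unlabeled examples that the current model predicts with high confidence, and treat the predicted class as their label for the next round of training. The confidence threshold and the example names are assumptions made for illustration; other variants use every prediction regardless of confidence.

```python
def select_pseudo_labels(unlabeled, probs, threshold=0.9):
    """Return (example, predicted_class) pairs whose top probability meets the threshold."""
    selected = []
    for example, class_probs in zip(unlabeled, probs):
        best = max(range(len(class_probs)), key=class_probs.__getitem__)
        if class_probs[best] >= threshold:
            selected.append((example, best))
    return selected


# Hypothetical model outputs for three unlabeled images.
unlabeled = ["img_a", "img_b", "img_c"]
probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]

pseudo = select_pseudo_labels(unlabeled, probs)
```

Here `img_b` is dropped because the model is unsure about it, while the other two would be added to the training set with their predicted labels.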
We will show how to use unsupervised learning to build a useful photo fixing tool, which we’ll then turn into a simple web app in order to show how you can put deep learning models into production.
Zero-shot learning will be a particular focus, especially the recently developed problem of generalized zero-shot learning. Solving this problem allows us to build models on a subset of the full dataset, and apply those models to whole new classes that we haven’t seen before. This is important for real-world applications, where things can change and new types of data can appear any time, and where labeling can be expensive, slow, and/or hard to come by.
And don’t worry, we haven’t forgotten NLP! NLP is a great area to apply unsupervised and semi-supervised learning, and we will look at a number of interesting problems and techniques in this space, including how to use siamese and triplet networks for text analysis.
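Triplet networks are typically trained with a loss that pulls an anchor embedding toward a matching example and pushes it away from a non-matching one. The sketch below shows one standard form of that loss on plain Python lists. The squared Euclidean distance and the margin of 1.0 are assumptions for the example, and the vectors stand in for embeddings of images or sentences.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther from the anchor than the positive by at least the margin."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


anchor = [0.0, 0.0]
positive = [0.0, 1.0]

easy_loss = triplet_loss(anchor, positive, negative=[3.0, 0.0])
hard_loss = triplet_loss(anchor, positive, negative=[1.0, 0.0])
```

The first negative is already far from the anchor, so it contributes no loss. The second is as close as the positive, so the loss equals the margin and training would push it away.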
Segmentation, detection, and handling large datasets
Handling large datasets requires careful management of resources, and doing it in a reasonable time frame requires being thoughtful about the full modeling process. We will show how to build models on the well-known ImageNet dataset, and will show that analysing such a large dataset can readily be done on a single machine fairly quickly. We will discuss how to use your GPU, CPUs, RAM, SSD, and HDD together, taking advantage of each part most effectively.
Whereas most of our focus on computer vision so far has been classification, we’ll now move our focus to localization, that is, finding the objects in an image (or in NLP, finding the relevant parts of a document). We have looked at some simple heatmap and bounding box approaches in part 1 already; in part 2 we’ll build on that to look at more complete segmentation systems, and methods for finding multiple objects in an image. We will look at the results of the recent COCO competition to understand the best approaches to these problems.
Neural machine translation
As recently covered by the New York Times, Google has totally revamped their Translate tool using deep learning. We will learn about what’s behind this system, and similar state of the art systems—including some more recent advances that haven’t yet found their way into Google’s tool.
We’ll start with looking at the original encoder-decoder model that neural machine translation is based on, and will discuss the various potential applications of this kind of sequence to sequence algorithm. We’ll then look at attentional models, including applications in computer vision (where they are useful for large and complex images). In addition, we will investigate stacking layers, both in the form of bidirectional layers, and deep RNN architectures.
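The core of an attentional model can be sketched in a few lines: score each encoder state against the current decoder state, turn the scores into weights with a softmax, and take the weighted average of the encoder states as the context vector. The version below uses a plain dot product for scoring, which is an assumption chosen for simplicity; published systems often use a small learned scoring network instead. The state vectors are made-up numbers.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return attention weights and the weighted-average context vector."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

With this decoder state, the first and third encoder states get equal, larger weights because they line up with it, and the second gets a smaller weight.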
Question answering and multi-modal models
Recently there has been a lot of hype about chatbots. Although in our opinion they’re not quite ready for prime time (which is why pretty much all production chatbots still have a large human element), it’s instructive to see how they’re built. In general, question answering systems are built using architectures that have an explicit memory; we will look at ways of representing that in a neural network, and see the impact it has on learning.
We will also look at building visual Q&A systems, where you allow the user to ask questions about an image. This will build on top of the work we did earlier on zero-shot learning.
Reinforcement learning has become very popular recently, with Google showing promising results in training robots to complete complex grasping actions, and DeepMind showing impressive results in playing computer games. We will survey the reinforcement learning field and attempt to identify the most promising application areas, including looking beyond the main academic areas of study (robots and games) to opportunities for reinforcement learning of more general use.
We hope to see you at the course! Part 1 was full, and part 2 is likely to be even more popular, so get your application in soon!