How to Encourage Your Child's Interest in Science and Tech

This week’s Ask-A-Data-Scientist column is from a parent on how to encourage their child in STEAM. Please email your data science related quandaries to rachel@fast.ai.

Q: My daughter loves math and art. She’s currently an 8th grader. My husband and I are not STEAM (Science, Technology, Engineering, Art, Math) people. I’d love to expose her to possible career options but am limited by my ignorance and perhaps my location. Do you have any suggestions for an intelligent, young person who is about to start her high school journey?

A: First, I am so glad you are encouraging your daughter’s interests! I have several recommendations and resources. This is a fantastic time in history to be a kid with an internet connection interested in math and art.

1. She should learn to code. In STEM, code is the language of creativity, and without knowing how to code, you are reliant on tools created by others. A good place to start is with Blockly Games, which teaches programming concepts (such as loops, variables, and logic) through a variety of mazes and puzzles. The Blockly library was developed by the Google for Education team.

As for programming, I recommend that she start by learning JavaScript with Khan Academy’s Intro Javascript: Drawing & Animation and then move on to Advanced JavaScript: Games and Visualizations. Khan Academy has a great interface in which the code editor and picture show up side by side, and you can instantly see how the picture changes as you change the code.

Anyone who knows me may be surprised that I’m recommending Javascript first, since I didn’t learn it until just a few years ago and it’s never been the primary language I use. My reasons are these: Javascript is interactive, visual, and the language of the web. It is the best language for creating things that will interest her non-coder friends. Anytime you make an action on a website (such as clicking a button or moving your mouse) and something happens (an image changes, you go to a new page, new text appears) you have Javascript to thank for that. Note: JS is not just for the web; it can also be used for mobile, backend, and embedded devices, so basically everything.

I really like that Khan Academy introduces Javascript through the applications of drawing and visualizations. My introduction to coding was a high school C++ course almost 20 years ago, where one of our first assignments was to print out credit card balances by month, varying based on what size payments were made and the interest rate. I still remember how delighted I was watching the table of numbers print to the terminal, but there are far more engaging intros now.

A note for parents of younger children: you might want to check out Scratch (a language for children developed by the MIT Media Lab), Snap (a drag-and-drop programming language), or Snap Circuits (electronics kits).

2. The videos of 3Blue1Brown have gorgeous visuals, and are well suited to visual thinkers and anyone who enjoys art and patterns. They are a fairly different perspective on math, and one I’d like to see more of. When watching them, don’t stress about understanding every concept, but try to just enjoy the beauty. The video on the surprising relationship between binary/ternary counting, the Towers of Hanoi puzzle, and Sierpinski’s triangle may be a fun place to start.

Vi Hart’s Doodling in Math Class is also a fun and fantastic series.
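For anyone curious what the binary-counting connection to the Towers of Hanoi looks like in code, here is a minimal sketch in Python. It is my own illustration rather than anything taken from the video, and the function names are made up. It shows one well-known piece of the relationship: if you number the moves 1, 2, 3, ... and write each number in binary, the position of the lowest 1 bit tells you which disk gets moved on that step.

```python
def hanoi_moves(n, source="A", spare="B", target="C"):
    """Yield (disk, from_peg, to_peg) for the standard recursive solution."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, target, spare)
    # Move the largest disk to its destination.
    yield (n, source, target)
    # Move the n-1 smaller disks back on top of it.
    yield from hanoi_moves(n - 1, spare, source, target)


def disk_from_binary(step):
    """Disk moved on a given step: 1 + the number of trailing zeros in binary."""
    lowest_set_bit = step & -step
    return lowest_set_bit.bit_length()


n = 3  # number of disks (disk 1 is the smallest)
moves = list(hanoi_moves(n))
for step, (disk, src, dst) in enumerate(moves, start=1):
    print(f"step {step} = {step:0{n}b}: move disk {disk} from {src} to {dst}")
```

Changing `n` and watching the pattern of disk numbers repeat is a nice way to see the self-similarity before worrying about why it works.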

3. A ton of exciting advances are happening in the maker space– people creating clothing that lights up, machines that 3D-print pancakes, robots to move your Klein bottle collection around– and there are lots of resources available for all ages. Maker spaces are being added in libraries across the country, and can include anything from 3D Printers, littleBits, LEGO Robotics, Arduinos, Snap Circuits, design software, woodworking tools, jewelry making tools, paper crafting equipment, microscopes and other science gadgets, sewing machines, and more, and many offer workshops or classes. You can also see if there is a regional Maker Faire in your area.

Two of my favorite adult examples: I was in the audience when Kassandra Perch gave a delightful demo during her keynote at ForwardJS, a Javascript conference. She was wearing a belt covered in patterns of light and instructed the audience to tweet colors at her, and those colors were then displayed on the belt (from minutes 12-16 in this video– don’t be intimidated by the amount of code she shows, there are simpler light projects which require less code).

One of the students from our fast.ai course bought several tons of Legos on eBay and constructed a machine to automatically sort the Legos (old bulk Lego is sold more cheaply, but the resale value for sorted Lego is much higher and can be quite lucrative for certain pieces). I want children to know that adults do things like create interactive colorful light-up clothing for the keynote speech at a professional conference, or construct machines to sort Legos in their free time. Both of these examples are by experts, but you do not need to be an expert to work with hardware or program an Arduino.

4. Encourage her to start a blog about what she is learning, creating, and exploring. I recently wrote a post (inspired by a question from a college student) encouraging everyone to blog, and I think the advice certainly holds for high schoolers. Many schools relegate writing to the humanities and social sciences, and don’t give students the practice of writing about math and technology. Being able to write and communicate technical ideas clearly is a super important and useful skill in today’s world (art can help with this too!). As I said previously, a blog is like a resume, only better. This holds true for high school students as well, and could be useful in landing internships. Check out this post for tips on how to get started.

You can check out the zines by Amy W (an MIT computer science grad who hacks knitting machines) or Julia Evans (an infrastructure engineer at credit processing startup Stripe) for great examples of how cartoons and sketches can illuminate technical concepts. They are also two women I deeply admire!

5. As for profiles of possible careers, Khan Academy has a series of career profiles.

Fast.ai students are using math and coding in a wide variety of interesting and meaningful ways such as: listening for chainsaw noises in endangered rainforests with recycled cell phones, diagnosing malaria in under-staffed Ugandan clinics, and reducing suicides of farmers in India.

6. Miscellaneous Groups and Resources. Although these are location specific, note that groups exist in a wide variety of places, not just in major tech hubs like San Francisco or New York City:

  • Iridescent Technovation: Through Technovation, teams of teenage girls around the world (from 78 different countries!) build mobile apps to solve problems in their communities, create business plans, and launch their solutions.
  • Black Girls Code: Introduces Black girls to coding and game design. They’ve reached over 3,000 students in cities such as Atlanta, Miami, LA, Dallas, Memphis, and others, and have plans to expand.
  • Blue 1647 offers a variety of programs including teaching youth to create web and mobile apps, Latina Girls Code, Minecraft Development bootcamps, programs for individuals with intellectual disabilities, and more. It has locations in Chicago, St. Louis, Compton, Indiana, Haiti, and LA.

7. There is a lovely essay called A Mathematician’s Lament written by Paul Lockhart, a former Brown University math professor who quit to teach K-12. He describes a nightmare world in which children are not allowed to sing songs or play instruments until they have spent over a decade studying music notation, transcribing sheet music by hand in different keys, and memorizing their circle of fifths. That sounds horrifying! Yet it is how math is taught in most schools– the focus is on dry notation, formal rules, memorization, and disconnected components, with the fun and creative parts saved until long after most students have dropped out.

I hope you can encourage your child to keep a sense of creativity, beauty, pattern, and play when approaching math. I know it can be difficult for children to maintain their curiosity and passion for subjects when adults or peers don’t understand their interests.

My daughter is still a toddler, so I haven’t gotten to experience this firsthand yet and I would love to hear from those of you who have! Also, a huge thanks to everyone who gave me suggestions for this article on Twitter.

Alternatives to a Degree to Prove Yourself in Deep Learning

This post has been translated into Chinese here.

This week’s Ask-A-Data-Scientist column has a question from a college freshman at my alma mater, Swarthmore. Please email your data science related quandaries to rachel@fast.ai. Note that questions are edited for clarity and brevity.

Q: I’m currently a freshman at Swarthmore College and I’m really interested in machine learning and deep learning. I wanted to take Artificial Intelligence this semester; unfortunately, no freshmen got into the class as it has been difficult for the CS department to keep up with the huge spike in interest.

I’m currently taking Andrew Ng’s Coursera Course on Machine Learning and will finish it in ~2-3 weeks. Next, I was planning on taking your fast.ai MOOC, which I saw on Hacker News.

I know you may be too busy, but can I ask you questions I have about ML and my proposed plan? How can I continue to learn machine learning after Ng’s Coursera course and fast.ai? It seems like the only two options are 1.) research and 2.) graduate level courses at UPenn (which seem to be quite difficult to get into from Swarthmore (especially as a first-year student)). Any advice would be appreciated.

A: In general, I am happy to answer questions, although it may take me some time (my inbox, oh my inbox). For technical questions, it’s best to first ask on our fast.ai forums. There are tons of interesting discussions on our forums, even if you are not taking our course. For career-related or general questions, I often answer them in my ask-a-data-scientist column.

  1. Even without Swarthmore or UPenn’s AI classes, you will never run out of things to do with deep learning or ways to learn more. Our MOOC takes 70 hours of study to complete, and if you get interested in any of the Kaggle competitions we have you start, you could spend much longer. We will be releasing Part 2 in a few months, which will be a similar time commitment, only with even more side avenues for further study, recommended papers to read, and ways to extend the work.

  2. Take the official classes when/if you are able, but you don’t need the credentials or resources from official classes (to anyone out there not in university or at a university that doesn’t offer an AI class, don’t worry: you don’t need them!). One of our students, who was an econ major with no graduate degree, was just accepted to the prestigious Google Brain residency program! Another student developed a new fraud detection technique based on material from our course and has received a bonus at his job. Several others have received internship and job offers, or switched teams in their current workplaces to more exciting machine learning projects.

Credentials can sometimes be useful to get your foot in the door, particularly if you are an underrepresented minority in tech (and thus facing greater scrutiny).

However, there are lots of even more effective ways to get your name and work out there:

  • Write a popular blog post (more on this below).
  • Create an interesting app and put it online.
  • Write helpful answers to others’ questions on the learn machine learning subreddit or on the fast.ai forums. Altruism is important to me, but that’s not why I recommend helping others. Explaining something you’ve learned to someone else is a key part of solidifying your own understanding.
  • Do your own experiments, and share the results via a blog post or GitHub. One of our students, Slav Ivanov, asked about using different optimizers for style transfer. Jeremy suggested he try it out, and Slav wrote an excellent blog post on what he found. This post was very popular on reddit and made Slav’s work more widely known.
  • Contribute to open source. Here, one of our students shares about his positive experience contributing to TensorFlow. With 3 lines of code, he reduced the binary size of TensorFlow on Android to less than 10MB!

In general, I recommend that you start a side project of something that interests you (that uses deep learning) so you will have that to work on.

Why you (yes, you) should blog

The top advice I would give my younger self would be to start blogging sooner. Here are some reasons to blog:

  • It’s like a resume, only better. I know of a few people who have had blog posts lead to job offers!
  • Helps you learn. Organizing knowledge always helps me synthesize my own ideas. One of the tests of whether you understand something is whether you can explain it to someone else. A blog post is a great way to do that.
  • I’ve gotten invitations to conferences and invitations to speak from my blog posts. I was invited to the TensorFlow Dev Summit (which was awesome!) for writing a blog post about how I don’t like TensorFlow.
  • Meet new people. I’ve met several people who have responded to blog posts I wrote.
  • Saves time. Any time you answer a question multiple times through email, you should turn it into a blog post, which makes it easier for you to share the next time someone asks.

To inspire you, here are some sample blog posts from students in part 2 of our course:

I enjoyed all of the above blog posts and also, I don’t think any of them are too intimidating. They’re meant to be accessible.

Tips for getting started blogging

Jeremy had been suggesting for years that I should start blogging, and I’d respond “I don’t have anything to say.” This wasn’t true. What I meant was that I didn’t feel confident, and I felt like the things I could write had already been written about by people with more expertise or better writing skills than me.

It turns out that is fine! Your posts don’t have to be earth-shattering or even novel to be read and shared. My writing skills were rather weak when I started (part of the reason I chose to study math and CS in college was because those courses required the least amount of writing and also no labs), but my skills are improving with time.

Here are some more tips to help you start your first post:

  • Make a list of links to other blog posts, articles, or studies that you like, and write brief summaries or highlight what you particularly like about them. Part of my first blog post came from my making just such a list, because I couldn’t believe more people hadn’t read the posts and articles that I thought were awesome.
  • Summarize what you learned at a conference you attended, or in a class you are taking.
  • Any email you’ve written twice should be a blog post. Now, if I’m asked a question that I think someone else would also be interested in, I try to write it up.
  • Don’t be a perfectionist. I spent 9 months on my first blog post, it went viral, and I have repeatedly hit new lows in readership ever since then. One of my personal goals for 2017 is to post my writing quicker and not to obsess so much before I post, because it just builds up pressure and I end up writing less.
  • You are best positioned to help people one step behind you. The material is still fresh in your mind. Many experts have forgotten what it was like to be a beginner (or an intermediate) and have forgotten why the topic is hard to understand when you first hear it. The context of your particular background, your particular style, and your knowledge level will give a different twist to what you’re writing about.
  • What would have helped you a year ago? What would have helped you a week ago?
  • If you are a woman in NYC, Chicago, or San Francisco, I recommend joining your local chapter of Write/Speak/Code, a group that encourages women software developers to write blog posts, speak at conferences, and contribute to open source.
  • Get angry. The catalyst that finally got me to start writing was when someone famous said something that made me angry. So angry that I had to explain all the ways his thinking was wrong.
  • If you’re wondering about the actual logistics, Medium makes it super simple to get started. Another option is to use Jekyll and GitHub Pages. I can personally recommend both, as I have 2 blogs and use one for each.

You are on the right path by taking MOOCs, and by adding in a side project, involvement in online communities, and blogging you will have even more opportunities to learn and meet others!

To become a data scientist, focus on coding

This week’s Ask-A-Data-Scientist column answers two short questions from students. Please email your data science related quandaries to rachel@fast.ai. Note that questions are edited for clarity and brevity.

Q1: I have a BS and MS in aerospace engineering and have been accepted to a data science bootcamp for this summer. I have been spending 15 hours/week on MIT’s 6.041 edx.org probability course, which is the hardest math course I’ve ever taken. I feel like my time could be better spent elsewhere. What about teaching myself the concepts as needed on the job? Or maybe you could recommend certain areas of probability to focus on? I’d like to tackle a personal project (either dealing with fitness tracker data or bitcoin) and maybe put probability on the backburner for a bit.

A: It sounds like you already know the answer to this one: yes! your time could be better spent elsewhere.

Let your coding projects motivate what you do, and learn math on an as-needed basis. There are 3 reasons this is a good approach:

  • For most people, the best motivation will be letting the problems you’re working on motivate your learning.
  • The real test of whether you understand something is whether you can use it and build with it. So the projects you’re working on are needed to cement your understanding.
  • By learning on an as-needed basis, you study what you actually need, and don’t waste time on topics that may end up being irrelevant.

The only exceptions: if you want to be a math professor or work at a think tank (for most of my math PhD, my goal was to become a math professor, so I see the appeal, but I was also totally unaware at the time of the breadth of awesome and exciting jobs that use math). And sometimes you need to brush up on math for white-boarding interviews.

Q2: I am currently pursuing a Master’s degree in Data Science. I am not that advanced in programming and new to most of the concepts of machine learning & statistics. Data science is such a vast field so most of my friends advise me to concentrate on a specific branch. Right now I am trying everything and becoming a jack in all and ace at none. How can I approach this to find a specialty?

A: There is nothing wrong with being a jack of all trades in data science; in some ways, that is what it means to be a data scientist. As long as you are spending the vast majority of your time writing code for practical projects, you are on the right track.

My top priorities of things to focus on for aspiring data scientists:

  • Focus on Python (including NumPy, Pandas, and Jupyter notebooks).
  • Try to focus on 1 main project. Extend something that you did in class. It can be difficult if you are mostly doing scattered problem sets in a variety of areas. For self-learners, one of the risks is jumping around too much and starting scattered tutorials across a range of sites, but never going deep enough with any one thing. Pick 1 Kaggle competition, personal project, or extension of a school project and stick with it. I can think of a few times I continued extending a class project for months after the class ended, because I was so absorbed in it. This is a great way to learn.
  • Start with decision tree ensembles (random forests and gradient boosting machines) on structured data sets. I have deeply conflicted feelings on this topic. While it’s possible to do these in Python using sklearn, I think R still handles structured datasets and categorical variables better. However, if you are only going to master one language, I think Python is the clear choice, and most people can’t focus on learning 2 new languages at the same time.
  • Then move on to deep learning using the Python library Keras. To quote Andrew Ng, deep learning is “the new electricity” and a very exciting, high impact area to be working in.
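To make the decision-tree-ensemble step above a little more concrete, here is a toy sketch in plain Python. It assumes a made-up dataset and uses hypothetical function names, and it is not how sklearn or any real library implements random forests. It only shows the core idea of bagging: train many very simple trees (one-split "stumps") on bootstrap samples of the data, then let them vote. On a real project you would use a library implementation.

```python
import random
from collections import Counter

# Hypothetical toy data: (hours_studied, hours_slept) -> passed the exam (1) or not (0).
ROWS = [
    ((1.0, 4.0), 0), ((2.0, 8.0), 0), ((2.5, 5.0), 0), ((3.0, 6.0), 0),
    ((6.5, 5.0), 1), ((7.0, 8.0), 1), ((8.0, 4.0), 1), ((9.0, 7.0), 1),
]


def predict_stump(stump, x):
    """A stump is (feature index, threshold, label predicted above the threshold)."""
    feature, threshold, above = stump
    return above if x[feature] > threshold else 1 - above


def fit_stump(rows):
    """Pick the single split with the fewest errors on the given rows."""
    best = None
    for feature in range(len(rows[0][0])):
        values = sorted({x[feature] for x, _ in rows})
        # Candidate thresholds: midpoints between neighbouring distinct values.
        thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or values
        for threshold in thresholds:
            for above in (0, 1):
                stump = (feature, threshold, above)
                errors = sum(predict_stump(stump, x) != y for x, y in rows)
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]


def fit_ensemble(rows, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_ensemble(stumps, x):
    """Majority vote across all stumps."""
    votes = Counter(predict_stump(stump, x) for stump in stumps)
    return votes.most_common(1)[0][0]


ensemble = fit_ensemble(ROWS)
accuracy = sum(predict_ensemble(ensemble, x) == y for x, y in ROWS) / len(ROWS)
print(f"training accuracy: {accuracy:.2f}")
```

Real random forests grow deeper trees and also sample a random subset of features at each split, which this sketch leaves out to stay short.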

In terms of tips, there are a few things you can skip since they aren’t widely used in practice, such as support vector machines/kernel methods, Bayesian methods, and theoretical math (unless it’s explicitly necessary for a practical project you are working on).

Note that this answer is geared towards data scientists and not data engineers. Data engineers put algorithms into production and have a different set of skills, such as Spark and HDFS.

Machine learning hasn't been commoditized yet, but that doesn't mean you need a PhD

This post has been translated into Chinese here.

Here’s the latest installment of my Ask-A-Data-Scientist advice column. Please email your data science related quandaries to rachel@fast.ai. Note that questions are edited for clarity and brevity.

In the last week I received two questions with diametrically opposed premises: one was excited that machine learning is now automated, the other was concerned that machine learning takes too many years of study. Here are the questions:

Q1: I heard that Google Cloud announced that entrepreneurs can easily and quickly build on top of ML/NLP APIs. Is this statement true: “The future of ML and data post Google Cloud - the future is here, NLP and speech advancements have been figured out by Google and are accessible by API. The secret sauce has been commoditized so you can build your secret sauce on top of it. The time to secret sauce is getting shorter to shorter”?

Q2: Is it true that in order to work in machine learning, you need a PhD in the field? Is it true that before you can even begin studying machine learning, you must start by studying math, take “boring university level full length courses in calculus, linear algebra, and probability/statistics,” and then learn C/C++ and parallel and distributed programming (CUDA, MPI, OpenMP, etc.)? According to this top-rated comment on a Hacker News post, even after doing all that, we must then implement machine learning algorithms from scratch, first in plain C, next in MPI or CUDA, and then in Numpy, before implementing them in Theano or TensorFlow.

A: It’s totally understandable that many people are having trouble navigating the hype, and the “AI is an exclusive tool for geniuses” warnings. AI is a hard topic for journalists to cover, and sadly many misrepresentations are spread. See for instance this great post for a recent case study of how DeepCoder was misconstrued in the media.

The answer to both of these questions is: NO. On the surface, they sound like opposite extremes. However, they have a common thread–many of those working in machine learning have an interest in either:

  1. Convincing you to buy their general purpose machine learning API (none of which have been good for anything other than getting acqui-hired).
  2. Convincing you that what they’re doing is so complicated, hard, and exclusive, that us mere mortals have no chance of understanding their sorcery. (This is such a common theme that recently a reddit parody of it was voted to the top of the machine learning page: A super harsh guide to machine learning)

Yes, advancements in machine learning are coming rapidly, but for now, you need to be able to code to effectively use the technology. We’ve found from our free online course Practical Deep Learning for Coders that it takes about 70 hours of study to become an effective deep learning practitioner.

Why “Machine Learning As A Service” (MLaaS) is such a disappointment in practice

A general purpose machine learning API seems like a great idea, but the technology is simply not there yet. Existing APIs are either too narrowly specified to be widely useful, or attempt to be very general and have unacceptably poor performance. I agree with Bradford Cross, former founder of Flightcaster and Prismatic and partner at Data Collective VC, who recently wrote about the failure of many AI companies to try to build products that customers need and would pay for: “It’s the attitude that those working in and around AI are now responsible for shepherding all human progress just because we’re working on something that matters. This haze of hubris blinds people to the fact that they are stuck in an echo chamber where everyone is talking about the tech trend rather than the customer needs and the economics of the businesses.” (emphasis mine)

Cross continues, “Machine Learning as a Service is an idea we’ve been seeing for nearly 10 years and it’s been failing the whole time. The bottom line on why it doesn’t work: the people that know what they’re doing just use open source, and the people that don’t will not get anything to work, ever, even with APIs. Many very smart friends have fallen into this tarpit. Those who’ve been gobbled up by bigcos as a way to beef up ML teams include Alchemy API by IBM, Saffron by Intel, and Metamind by Salesforce. Nevertheless, the allure of easy money from sticking an ML model up behind an API function doesn’t fail to continue attracting lost souls. Amazon, Google, and Microsoft are all trying to sell an MLaaS layer as a component of their cloud strategy. I’ve yet to see startups or big companies use these APIs in the wild, and I see a lot of AI usage in the wild so its doubtful that its due to the small sample size of my observations.”

Is Google Cloud the answer?

Google is very poorly positioned to help democratize the field of deep learning. It’s not because of bad intentions– it’s just that they have way too many servers, way too much cash, and way too much data to appreciate the challenges the majority of the world faces in how to make the most of limited GPUs, on a limited budget (those AWS bills add up quickly!), and with limited size data sets. Google Brain is so deeply technical as to be out of touch with the average coder.

For instance, TensorFlow is a low-level library, but Google seemed unaware of this when they released it and in how they marketed it. The designers of TensorFlow could have used a more standard Object-Oriented approach (like the excellent PyTorch), but instead they kept with the fine Google tradition of inventing new conventions just for Google.

So if Google can’t even design a library that is easily usable by sophisticated data scientists, how likely is it that they can create something that regular people can use to solve their real-world problems?

The Hacker News plan: “Implement algorithms in plain C, then CUDA, and finally plain Numpy/MATLAB”

Why do Hacker News contributors regularly give such awful advice on machine learning? While the theory behind machine learning draws on a lot of advanced math, that is very different from the practical knowledge needed to use machine learning. As a math PhD, I have found that knowing the math has been less helpful than you might expect in building practical, working models.

The line of thinking espoused in that Hacker News comment is harmful for a number of reasons:

  • It’s totally wrong.
  • Good education motivates the study of underlying concepts. To borrow an analogy from Paul Lockhart’s A Mathematician’s Lament, kids would quit music if you made them study music theory for years before they were ever allowed to sing or touch an instrument.
  • Good education doesn’t overly complicate the material. If you truly understand something, you can explain it in an accessible way. After weeks of work, in Practical Deep Learning for Coders, Jeremy Howard implemented different modern optimization techniques (often considered a complex topic) in Excel to make it clearer how they work.
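In the same spirit as implementing optimizers in a spreadsheet, here is a minimal sketch in plain Python of two of the simplest ones: ordinary gradient descent and gradient descent with momentum, fitting a line y = a*x + b. This is not a reproduction of the Excel workbook from the course. The data, learning rate, and step count are hypothetical values I picked so the example is easy to follow.

```python
# Hypothetical data generated from y = 2x + 1 (no noise), chosen for illustration.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2.0 * x + 1.0 for x in XS]


def loss(a, b):
    """Mean squared error of the model y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradients(a, b):
    """Partial derivatives of the mean squared error with respect to a and b."""
    n = len(XS)
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(XS, YS)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(XS, YS)) / n
    return grad_a, grad_b


def fit(lr=0.02, momentum=0.0, steps=200):
    """Gradient descent; momentum=0.0 gives the plain version."""
    a = b = 0.0
    va = vb = 0.0  # running "velocity" for each parameter
    for _ in range(steps):
        grad_a, grad_b = gradients(a, b)
        # Keep a fraction of the previous step, then add the new gradient step.
        va = momentum * va - lr * grad_a
        vb = momentum * vb - lr * grad_b
        a += va
        b += vb
    return a, b


plain = fit(momentum=0.0)
with_momentum = fit(momentum=0.9)
print("plain   :", plain, "loss:", loss(*plain))
print("momentum:", with_momentum, "loss:", loss(*with_momentum))
```

The whole optimizer is the four lines inside the loop, which is part of why it fits in a spreadsheet. With these particular settings both versions head toward a = 2 and b = 1, and the momentum version ends up closer after the same number of steps.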

As I wrote a few months ago, it is “far better to take a domain expert within your organization and teach them deep learning, than it is to take a deep learning expert and throw them into your organization. Deep learning PhD graduates are very unlikely to have the wide range of relevant experiences that you value in your most effective employees, and are much more likely to be interested in solving fun engineering problems, instead of keeping a razor-sharp focus on the most commercially important problems.

“In our experiences across many industries and many years of applying machine learning to a range of problems, we’ve consistently seen organizations under-appreciate and under-invest in their existing in-house talent. In the days of the big data fad, this meant companies spent their money on external consultants. And in these days of the false ‘deep learning exclusivity’ meme, it means searching for those unicorn deep learning experts, often including paying vastly inflated sums for failing deep learning startups.”

Cutting through the hype (when you’re not an ML researcher)

Computational linguist Dan Simonson wrote a handy guide of questions to ask to evaluate NLP, ML, and AI and identify snake oil:

  • Is there existing training data? If not, how do they plan on getting it?
  • Do they have an evaluation procedure built into their application development process?
  • Does their proposed application rely on unprecedentedly high performance on specific AI components?
  • Do the proposed solutions rely on attested, reliable phenomena?
  • If using pre-packaged AI components, do they have a clear plan on how they will go from using those components to having meaningful application output?

As an NLP researcher, Simonson is excited about the current advances in AI, but points out that the whole field is harmed when people exploit the gap in knowledge between practitioners and the public.

Deep learning researcher Stephen Merity (of Salesforce/Metamind) has an aptly titled post It’s ML, not magic: simple questions you should ask to help reduce AI hype. His questions include:

  • How much training data is required?
  • Can this work unsupervised (= without labelling the examples)?
  • Can the system predict out of vocabulary names? (i.e. Imagine if I said “My friend Rudinyard was mean to me” - many AI systems would never be able to answer “Who was mean to me?” as Rudinyard is out of its vocabulary)
  • How much does the accuracy fall as the input story gets longer?
  • How stable is the model’s performance over time?

Merity also provides the reminder that models are often evaluated on highly processed, contrived, or limited datasets that don’t accurately reflect the real data you are working with.
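The out-of-vocabulary question in the list above is easy to see in a few lines of code. Here is a minimal sketch in Python, assuming a hypothetical toy system that knows only the words it saw in training; the training sentences and function names are made up. Any word it has never seen collapses to a single "unknown" id, so the name itself is lost before the model ever looks at the sentence.

```python
UNKNOWN = "<unk>"


def build_vocab(sentences):
    """Give every word seen in training an integer id; id 0 means 'unknown'."""
    vocab = {UNKNOWN: 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into ids, mapping unseen words to the unknown id."""
    return [vocab.get(word, vocab[UNKNOWN]) for word in sentence.lower().split()]


# Hypothetical training sentences.
training = ["my friend alice was kind to me", "my friend bob was mean to me"]
vocab = build_vocab(training)

ids = encode("My friend Rudinyard was mean to me", vocab)
print(ids)  # the third id is 0: "Rudinyard" has become an anonymous unknown token
```

A system built this way cannot answer “Who was mean to me?” with “Rudinyard”, because by the time it sees the input, that word is indistinguishable from every other unknown word.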

What does this mean for you?

If you are an aspiring machine learning practitioner: Good news! You don’t need a PhD, you don’t need to code algorithms from scratch in CUDA or MPI. If you have a year of coding experience, we recommend that you try Practical Deep Learning for Coders, or consider my additional advice about how to become a data scientist.

If you work in tech and want to build a business that uses ML: Good news! You don’t need to hire one of those highly elusive, highly expensive AI PhDs away from OpenAI. Give your coders the resources and time they need to get up to speed. Focus on a specific domain (together with experts from that domain) and build a product that people in that domain need and could use.

How to change careers and become a data scientist - one quant's experience

This post has been translated into Chinese here.

I sometimes receive emails asking for guidance related to data science, which I answer here as a data science advice column. If you have a data science related quandary, email me at rachel@fast.ai. Note that questions are edited for clarity and brevity. Other installments of the data science advice column include:

Q: This question is a composite of a few emails I’ve received from people with some limited programming skills, living outside the Bay Area, and interested in becoming data scientists, such as the following:

Q1. I’m a financial analyst at a major bank. I’m in the process of pivoting to tech as a software engineer and I’m interested in machine learning, which led me to your post on The Diversity Crisis in AI. Do I need a masters or PhD to work in AI?

Q2. I am in the 6th year of my PhD in pure mathematics and am going to graduate soon. I am really interested in data science and I want to know if I want to get a job in this area, what can I do or what should I prepare myself so that I can have the skill sets that the companies need? I’m now reading books and trying to find some side projects that I can do. Do you have any ideas where I can find these projects that will interest employers?

Q3. I have a graduate STEM degree and have worked as both a researcher and a teacher. I am currently in the midst of a career transition and looking for industry roles that might need both analytical and instructional skills from their employees. My knowledge is more on the science side though rather than in software. The Internet can get to be a pretty overwhelming place to pull information from without the actual sharing. Do you have recommendations for programming courses and workshops that are also friendly to a teacher’s budget? And what coding languages or skills would you say would be most helpful to focus on developing?

A: I think of myself as having a somewhat non-traditional background. At first glance it may seem like I have a classic data science education: I took 2 years of C++ in high school, minored in computer science in college (with a math major), did a PhD related to probability, and worked as a quant. However, my computer science coursework was mostly theoretical, my math thesis was entirely theoretical (no computations at all!), and over the years I used less and less C/C++ and more and more MATLAB (why oh why did I do that to myself?!? Somehow I found myself even writing web scrapers in MATLAB…) My college education taught me how to prove if an algorithm was NP-complete or Turing computable, but nothing about testing, version control, web apps, or how the internet works. The company where I was a quant primarily used proprietary software/languages that aren’t used in the tech industry.

After 2 years working as a quant in energy trading, I realized that my favorite parts of the job were programming and working with data. I was frustrated with the bureaucracy of working at a large company, and with dealing with outdated and proprietary software tools. I wanted something different, and decided to attend the data science conference Strata in February 2012 to learn more about the world of Bay Area data science. I was totally, absolutely, blown away. The huge enthusiasm for data, all the tools I was most excited about (and several more I’d never heard of before), the stories of people who had quit their previous lives in academia or established companies to work on their passion at a startup… it was so different from what I was used to, and so refreshing. After Strata, I spent a few extra days in San Francisco, interviewing at startups and having coffee with some distant acquaintances that I found had moved to SF - everyone was very helpful and also apparently addicted to Four Barrel Coffee (almost everyone I spoke with suggested meeting there!)

I was star-struck, but actually switching into tech made me feel totally out of it… For my first many conversations and interviews with people in tech, I often felt like they were speaking another language. I grew up in Texas, spent my 20s in Pennsylvania and North Carolina, and didn’t know anyone who worked in tech. I’d never taken a statistics class and just thought of probability as real analysis on a space of measure 1. I didn’t know anything about how start-ups and tech companies worked. The first time I interviewed at a start-up, one interviewer boasted about how the company had briefly achieved profitability before embarking on rapid expansion/hiring. “You mean this company isn’t profitable!?!?” I responded in horror (Yes, I actually said that out loud, in a shocked tone of voice). I now cringe in embarrassment at the memory. In another interview, I was so confused by the concept of an “impression” (when an internet ad is displayed) that it took me a while to even get to the logic of the question.

I’ve been here five years now, and here are some things I wish I’d known when I was starting my career move. I’m aware that I’m white, a US citizen, had a generous fellowship in grad school and no student debt, and was single and childless at the time I decided to switch careers, and someone without these privileges will face a much tougher path. While my anecdotes should be taken with a grain of salt, I hope that some of these suggestions turn out to be helpful to you:

Becoming ready for a move to data science

  1. Most importantly: find ways to work whatever you want to learn into your current job. Find a project that involves more coding/data analysis and that would be helpful to your employer. Take any boring task you do and try to automate it. Even if the process of automation makes it take 5x as long (and even if you only do the task once!), you are learning by doing this.

  2. Analyze any data you have: research for an upcoming purchase (e.g. deciding which microwave to buy), data from a personal fitness tracker, nutrition data from recipes you’re cooking, pre-schools you’re looking at for your child. Turn it into a mini data analysis project and write it up in a blog post. For example, if you are a graduate student, you could analyze grade data from the students you are teaching.

  3. Learn the most important data science software tools: Python’s data science stack (pandas/numpy/scipy) is the #1 most useful technology to learn (read this book!), followed closely by SQL. I would focus on getting very comfortable with Python and SQL before learning other languages. Python is widely used and flexible. You will be well-positioned if you decide to switch to more software development work or to go full-steam into machine learning.

  4. Use Kaggle. Do the tutorials, participate in the forums, enter a competition (don’t worry about where you place - just focus on doing a little better every day). It’s the best way to learn practical machine learning skills.

  5. Search for data science and tech meetups in your area. With the explosion of data science in the last few years, there are now meetups in countries all over the world and in a wide variety of cities. For instance, Google recently held a TensorFlow Dev Summit in Mountain View, CA, but there were viewing parties around the world that watched the livestream together (including in Abuja, Nigeria; Coimbatore, India; and Rabat, Morocco).
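
As a rough sketch of what a mini data analysis project from the list above could look like, the example below combines the two tools recommended there, Python and SQL, on the microwave-shopping example. The model names, prices, and ratings are hypothetical, and the `dollars_per_star` metric and the `rank_by_value` helper are assumptions chosen only to keep the example small. It uses the `sqlite3` module from Python's standard library so it runs without installing anything.

```python
import sqlite3

# Hypothetical shopping notes: (model, price in USD, watts, average review score).
microwaves = [
    ("Model A", 89.0, 900, 4.1),
    ("Model B", 129.0, 1100, 4.5),
    ("Model C", 159.0, 1200, 4.4),
]


def rank_by_value(rows):
    """Load rows into an in-memory SQLite table and rank by price per review star."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE microwaves (model TEXT, price REAL, watts INTEGER, rating REAL)"
    )
    conn.executemany("INSERT INTO microwaves VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT model, ROUND(price / rating, 2) AS dollars_per_star
        FROM microwaves
        ORDER BY dollars_per_star
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


ranking = rank_by_value(microwaves)
for model, dollars_per_star in ranking:
    print(f"{model}: ${dollars_per_star} per review star")
```

The analysis itself is trivial on purpose. The useful practice is in collecting the data, choosing a question, and writing up what you found.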

Online courses

Online courses are an amazing resource. You can learn from the world’s best data scientists in the comfort of your own home. Often the assignments are where most of the learning occurs, so don’t skip them! Here are a few of my favorites:

As one of the questioners highlighted above, the amount of information, tutorials, and courses available online can be overwhelming. One of the biggest risks is jumping from thing to thing, without ever completing one or sticking with a topic long enough to learn it. It’s important to find a course or project that is “good enough”, and then stick with it. Something that can be helpful with this is finding or starting a meet-up group to work through an online course together.

Online courses are very useful for the knowledge you gain (and it’s so important that you do all the assignments, as that is how you learn). However, I have not seen any benefit to getting the certificates from them (yet; I know this is a newer area of growth). This is based on my experience interviewing a ton of job applicants when hiring data scientists, and interviewing for many positions myself.

News sources

  • Twitter can be a surprisingly helpful way to find interesting articles and opportunities. For instance, my collaborator Jeremy Howard has provided over 1,000 links to his favorite machine learning papers and blog posts (NB: you’ll need to be signed in to Twitter to read this link). It will take some time to figure out who to follow (and may involve some following and unfollowing and searching along the way), although one shortcut is to look at who wrote the tweets you like in the link above, and follow them directly. Look up data scientists at companies that interest you. Look up the authors of libraries and tools that you use, or are interested in. When you find a tutorial or blog post you like, look up the author. And then look up who these people retweet. If you are unsure what or how to tweet, I think it can be helpful to think of Twitter as a way to (publicly) bookmark links that you like. I try to tweet any article or tutorial that I think I may want to refer back to in a few months’ time.
  • The machine learning subreddit is a great source of recent news. You may find a lot of it inaccessible at first, but after a couple of months you’ll start seeing more and more that you recognize.
  • It’s helpful to sign up for newsletters such as Import AI newsletter and WildML news.

Moving to the Bay Area

Do whatever you can to move to the Bay Area! I realize that this won’t be possible for many people (particularly if you have children or for a variety of visa/legal residency issues). There are so many data science meet-ups, study groups, conferences, and workshops here. There is also an amazing community of other bright, ambitious, hungry-to-learn data scientists. I had trouble even figuring out which were the most useful things for me to learn from afar. Although I’d started studying machine learning on my own before moving here, coming to SF rapidly accelerated my learning.

My first year in San Francisco was a period of intense learning for me: I attended tons of meetups, completed several online courses, participated in numerous workshops and conferences, learned a lot by working at a data-focused start-up, and most importantly met scores of people who I was able to ask questions of. I completely under-estimated how amazing it is to be able to interact regularly with the people who are building the tools and technology that excite me most. I’m surrounded by people who love learning and are pushing the cutting edge of what is possible. That TensorFlow Dev Summit I mentioned above that people watched from around the world? I was lucky enough to be able to attend it live, and my favorite part was the people I met.

One good approach to moving here is taking a “not-your-dream-job”; i.e. try to get to a place where you’re surrounded by people you can learn from, even if it’s not a role you’d otherwise be interested in. I decided to switch careers in early 2012, before Insight or other data science bootcamps existed. I applied to a few of what were my “dream jobs” at the time and was rejected. In hindsight, I think this was a mix of my lacking some needed skills, not knowing how to market myself properly, and doing a fairly brief job search. In March 2012, I accepted an analyst position at a start-up I was excited about, with the hope and an informal agreement that I could move into an official data science/modeling role later. Overall, it was a good choice. It let me move to San Francisco rapidly, the company I joined was great in a number of ways (including having a weekly reading group that was working through Bishop’s Pattern Recognition and including me on a field trip to meet Trevor Hastie and Jerome Friedman), and my manager was supportive of me doing more engineering-intensive projects than the role was officially scoped for. One year later, I landed what on paper was my dream job: a combined data scientist/software engineer role with a startup that had fascinating datasets.

There are also some good bootcamps in the area, which generally also provide many opportunities to connect with interesting people and companies in the data science space.

  • Insight Data Science is a 7-week, free, intensive bootcamp for graduates with PhDs in STEM fields. Potential downsides: Since it is only 7 weeks, part of which is focused on networking and job search, I believe it’s mostly for people who already have most of the skills they need. Also, it’s very competitive to get into.
  • Data science bootcamps such as Galvanize or Metis. Positives: These are 12-week immersive experiences that provide structure and networking opportunities. Downsides: These are rather expensive. Some factors to consider: How close is your background to what you need? That is, if you have little programming experience, it may be necessary to do something like this, but if you are transitioning from a closely related field, this may be overkill. Also, how motivated are you with independent self-learning? If you struggle with it, the accountability and structure of a bootcamp could be helpful.

There are many factors in deciding whether to do a bootcamp. A big one is how much structure or external motivation you need. There are a lot of amazing resources available online. How much discipline do you have? Note that it’s important to accept what you need to best learn. I find the motivation of online courses and having assignments really works for me, and I used to feel embarrassed that this was easier for me than having a completely independent side project. Now, I’ve accepted this and try to work with it. Other questions to ask: how much do you need to learn, and how quickly can you learn on your own? If it’s a lot, a bootcamp may really speed that up. One area where I think bootcamps can particularly shine is teaching you how to put a bunch of different tools/techniques together.

You can also move here without a job. This requires a number of things, including: ample savings, US legal residency status, and not having children, so it won’t be an option for many people. However, if you are able to do it (e.g. a US permanent resident, coming from finance), it can be a good option. Searching for a job in tech can be a full-time job, as data science and engineering interviews require a lot of studying to prepare and many require time-intensive take-home challenges. In hindsight, I’ve often done rushed job searches when I was working full-time and job-searching, and that has led me to a few sub-optimal decisions. You will certainly find plenty of ways to fill your time with studying for interviews, coding side projects, and attending workshops and study groups. Also, two things that surprised me when I switched to tech are how frequently people switch jobs, and how normal it is to take time off between jobs for learning new things or traveling (so there is less reason to worry about gaps in your resume, as long as you have good answers about what you were learning during that time).

The huge caveat: When I made my move 5 years ago, I was unaware of how sexist, racist, ageist, transphobic, and morally bankrupt Bay Area tech is (despite its grandiose claims to be creating a better future). A few years later, I became so discouraged that I considered leaving the tech industry altogether. Tales of betrayal, callousness, and cruelty abound: for instance, someone I’m close to had his family medical emergencies exploited for profit by his coworkers, and many of my friends and loved ones have had similarly awful experiences. However, the community of passionate, fascinating people and access to cutting edge technology keeps me here, and given the choice, I’d choose to move here all over again. I currently feel very lucky with fast.ai to be working on the problems that I find most interesting and believe will have the greatest impact.

Do I need a masters or PhD to work in AI? I firmly believe the answer is NO, and I’m working to make that even more of a reality than it already is. In fact, AI PhDs are often not that well-positioned to tackle practical, relevant business problems, because that is not what their training is for. Academia is focused on advancing the theoretical bounds of the field, and is motivated by what can be published in top journals (very different from what will create a viable business!). Read more about fast.ai’s educational philosophy here and check out our free, online course Practical Deep Learning for Coders.

Should I learn Ruby after learning Python? There’s no reason for an aspiring data scientist to learn Ruby. It’s similar enough to Python that it won’t teach you new concepts (the way learning a functional language or lower-level language would), and Ruby does not have much of a data science ecosystem.

Where can I find side projects that will interest employers? I think I can find random data sets online, but I guess employers do want to see how I handle a real situation? Don’t feel like your side project needs to be completely unique, or involve a unique or unusual data set. It’s fine to use a dataset you get from Kaggle for a side project. It’s fine if your project isn’t ground-breaking. When creating side projects, blog posts, or tutorials, think of your audience as the person who is one step behind you, as they are the one who you are best positioned to help. You may be worried that a project or post wouldn’t be interesting to someone senior in the field, or that someone else may have done something similar. That’s fine! It’s good just to get your work out there.