Embracing Swift for Deep Learning


Jeremy Howard


March 6, 2019

Note from Jeremy: If you want to join the next deep learning course at the University of San Francisco, discussed below, please apply as soon as possible because it’s under 2 weeks away! You can apply here. At least a year of coding experience and deep learning experience equivalent to completing Practical Deep Learning for Coders are required.

Today at the TensorFlow Dev Summit we announced that two lessons in our next course will cover Swift for TensorFlow. These lessons will be co-taught with the inventor of Swift, Chris Lattner; together, we’ll show in the class how to take the first steps towards implementing an equivalent of the fastai library in Swift for TensorFlow. We’ll be showing how to get started programming in Swift, and explaining how to use and extend Swift for TensorFlow.

Last month I showed that Swift can be used for high performance numeric computing (that post also has some background on what Swift is, and why it’s a great language, so take a look at that if you haven’t read it before). In my research on this topic, I even discovered that Swift can match the performance of hand-tuned assembly code from numerical library vendors. But I warned that: “Using Swift for numeric programming, such as training machine learning models, is not an area that many people are working on. There’s very little information around on the topic”.

So, why are we embracing Swift at this time? Because Swift for TensorFlow is the first serious effort I’ve seen to incorporate differentiable programming deep into the heart of a widely used language that is designed from the ground up for performance.

Our plans for Swift at fast.ai

The combination of Python, PyTorch, and fastai is working really well for us, and for our community. We have many ongoing projects using fastai for PyTorch, including a forthcoming new book, many new software features, and the majority of the content in the upcoming courses. This stack will remain the main focus of our teaching and development.

It is very early days for Swift for TensorFlow. We definitely don’t recommend that anyone try to switch all their deep learning projects over to Swift just yet! Right now, most things don’t work. Most plans haven’t even been started. For many, this is a good reason to skip the project entirely.

But for me, it’s a reason to jump in! I love getting involved in the earliest days of projects that I’m confident will be successful, and helping our community to get involved too. Indeed, that’s what we did with PyTorch, including it in our course within a few weeks of its first pre-release version. People who are involved early in a project like this can have a big influence on its development, and soon enough they find themselves the “insiders” in something that’s getting big and popular!

I’ve been looking for a truly great numerical programming language for over 20 years now, so for me the possibility that Swift could be that language is hugely exciting. There are many opportunities for students to pick something that’s not yet implemented in Swift for TensorFlow and submit a PR implementing and testing that functionality.

Python: What’s missing

In the last three years, we’ve switched between many different deep learning libraries in our courses: Theano, TensorFlow, Keras, PyTorch, and of course our own fastai library. But they’ve all had one thing in common: they are Python libraries. This is because Python is today the language that’s used in nearly all research, teaching, and commercial applications of deep learning. For a deep learning practitioner, using a language other than Python means giving up a vast ecosystem of interconnected libraries, or else using Python’s libraries through clunky inter-language communication mechanisms.

But Python is not designed to be fast, and it is not designed to be safe. Instead, it is designed to be easy and flexible. To work around the performance problems of using “pure Python” code, we instead have to use libraries written in other languages (generally C and C++), like numpy, PyTorch, and TensorFlow, which provide Python wrappers. To work around the lack of type safety, recent versions of Python have added type annotations that optionally allow the programmer to specify the types used in a program. However, Python’s type system is not capable of expressing many types and type relationships, does not do any automated typing, and cannot reliably check all types at compile time. Therefore, using types in Python requires a lot of extra code, but falls far short of the level of type safety that other languages can provide.
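As a minimal illustration of that gap (a toy sketch; the function `scale` is hypothetical and not from any library), Python annotations are hints only. The interpreter does not check them, so a call that violates them can run without complaint unless you separately run a static checker such as mypy.

```python
# A hypothetical function whose annotations say it works on floats.
def scale(x: float, factor: float) -> float:
    return x * factor

# The annotations are not enforced at runtime: passing a str is accepted,
# and because str * int means repetition, this silently returns "ababab".
result = scale("ab", 3)
print(result)
```

A static checker would flag the first argument, but only if you remember to run it; nothing in the language itself stops this code from executing and producing a value of the wrong type.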

The C/C++ libraries that are at the heart of nearly all Python numeric programming are also a problem for both researchers and educators. Researchers cannot easily modify or inspect the underlying code, since doing so requires a whole different toolbox, and in the case of libraries like MKL and cuDNN the underlying code is optimized machine language. Educators cannot easily show students what’s really going on in a piece of code, because the normal Python-based debugging and inspection approaches cannot handle libraries in other languages. Developers struggle to profile and optimize code where it crosses language boundaries, and Python itself cannot properly optimize code that crosses language or library boundaries.

For instance, we’ve been doing lots of research into different types of recurrent neural network architectures and normalization layers. In both cases, we haven’t been able to get the same level of performance that we see in pure CUDA C implementations, even when using PyTorch’s fantastic new JIT compiler.

At the PyTorch Dev Summit last year I participated in a panel with Soumith Chintala, Yangqing Jia, Noah Goodman, and Chris Lattner. In the panel discussion, I said that: “I love everything about PyTorch, except Python.” I even asked Soumith “Do you think we might see a ‘SwifTorch’ one day?” At the time, I didn’t know that we might be working with Swift ourselves so soon!

So what now?

In the end, anything written in Python has to deal with one or more of the following:

  • Being run as pure Python code, which means it’s slow
  • Being a wrapper around some C library, which means it’s hard to extend, can’t be optimized across library boundaries, and is hard to profile and debug
  • Being converted into some different language (such as PyTorch using TorchScript, or TensorFlow using XLA), which means you’re not actually writing in the final target language, and have to deal with the mismatch between the language you think you’re writing and the actual language that’s really being used (with at least the same debugging and profiling challenges as using a C library).

On the other hand, Swift is very closely linked with its underlying compiler infrastructure, LLVM. In fact, Chris Lattner has described it before as “syntactic sugar for LLVM”. This means that code written in Swift can take full advantage of all of the performance optimization infrastructure provided by LLVM. Furthermore, Chris Lattner and Jacques Pienaar recently launched the MLIR compiler infrastructure project, which has the potential to significantly improve the capabilities of Swift for TensorFlow.

Our hope is that we’ll be able to use Swift to write every layer of the deep learning stack, from the highest level network abstractions all the way down to the lowest level RNN cell implementation. There would be many benefits to doing this:

  • For education, nothing is mysterious anymore; you can see exactly what’s going on in every bit of code you use
  • For research, nothing is out of bounds; whatever you can conceive of, you can implement, and have it run at full speed
  • For development, the language helps you; your editor will deeply understand your code, doing intelligent completions and warning you about problems like tensor mismatches, your profiler will show you all the steps going on so you can find and fix performance problems, and your debugger will let you step all the way to the bottom of your call stack
  • For deployment, you can deploy exactly the same code that you developed on your laptop. No need to convert it to some arcane format that only your deep learning server understands!

In conclusion

For education, our focus has always been on explaining the concepts of deep learning, and the practicalities of actually using this tool. We’ve found that our students can very easily (within a couple of days) switch to being productive in a different library, as long as they understand the foundations well, and have practiced applying them to solve real problems.

Our Python fastai library will remain the focus of our development and teaching. We will, however, be doing lots of research using Swift for TensorFlow, and if it reaches the potential we think it has, expect to see it appearing more and more in future courses! We will be working to make practical, world-class deep learning in Swift as accessible as possible, and that probably means bringing our fastai library (or something even better!) to Swift too. It’s too early to say exactly what that will look like; if you want to be part of making this happen, be sure to join the upcoming class, either in person at the University of San Francisco, or in the next part 2 MOOC (coming out June 2019).