High Performance Numeric Programming with Swift: Explorations and Reflections

Over the past few weeks I’ve been working on building some numeric programming libraries for Swift. But wait, isn’t Swift just what iOS programmers use for building apps? Not any more! Nowadays Swift runs on Linux and Mac, and can be used for web applications, command line tools, and nearly anything else you can think of.

Using Swift for numeric programming, such as training machine learning models, is not an area that many people are working on. There’s very little information around on the topic. But after a few weeks of research and experimentation I’ve managed to create a couple of libraries that can achieve the same speed as carefully optimized vectorized C code, whilst being concise and easy to use. In this article, I’ll take you through this journey and show you what I’ve learned about how to use Swift effectively for numeric programming. I will include examples mainly from my BaseMath library, which provides generic math functions for Float and Double, and optimized versions for various collections of them. (Along the way, I’ll have plenty to say, both positive and negative, about both Swift and other languages; if you’re someone who has a deep emotional connection to your favorite programming language and doesn’t like to see any criticism of it, you might want to skip this post!)

In a future post I’ll also show how to get additional speed and functionality by interfacing with Intel’s Performance Libraries for C.

Background

Generally around the new year I try to experiment with a new language or framework. One approach that’s worked particularly well for me is to look at what the people that built some of my favorite languages, books, and libraries are doing now. This approach led me to being a very early user of Delphi, Typescript, and C# (Anders Hejlsberg, after I used his Turbo Pascal), Perl (Larry Wall, after I used rn), JQuery (John Resig, after I read Modern Javascript), and more. So when I learnt that Chris Lattner (who wrote the wonderful LLVM) is creating a new deep learning framework called Swift for Tensorflow (which I’ll shorten to S4TF from here), I decided that I should take a look.

Note that S4TF is not just a boring Swift wrapper for Tensorflow! It’s the first serious effort I’ve seen to incorporate differentiable programming deep into the heart of a widely used language. I’m hoping that S4TF will give us a language and framework that, for the first time, treats differentiable programming as a first-class citizen of the programming world, and will allow us to do things like:

  • Write custom GPU kernels in Swift
  • Provide compile-time checks that named tensors’ axis names and sizes match
  • Differentiate any arbitrary code, whilst also providing vectorized and fused implementations automatically.

These things are not available in S4TF, at least as yet (in fact, it’s such early days for the project that nearly none of the deep learning functionality works yet). But I fully expect them to happen eventually, and when that happens, I’m confident that doing differentiable programming in Swift will be a far better experience than in any other language.

I was lucky enough to bump into Chris at a recent conference, and when I told him about my interest in S4TF, he was kind enough to offer to help me get started with Swift. I’ve always found that who I work with matters much more to my productivity and happiness than what I work on, so that was another excellent reason to spend time on this project. Chris has been terrifically helpful, and he’s super-nice as well—so thanks, Chris!

About Swift

Swift is a general-purpose, multi-paradigm, compiled programming language. It was started by Chris Lattner while he was at Apple, and supported many concepts from Objective-C (the main language used for programming for Apple devices). Chris described the language to me as “syntax sugar for LLVM”, since it maps so closely to many of the ideas in that compiler framework.

I’ve been coding for around 30 years, and in that time have used dozens of languages (and have even contributed to some). I always hope that when I start looking at a new language there will be some mind-opening new ideas to find, and Swift definitely doesn’t disappoint. Swift tries to be expressive, flexible, concise, safe, easy to use, and fast. Most languages compromise significantly in at least one of these areas. Here’s my personal view of some languages that I’ve used and enjoyed, but all of which have limitations I’ve found frustrating at times:

  • Python: Slow at runtime, poor support for parallel processing (but very easy to use)
  • C, C++: hard to use (and C++ is slow at compile time), but fast and (for C++) expressive
  • Javascript: Unsafe (unless you use Typescript); somewhat slow (but easy to use and flexible)
  • Julia: Poor support for general purpose programming, but fast and expressive for numeric programming. (Edit: this may be a bit unfair to Julia; it’s come a long way since I last looked at it!)
  • Java: verbose (but getting better, particularly if you use Kotlin), less flexible (due to JVM issues), somewhat slow (but overall a language that has many useful application areas)
  • C# and F#: perhaps the fewest compromises of any major programming language, but they still require installation of a runtime, have limited flexibility due to garbage collection, and make it difficult to get code really fast (except on Windows, where you can interface via C++/CLI)

I’d say that Swift actually does a pretty good job of avoiding any major compromises (possibly Rust does too; I haven’t used it seriously so can’t make an informed comment). It’s not the best at any of the areas I’ve mentioned, but it’s not too far off either. I don’t know of another single language that can make that claim (but note that it also has its downsides, which I’ll address in the last section of this article). I’ll look briefly at each in turn:

  • Concise: Here’s how to create a new array b that adds 2 to every element of a: let b=a.map {$0+2}. Here, {$0+2} is an anonymous function, with $0 being the automatic name for the first parameter (you can optionally add names and types if you like). The type of b is inferred automatically. As you can see, there’s a lot we’ve done with just a small amount of code!
  • Expressive: The above line of code works not just for arrays, but for any object that supports certain operations (as defined by Sequence in the Swift standard library). You can also add support for Sequence to any of your objects, and even add it to existing Swift types or types in other libraries. As soon as you do so, those objects will get this functionality for free (there’s a short sketch of both of these points just after this list).
  • Flexible: There’s not much that Swift can’t do. You can use it for mobile apps, desktop apps, server code, and even systems programming. It works well for parallel computing, and also can handle (somewhat) small-memory devices.
  • Safe: Swift has a fairly strong type system, which does a good job of noticing when you’ve done something that’s not going to work. It has good support for optional values, without making your code verbose. But when you need extra speed or flexibility, there’s generally ways to bypass Swift’s checks.
  • Fast: Swift avoids the things that can make a language slow; e.g. it doesn’t use garbage collection, allows you to use value types nearly everywhere, and minimizes the need for locking. It uses LLVM behind the scenes, which has excellent support for creating optimized machine code. Swift also makes it easy for the compiler to know when things are immutable, and avoids aliasing, which also helps the compiler optimize. As you’ll see, you can often get the same performance as carefully optimized C code.
  • Easy to use: This is the one area where there is, perhaps, a bit of a compromise. It’s quite easy to write basic Swift programs, but there can be obscure type issues that pop up, mysterious error messages a long way from the real site where the problem occurred, and installing and distributing applications and libraries can be challenging. Also, the language has been changing a lot (for the better!) so most information online is out of date and requires changes to make it work. Having said all that, it’s far easier to use than something like C++.
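
To make the first two points above concrete, here’s a small, self-contained sketch. Nothing in it comes from BaseMath; the Sequence extension and the addingTwo method are purely illustrative:

let a = [1.0, 2.0, 3.0]
let b = a.map {$0+2}                           // [3.0, 4.0, 5.0]; the type of b is inferred as [Double]

// map comes from Sequence, so the same style works for ranges, sets, and so on. We can
// also extend Sequence ourselves, and every conforming type picks up the new method:
extension Sequence where Element == Double {
  func addingTwo() -> [Double] { return map {$0+2} }
}
let c = (1...3).map(Double.init).addingTwo()   // works on the array produced from a range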

Protocol-oriented programming

The main trick that lets Swift avoid compromises is its use of Protocol-oriented programming. The basic idea is that we try to use value types as much as possible. In most languages where ease-of-use is important, reference types are widely used since they allow the use of garbage collection, virtual functions, overriding super-class behavior, and so forth. Protocol-oriented programming is Swift’s way of getting many of these benefits, whilst avoiding the overhead of reference types. In addition, by avoiding reference types, we avoid all the complex bugs introduced when we have two variables pointing at the same thing.

Value types are also a great match for functional styles of programming, since they allow for better support of immutability and related functional concerns. Many programmers, particularly in the Javascript world, have recently developed an understanding of how code can be more concise, understandable, and correct, by leveraging a functional style.

If you’ve used a language like C#, you’ll already be familiar with the idea that defining something with struct gives you a value type, and using class gives you a reference type. This is exactly how Swift handles things too.
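
A minimal illustration of that difference (PointS and PointC are throwaway types, just for demonstration):

struct PointS { var x = 0 }     // value type: assignment copies
class  PointC { var x = 0 }     // reference type: assignment shares

let s1 = PointS()
var s2 = s1                     // s2 is an independent copy
s2.x = 5                        // s1.x is still 0

let c1 = PointC()
let c2 = c1                     // c2 refers to the same object as c1
c2.x = 5                        // c1.x is now 5 too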

Before we get to protocols, let’s mention a couple of other fundamentals: Automatic Reference Counting (ARC), and copy-on-write.

Automatic Reference Counting (ARC)

From the docs: “Swift uses Automatic Reference Counting (ARC) to track and manage your app’s memory usage. In most cases, this means that memory management “just works” in Swift, and you do not need to think about memory management yourself. ARC automatically frees up the memory used by class instances when those instances are no longer needed.” Reference counting has traditionally been used by dynamic languages like Perl and Python. Seeing it in a modern compiled language is quite unusual. However, Swift’s compiler works hard to track references carefully, without introducing overhead.

ARC is important both for handling Swift’s reference types (which we still need to use sometimes), and also for handling memory in value-type objects that share memory with copy-on-write semantics, or that are embedded in a reference type. Chris also mentioned to me a number of other benefits: it provides deterministic destruction, eliminates the common problems with GC finalizers, allows scaling down to systems that can’t support or don’t want a GC, and eliminates unpredictable/unreproducible pauses.
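
Deterministic destruction is easy to see in practice; here’s a tiny, self-contained example (the Resource class is made up for illustration):

class Resource {
  deinit { print("freed") }     // runs as soon as the last reference goes away
}

do {
  let r = Resource()
  _ = r
}                               // "freed" prints here, when r goes out of scope,
                                // rather than at some unpredictable future GC pause
print("after scope")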

Copy-on-write

One major problem with value types in most languages is that if you have something like a big array, you wouldn’t want to pass the whole thing to a function, since that would require a lot of slow memory allocation and copying. So most languages use a pointer or reference in this situation. Swift, however, passes a reference to the original memory, and only if the function mutates the object does it get copied (this is done behind the scenes automatically). So we get the best performance characteristics of value and reference types combined! This is referred to as “copy-on-write”, which is rather delightfully referred to in some S4TF docs as “COW 🐮” (yes, with the cow face emoji too!)

COW also helps with programming in a functional style, yet still allowing for mutation when needed—but without the overhead of unnecessary copying or verbosity of manual references.
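
Swift’s built-in collections like Array implement COW for you, but the same behavior is easy to add to your own types using the standard library’s isKnownUniquelyReferenced function. Here’s a minimal sketch (Ref and Box are illustrative, not from BaseMath):

final class Ref<T> { var value: T; init(_ v: T) { value = v } }

struct Box<T> {                     // a value type whose storage lives in a shared Ref
  private var ref: Ref<T>
  init(_ v: T) { ref = Ref(v) }

  var value: T {
    get { return ref.value }
    set {
      // Copy the storage only if someone else is still sharing it
      if isKnownUniquelyReferenced(&ref) { ref.value = newValue }
      else                               { ref = Ref(newValue) }
    }
  }
}

let a = Box([1, 2, 3])
var b = a                           // cheap: both boxes share the same Ref
b.value = [4, 5, 6]                 // the mutation triggers the copy; a.value is still [1, 2, 3]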

Protocols

With value types, we can’t use inheritance hierarchies to get the benefits of object-oriented programming (although you can still use these if you use reference types, which are also supported by Swift). So instead, Swift gives us protocols. Many languages, such as Typescript, C#, and Java, have the idea of interfaces—metadata which describes what properties and methods an object can contain. At first glance, protocols look a lot like interfaces. For instance, here’s the definition from my BaseMath library of ComposedStorage, which is a protocol describing a collection that wraps some other collection. It defines two properties, data and endIndex, and one method, subscript (which is a special method in Swift, and provides indexing functionality, like an array). This protocol definition simply says that anything that conforms to this protocol must provide implementations of these three things.

public protocol ComposedStorage {
  associatedtype Storage:MutableCollection where Storage.Index==Int
  typealias Index=Int

  var data: Storage {get set}
  var endIndex: Int {get}
  subscript(i: Int)->Storage.Element {get set}
}

This is a generic protocol. Generic protocols don’t use <Type> markers like generic classes, but instead use the associatedtype keyword. So in this case, ComposedStorage is saying that the data attribute contains something of a generic type called Storage which conforms to the MutableCollection protocol, and that type in turn has an associatedtype called Index which must be of type Int in order to conform to ComposedStorage. It also says that the subscript method returns whatever type the Storage’s Element associatedtype contains. As you can see, protocols provide quite an expressive type system.

Now look further, and you’ll see something else… there are also implementations provided for this protocol!

public extension ComposedStorage {
  subscript(i: Int)->Storage.Element {
    get { return data[i]     }
    set { data[i] = newValue }
  }
  var endIndex: Int {
    return data.count
  }
}

This is where things get really interesting. By providing implementations, we’re automatically adding functionality to any object that conforms to this protocol. For instance, here is the entire definition from BaseMath of AlignedStorage, a class that provides array-like functionality but internally uses aligned memory, which is often required for fast vectorized code:

public class AlignedStorage<T:SupportsBasicMath>: BaseVector, ComposedStorage {
  public typealias Element=T
  public var data: UnsafeMutableBufferPointer<T>

  public required init(_ data: UnsafeMutableBufferPointer<T>) {self.data=data}
  public required convenience init(_ count: Int)      { self.init(UnsafeMutableBufferPointer(count)) }
  public required convenience init(_ array: Array<T>) { self.init(UnsafeMutableBufferPointer(array)) }

  deinit { UnsafeMutableRawBufferPointer(data).deallocate() }

  public var p: MutPtrT {get {return data.p}}
  public func copy()->Self { return .init(data.copy()) }
}

As you can see, there’s not much code at all. And yet this class provides all of the functionality of the protocols RandomAccessCollection, MutableCollection, ExpressibleByArrayLiteral, Equatable, and BaseVector (which together include hundreds of methods such as map, find, dropLast, and distance). This is possible because the protocols that this class conforms to, BaseVector and ComposedStorage, provide this functionality through protocol extensions (either directly, or by other protocols that they in turn conform to).

Incidentally, you may have noticed that I defined AlignedStorage as class, not struct, despite all my earlier hype about value types! It’s important to realize that there are still some situations where classes are required. Apple’s documentation provides some helpful guidance on this topic. One thing that structs don’t (yet) support is deinit; that is, the ability to run some code when an object is destroyed. In this case, we need to deallocate our memory when we’re all done with our object, so we need deinit, which means we need a class.

One common situation where you’ll find you really need to use protocols is where you want the behavior of abstract classes. Swift doesn’t support abstract classes at all, but you can get the same effect by using protocols (e.g. in the above code ComposedStorage defines data but doesn’t implement it in the protocol extension, therefore it acts like an abstract property). The same is true of multiple inheritance: it’s not supported by Swift classes, but you can conform to multiple protocols, each of which can have extensions (this is sometimes referred to as mixins in Swift). Protocol extensions share a lot of ideas with traits in Rust and typeclasses in Haskell.
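
Here’s a tiny sketch of that pattern (Greeter and Person are hypothetical types): the protocol declares a requirement without a default, so it behaves like an abstract property, while the extension supplies mixin-style behavior to every conforming type:

protocol Greeter {
  var name: String { get }          // "abstract": conforming types must supply this
}

extension Greeter {
  func greet() -> String { return "Hello, \(name)!" }   // mixin-style default behavior
}

struct Person: Greeter { let name: String }

print(Person(name: "Chris").greet())   // Hello, Chris!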

Generics over Float and Double

For numeric programming, if you’re creating a library then you probably want it to transparently support at least Float and Double. However, Swift doesn’t make this easy. There is a protocol called BinaryFloatingPoint which in theory supports these types, but unfortunately only three math functions in Swift are defined for this protocol (abs, max, and min), plus the standard math operators (+, -, *, /).

You could, of course, simply provide separate functionality for each type, but then you’ve got to deal with creating two versions of everything, and your users have to deal with the same problem too. Interestingly enough, I’ve found no discussions of this issue online, and Swift’s own libraries suffer from this issue in multiple places. As discussed below, Swift hasn’t been used much at all for numeric programming, and these are the kinds of issues we have to deal with. BTW, if you search for numerical programming code online, you will often see the use of the CGFloat type (which suffers from Objective-C’s naming conventions and limitations, which we’ll learn more about later), but that only provides support for one of float or double (depending on the system you’re running on). The fact that CGFloat exists at all in the Linux port of Swift is rather odd, because it was only created for Apple-specific compatibility reasons; it is almost certainly not something you’ll be wanting to use.

Resolving this problem is actually fairly straightforward, and is a good example of how to use protocols. In BaseMath I’ve created the SupportsBasicMath protocol, which is extracted below:

public protocol SupportsBasicMath:BinaryFloatingPoint {
  func log2() -> Self
  func logb() -> Self
  func nearbyint() -> Self
  func rint() -> Self
  func sin() -> Self
   
}

Then we tell Swift that Float conforms to this protocol, and we also provide implementations for the methods:

extension Float : SupportsBasicMath {
  @inlinable public func log2() -> Float {return Foundation.log2(self)}
  @inlinable public func logb() -> Float {return Foundation.logb(self)}
  @inlinable public func nearbyint() -> Float {return Foundation.nearbyint(self)}
  @inlinable public func rint() -> Float {return Foundation.rint(self)}
  @inlinable public func sin() -> Float {return Foundation.sin(self)}
   
}

Now in our library code we can simply use SupportsBasicMath as a constraint on a generic type, and we can call all the usual math functions directly. (Swift already provides support for the basic math operators in a transparent way, so we don’t have to do anything to make that work.)
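
For example, a single generic function written against SupportsBasicMath serves both Float and Double. (This is just a sketch: softplus is not part of BaseMath, and it assumes the protocol also declares exp() and log(), which the excerpt above elides.)

func softplus<T: SupportsBasicMath>(_ x: T) -> T {
  return (1 + x.exp()).log()        // log(1 + exp(x))
}

let a: Float  = softplus(2.0)
let b: Double = softplus(2.0)       // the same generic code works for both types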

If you’re thinking that it must have been a major pain to write all those wrapper functions, then don’t worry—there’s a handy trick I used that meant the computer did it for me. The trick is to use gyb templates to auto-generate the methods using python code, like so:

% for f in binfs:
  func ${f}(_ b: Self) -> Self
% end # f

If you look at the Swift code base itself, you’ll see that this trick is used liberally, for example to define the basic math functions themselves. Hopefully in some future version we’ll see generic math functions in the standard library. In the meantime, just use SupportsBasicMath from BaseMath.

Performance tricks and results

One of the really cool things about Swift is that wrappers like the above have no run-time overhead. As you see, I’ve marked them with the inlinable attribute, which tells LLVM that it’s OK to replace calls to this function with the actual function body. This kind of zero-overhead abstraction is one of the most important features of C++; it’s really amazing to see it in such a concise and expressive language as Swift.

Let’s do some experiments to see how this works, by running a simple benchmark: adding 2.0 to every element of an array of 1,000,000 floats in Swift. Assuming we’ve already allocated an array of appropriate size, we can use this code (note: benchmark is a simple function in BaseMath that times a block of code):

benchmark(title:"swift add") { for i in 0..<ar1.count {ar2[i]=ar1[i]+2.0} }
> swift add: .963 ms

Doing a million floating point additions in a millisecond is pretty impressive! But look what happens if we try one minor tweak:

benchmark(title:"swift ptr add") {
  let (p1,p2) = (ar1.p,ar2.p)
  for i in 0..<ar1.count {p2[i]=p1[i]+2.0}
}
> swift ptr add: .487 ms

It’s nearly the same code, yet twice as fast - so what happened there? BaseMath adds the p property to Array, which returns a pointer to the array’s memory; so the above code is using a pointer, instead of the array object itself. Normally, because Swift has to handle the complexities of COW, it can’t fully optimize a loop like this. But by using a pointer instead, we skip those checks, and Swift can run the code at full speed. Note that due to copy-on-write it’s possible for the array to move if you assign to it, and it can also move if you do things such as resize it; therefore, you should only grab the pointer at the time you need it.

The above code is still pretty clunky, but Swift makes it easy for us to provide an elegant and idiomatic interface. I added a new map method to Array, which puts the result into a preallocated array, instead of creating a new array. Here’s the definition of map (it uses some typealiases from BaseMath to make it a bit more concise):

@inlinable public func map<T:BaseVector>(_ f: UnaryF, _ dest: T) where Self.Element==T.Element {
  let pSrc = p; let pDest = dest.p; let n = count
  for i in 0..<n {pDest[i] = f(pSrc[i])}
}

As you can see, it’s plain Swift code. The cool thing is that this lets us now use this clear and concise code, and still get the same performance we saw before:

benchmark(title:"map add") { ar1.map({$0+2.0}, ar2) }
> map add: .518 ms

I think this is quite remarkable; we’ve been able to create a simple API which is just as fast as the pointer code, but to the class user that complexity is entirely hidden away. Of course, we don’t really know how fast this is yet, because we haven’t compared to C. So let’s do that next.

Using C

One of the really nice things about Swift is how easy it is to add C code that you write, or use external C libraries. To use our own C code, we simply create a new package with Swift Package Manager (SPM), pop a .c file in its Sources directory, and a .h file in its Sources/include directory. (Oh and BTW, in BaseMath that .h file is entirely auto-generated from the .c file using gyb!) This level of C integration is extremely rare, and the implications are huge. It means that every C library out there, including all the functionality built in to your operating system, optimized math libraries, Tensorflow’s underlying C API, and so forth can all be accessed from Swift directly. And if you for any reason need to drop in to C yourself, then you can, without any manual interfacing code or any extra build step.
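
For concreteness, a package wrapping the C function shown below might be laid out like this (the names are hypothetical; the header under Sources/include is what Swift sees when it imports the module):

SMath/
├── Package.swift
└── Sources/
    ├── include/
    │   └── smath.h       // declares: void smAdd_float(const float*, const float, float*, const int);
    └── smath.c           // the implementation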

Here’s our add function in C (this is the float version—the double version is similar, and the two are generated from a single gyb template):

void smAdd_float(const float* pSrc, const float val, float* pDst, const int len) {
  for (int i=0; i<len; ++i) { pDst[i] = pSrc[i]+val; }
}

To call this, we need to pass in the count as an Int32; BaseMath adds the c property to arrays for this purpose (alternatively, you could simply use numericCast(ar1.count)). Here’s the result:

benchmark(title:"C add") {smAdd_float(ar1.p, 2.0, ar2.p, ar1.c)}
> C add: .488 ms

It’s basically the same speed as Swift. This is a very encouraging result, because it shows that we can get the same performance as optimized C using Swift. And not just any Swift, but idiomatic and concise Swift, which (thanks to methods like reduce and map) can look much closer to math equations than most languages that are this fast.

Reductions

Let’s now try a different experiment: taking the sum of our array. Here’s the most idiomatic Swift code:

benchmark(title:"reduce sum") {a1 = ar1.reduce(0.0, +)}
> reduce sum: 1.308 ms

…and here’s the same thing with a loop:

benchmark(title:"loop sum") { a1 = 0; for i in 0..<size {a1+=ar1[i]} }
> loop sum: 1.331 ms

Let’s see if our earlier pointer trick helps this time too:

benchmark(title:"pointer sum") {
  let p1 = ar1.p
  a1 = 0; for i in 0..<size {a1+=p1[i]}
}
> pointer sum: 1.379 ms

Well that’s odd. It’s not any faster, which suggests that it isn’t getting the best possible performance. Let’s again switch to C and see how it performs there:

float smSum_float(const float* pSrc, const int len) {
  float r = 0;
  for (int i=0; i<len; ++i) { r += pSrc[i]; }
  return r;
}

Here’s the result:

benchmark(title:"C sum") {a1 = smSum_float(ar1.p, ar1.c)}
> C sum: .230 ms

I compared this performance to Intel’s optimized Performance Libraries version of sum and found this is even faster than their hand-optimized assembler! To get this to perform better than Swift, I did however need to know a little trick (provided by LLVM’s vectorization docs), which is to compile with the -ffast-math flag. For numeric programming like this, I recommend you always use at least these flags (this is all I’ve used for these experiments, although you can also add -march=native, and change the optimization level from O2 to Ofast):

-Xswiftc -Ounchecked -Xcc -ffast-math -Xcc -O2

Why do we need this flag? Because strictly speaking, addition is not associative, due to the quirks of floating point. But this is, in practice, very unlikely to be something that most people will care about! By default, clang will use the “strictly correct” behavior, which means it can’t vectorize the loop with SIMD. But with -ffast-math we’re telling the compiler that we don’t mind treating addition as associative (amongst other things), so it will vectorize the loop, giving us a 4x improvement in speed.
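
If you want to see the non-associativity for yourself, a few lines of Swift will do it:

let x = 0.1, y = 0.2, z = 0.3                // Doubles
print((x + y) + z == x + (y + z))            // false
print((x + y) + z, x + (y + z))              // 0.6000000000000001 0.6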

The other important thing to remember for good performance in C code like this is to ensure you have const marked on everything that won’t change, as I’ve done in the code above.

Unfortunately, there doesn’t seem to currently be a way to get Swift to vectorize any reduction. So for now at least, we have to use C to get good performance here. This is not a limitation of the language itself, it’s just an optimization that the Swift team hasn’t gotten around to implementing yet.

The good news is: BaseMath adds the sum method to Array, which uses this optimized C version, so if you use BaseMath, you get this performance automatically. So the result of test #1 is: failure. We didn’t manage to get pure Swift to reach the same performance as C. But at least we got a nice C version we can call from Swift. Let’s move on to another test and see if we can get better performance by avoiding doing any reductions.

Temporary storage

So what if we want to do a function reduction, such as sum-of-squares? Ideally, we’d like to be able to combine our map style above with sum, but without getting the performance penalty of Swift’s unoptimized reductions. To make this work, the trick is to use temporary storage. If we use our map function above to store the result in preallocated memory, we can then pass that to our C sum implementation. We want something like a static variable for storing our preallocated memory, but then we’d have to deal with locking to handle contention between threads. To avoid that, we can use thread local storage (TLS). Like most languages, Swift provides TLS functionality; however rather than make it part of the core language (like, say, C#), it provides a class, which we can access through Thread.current.threadDictionary. BaseMath adds the preallocated memory to this dictionary, and makes it available internally as tempStore; this is then the internal implementation of unary function reduction (there are also binary and ternary versions available):

@inlinable public func sum(_ f: UnaryF)->Element {
  self.map(f, tempStore)
  return tempStore.sum()
}

We can then use this as follows:

benchmark(title:"lib sum(sqr)") {a1 = ar1.sum(Float.sqr)}
> lib sum(sqr): .786 ms

This provides a nice speedup over the regular Swift reduce version:

benchmark(title:"reduce sumsqr") {a1 = ar1.reduce(0.0, {$0+Float.sqr($1)})}
> reduce sumsqr: 1.459 ms

Here’s the C version:

float smSum_sqr_float(const float* restrict pSrc, const int len) {
  float r = 0;
  #pragma clang loop interleave_count(8)
  for (int i=0; i<len; ++i) { r += sqrf(pSrc[i]); }
  return r;
}

Let’s try it out:

benchmark(title:"c sumsqr") {a1 = smSum_sqr_float(ar1.p, ar1.c)}
> c sumsqr: .229 ms

C implementations of sum for all standard unary math functions are made available by BaseMath, so you can call the above implementation by simply using:

benchmark(title:"lib sumsqr") {a1 = ar1.sumsqr()}
> lib sumsqr: .219 ms

In summary: whilst the Swift version using temporary storage (and calling C for just the final sum) was twice as fast as just using reduce, using C is another 3 or more times faster.

The warts

As you can see, there’s a lot to like about numeric programming in Swift. You can get the performance of optimized C together with the convenience of automatic memory management and elegant syntax.

The most concise and flexible language I’ve used is Python. And the fastest I’ve used is C (well… actually it’s FORTRAN, but let’s not go there). So how does Swift stack up against these high bars? The very idea that we could compare a single language to both the flexibility of Python and the speed of C is an amazing achievement in itself!

Overall, my view is that in general it takes a bit more code in Swift than in Python to write the stuff I want to write, and there are fewer ways to abstract common code out. For instance, I use decorators a lot in Python, and use them to write loads of code for me. I use *args and **kwargs a lot (the new dynamic features in Swift can provide some of that functionality, but they don’t go as far). I zip multiple variables together at once (in Swift you have to zip pairs of pairs for multiple variables, and then use nested parens to destructure them). And then there’s the code you have to write to get your types all lined up nicely.

I also find Swift’s performance is harder to reason about and optimize than C. C has its own quirks around performance (such as the need to use const and sometimes even requiring restrict to help the compiler) but they’re generally better documented, better understood, and more consistent. Also, C compilers such as clang and gcc provide powerful additional capabilities using pragmas such as omp and loop which can even auto-parallelize code.

Having said that, Swift is closer to achieving the combination of Python’s expressiveness and C’s speed than any other language I’ve used.

There are some issues still to be aware of. One thing to consider is that protocol-oriented programming requires a very different way of doing things from what you’re probably used to. In the long run, that’s probably a good thing, since learning new programming styles can help you become a better programmer; but it could lead to some frustrations for the first few weeks.

This issue is particularly challenging because Swift’s compiler often has no idea where the source of a protocol type problem really is, and its ability to infer types is still pretty flaky. So extremely minor changes, such as changing the name of a type, or changing where a type constraint is defined, can change something that used to work into something that spits out four screens of error messages. My advice is to try to create minimal versions of your type structures in a standalone test file, and get things working there first.

Note, however, that ease of use generally requires compromises. Python is particularly easy, because it’s perfectly happy for you to shoot yourself in the foot. Swift at least makes sure you first know how to untie your shoelaces. Chris told me that the pitch when building Swift in the first place was that the important thing to optimize for is the “end to end time to get to a correct implementation of whatever you’re trying to do”. This includes time to pound out code, time to debug it, and time to refactor/maintain it if you’re making a change to an existing codebase. I don’t have enough experience yet, but I suspect that on this metric Swift will turn out to be a great performer.

There are some parts of Swift which I’m not a fan of: the compromises made due to Apple’s history with Objective-C, its packaging system, its community, and the lack of C++ support. Or to be more precise: it is largely parts of the Swift ecosystem that I’m not a fan of. The language itself is quite delightful. And the ecosystem can be fixed. But, for now, this is the situation that Swift programmers have to deal with, so let’s look at each of these issues in turn.

Objective-C

Objective-C is a language developed in the 1980s, designed to bring some of the object-oriented features of Smalltalk to C. It was a very successful project, and was picked by NeXT as the basis for programming NeXTSTEP in 1988. With NeXT’s acquisition by Apple, it became the primary language for coding for Apple devices. Today, it’s showing its age, and the constraints imposed by the decision to make it a strict superset of C. For instance, Objective-C doesn’t support true function overloading. Instead, it uses something called selectors, which are simply required keyword arguments. Each function’s full name is defined by the concatenation of the function name with all the selector names. This idea is also used by AppleScript, which provides something very similar to allow the name print to mean different things in different contexts:

print page 1
print document 2
print pages 1 thru 5 of document 2

AppleScript in turn inherited this idea from HyperTalk, a language created in 1987 for Apple’s much-loved (and missed) HyperCard program. Given all this history, it’s not surprising that today the idea of required named arguments is something that most folks at Apple have quite an attachment to. Perhaps more importantly, it provided a useful compromise for the designers of Objective-C, since they were able to avoid adding true function overloading to the language, keeping close compatibility with C.

Unfortunately, this constraint impacts Swift today, over 40 years after the situation that led to its introduction in Objective-C. Swift does provide true function overloading, which is particularly important in numeric programming, where you really don’t want to have to create whole separate functions for floats, doubles, and complex numbers (and quaternions, etc…). But by default all keyword names are still required, which can lead to verbose and visually cluttered code. And Apple’s style guide strongly promotes this style of coding; their Objective-C and Swift style guides closely mirror each other, rather than allowing programmers to really leverage Swift’s unique capabilities. You can opt out of requiring named arguments by prefixing a parameter name with _, which BaseMath uses everywhere that optional arguments are not needed.
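
The difference is easy to see in a couple of lines (scaled is a made-up function, not from BaseMath):

func scaled(value: Float, by factor: Float) -> Float { return value * factor }
let a = scaled(value: 3.0, by: 2.0)          // labels required at the call site

func scaled(_ value: Float, _ factor: Float) -> Float { return value * factor }
let b = scaled(3.0, 2.0)                     // labels omitted, thanks to the underscores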

Another area where things get rather verbose is when it comes to working with Foundation, Apple’s main class library, which is also used by Objective-C. Swift’s standard library is missing a lot of functionality that you’ll need, so you’ll often need to turn to Foundation to get stuff done. But you won’t enjoy it. After the pleasure of using such an elegantly designed language as Swift, there’s something particularly sad about using it to access as unwieldy a library as Foundation. For instance, Swift’s standard library doesn’t provide a builtin way to format floats with fixed precision, so I decided to add that functionality to my SupportsBasicMath protocol. Here’s the code:

extension SupportsBasicMath {
  public func string(_ digits:Int) -> String {
    let fmt = NumberFormatter()
    fmt.minimumFractionDigits = digits
    fmt.maximumFractionDigits = digits
    return fmt.string(from: self.nsNumber) ?? "\(self)"
  }
}

The fact that we can add this functionality to Float and Double by writing such an extension is really cool, as is the ability to handle failed conversions with Swift’s ?? operator. But look at the verbosity of the code to actually use the NumberFormatter class from Foundation! And it doesn’t even accept Float or Double, but the awkward NSNumber type from Objective-C (which is itself a clunky workaround for the lack of generics in Objective-C). So I had to add an nsNumber property to SupportsBasicMath to do the casting.
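
That property isn’t shown above, but it is presumably just a thin bridge along these lines (a sketch of my assumption, not BaseMath’s actual code):

import Foundation

// nsNumber would be declared as a requirement on SupportsBasicMath;
// Float and Double then satisfy it trivially:
extension Float  { public var nsNumber: NSNumber { return NSNumber(value: self) } }
extension Double { public var nsNumber: NSNumber { return NSNumber(value: self) } }

// With that in place, the string(_:) method above works as expected:
print(Float.pi.string(2))   // 3.14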

The Swift language itself does help support more concise styles, such as the {f($0)} style of closures. Concision is important for numeric programming, since it lets us write our code to reflect more closely the math that we’re implementing, and to understand the whole equation at a glance. For a masterful exposition of this (and much more), see Iverson’s Turing Award lecture Notation as a Tool for Thought.

Objective-C also doesn’t have namespaces, which means that each project picks some 2 or 3 letter prefix which it adds to all symbols. Most of the Foundation library still uses names inherited from Objective-C, so you’ll find yourself using types like CGFloat and functions like CFAbsoluteTimeGetCurrent. (Every time I type one of these symbols I’m sure a baby unicorn cries out in pain…)

The Swift team made the surprising decision to use an Objective-C implementation of Foundation and other libraries when running Swift on Apple devices, but to use native Swift libraries on Linux. As a result, you will sometimes see different behavior on each platform. For instance, the unit test framework on Apple devices is unable to find and run tests that are written as protocol extensions, but they work fine under Linux.

Overall, I feel like the constraints and history of Objective-C bleed into Swift programming too often, and each time it happens, there’s a real friction that pops up. Over time, however, these issues seem to be reducing, and I hope that in the future we’ll see Swift break out from the Objective-C shackles more and more. For instance, perhaps we’ll see a real effort to create idiomatic Swift replacements for some of the Objective-C class libraries.

Community

I’ve been using Python a lot over the last couple of years, and one thing that always bothered me is that too many people in the Python community have only ever used that one language (since it’s a great beginners’ language and is widely taught to undergraduates). As a result, there’s a lack of awareness that different languages can do things in different ways, and each choice has its own pros and cons. Instead, in the Python world, there’s a tendency for folks to think that the Python way is the one true way.

I’m seeing something similar in Swift, but in some ways it’s even worse: most Swift programmers got their start as Objective-C programmers. So a lot of the discussion you see online is from Objective-C programmers writing Swift in a style that closely parallels how things are done in Objective-C. And nearly all of them do nearly all of their programming in Xcode (which is almost certainly my least favorite IDE, except for its wonderful Swift Playgrounds feature), so a lot of advice you’ll find online shows how to solve Swift problems by getting Xcode to do things for you, rather than writing the code yourself.

Most Swift programmers are writing iOS apps, so you’ll also find a lot of guidance on how to lay out a mobile GUI, but there’s almost no information about things like how to distribute command line programs for Linux, or how to compile static libraries. In general, because the Linux support for Swift is still so new, there’s not much information available about how to use it, and many libraries and tools don’t work under Linux.

Most of the time when I was tracking down problems with my protocol conformance, or trying to figure out how to optimize some piece of code, the only information I could find would be a mailing list discussion amongst Apple’s Swift language team. These discussions tend to focus on the innards of the compiler and libraries, rather than how to use them. So there’s a big missing middle ground between app developers discussing how to use Xcode and Swift language implementers discussing how to modify the compiler. There is a good community forming now around the Discourse forums at https://forums.swift.org/, which hopefully over time will turn into a useful knowledge base for Swift programmers.

Packaging and installation

Swift has an officially sanctioned package system, called Swift Package Manager (SPM). Unfortunately, it’s one of the worst packaging systems I’ve ever used. I’ve noticed that nearly every language, when creating a package manager, reinvents everything from scratch, and fails to leverage all the successes and failures of previous attempts. Swift follows this unfortunate pattern.

There are some truly excellent packaging systems out there. The best, perhaps, was and still is Perl’s CPAN, which includes an international automated testing service that tests all packages on a wide range of systems, deeply integrates documentation, has excellent tutorials, and much more. Another terrific (and more modern) system is conda, which not only handles language-specific libraries (with a focus on Python) but also handles automatically installing compatible system libraries and binaries too—and manages to do everything in your home directory so you don’t even need root access. And it works well on Linux, Mac, and Windows. It can handle distribution of both compiled modules, or source.

SPM, on the other hand, has none of the benefits of any of these systems. Even though Swift is a compiled language, it doesn’t provide a way to create or distribute compiled packages, which means users of your package will have to install all the pre-requisites for building it. And SPM doesn’t let you describe how to build your package, so (for instance) if you use BaseMath it’s up to you to remember to add the flags required for good performance when you build something that uses it.

The way dependencies are handled is really awkward. Git tags or branches are used for dependencies, and there’s no easy way to switch between a local dev build and the packaged version (like, for instance, the -e flag to pip or the conda develop command). Instead, you have to modify the package file to change the location of the dependency, and remember to switch it back before you commit.

It would take far too long to document all the deficiencies of SPM; instead, you can work on the assumption that any useful feature you’ve appreciated from whatever packaging system you’re using now probably won’t be in SPM. Hopefully someone will get around to setting up a conda-based system for Swift and we can all just start using that instead…

Also, installation of Swift is a mess. On Linux, for instance, only Ubuntu is supported, and different versions require different installers. On Mac, Swift versions are tied to Xcode versions in a confusing and awkward way, and command line and Xcode versions are somewhat separate, yet somewhat linked, in ways that make my brain hurt. Again, conda seems like it could provide the best option to avoid this, since a single conda package can be used to support any flavor of Linux, and Mac can also be supported in the same way. If the work was done to get Swift on to conda, then it would be possible to say just conda install swift on any system, and everything would just work. This would also provide a solution for versioning, isolated environments, and complex dependency tracking.

(If you’re on Windows, you are, for now, out of luck. There’s an old unofficial port to Cygwin. And Swift runs fine on the Windows Subsystem for Linux. But no official native Windows Swift as yet, sadly. But there is some excellent news on this front: a hero named Saleem Abdulrasool has made great strides towards making a complete native port entirely independently, and in the last few days it has gotten to a point where the vast majority of the Swift test suite passes.)

C++

Whilst Apple went with Objective-C for their “C with objects” solution, the rest of the world went with C++. Eventually, the Objective-C extensions were also added to C++, to create “Objective-C++”, but there was no attempt to unify the concepts across the languages, so the resulting language is a mutant with many significant restrictions. However, there is a nice subset of the language that gets around some of the biggest limitations of C; for instance you can use function overloading, and have access to a rich standard library.

Unfortunately, Swift can’t interface with C++ at all. Even something as simple as a header file containing overloaded functions will cause Swift language interop to fail.

This is a big problem for numeric programmers, because many of the most useful numeric libraries today are written in C++. For instance, the ATen library at the heart of PyTorch is C++. There are good reasons that numeric programmers lean towards C++: it provides the features that are needed for concise and expressive solutions to numeric programming problems. For example, Julia programmers are (rightly) proud of how easy it is to support the critical broadcasting functionality in their language, which they have documented in the Julia challenge. In C++ this challenge has an elegant and fast solution. You won’t find something like this in pure C, however.

So this means that a large and increasing number of the most important building blocks for numeric programming are out of bounds for Swift programmers. This is a serious problem. (You can write plain C wrappers for a C++ class, and then create a Swift class that uses those wrappers, but that’s a very big and very boring job which I’m not sure many people are likely to embark on.)

Other languages have shown potential ways around this. For instance, C# on Windows provides “It Just Works” (IJW) interfacing with C++/CLI, a superset of C++ with support for .Net. Even more interestingly, the CPPSharp project leverages LLVM to auto-generate a C# wrapper for C++ code with no calling overhead.

Solving this problem will not be easy for Swift. But because Swift uses LLVM, and already interfaces with C (and Objective-C) it is perhaps better placed to come up with a great solution than nearly any other language. Except, perhaps, for Julia, since they’ve already done this. Twice.

Conclusion

Swift is a really interesting language which can support fast, concise, expressive numeric programming. The Swift for Tensorflow project may be the best opportunity for creating a programming language where differentiable programming is a first class citizen. Swift also lets us easily interface with C code and libraries.

However, Swift on Linux is still immature, the packaging system is weak, installation is clunky, and the libraries suffer from some rough spots due to the historical ties to Objective-C.

So how does it stack up? In the data science world, we’re mainly stuck using either R (which is the least pleasant language I’ve ever used, but with the most beautifully designed data munging and plotting libraries anywhere) or Python (which is painfully slow, very hard to parallelize, but is extremely expressive and has the best deep learning libraries available). We really need another option. Something that is fast, flexible, and provides good interop with existing libraries.

Overall, the Swift language itself looks to be exactly what we need, but much of the ecosystem needs to be replaced or at least dramatically leveled-up. There is no data science ecosystem to speak of, although the S4TF project seems likely to create some important pieces. This is a really good place to be spending time if you’re interested in being part of something that has a huge amount of potential and some really great people working to make it happen, and if you are OK with helping smooth out the warts along the way.

One year of deep learning

My resolution for 2018 was to get into deep learning. I had stumbled upon a website called fast.ai in October 2017 after reading an article from the New York Times describing the shortage of people capable of training a deep learning model. It sounds a bit clichéd to say it changed my life, but I could hardly have imagined that, one year later, I would be helping prepare the next version of this course from behind the scenes. So in this article, I’ll tell you a little bit about my personal journey into Deep Learning, and I will share some advice which I feel could have been useful to me six months ago.

Example of adding the Eiffel Tower to a painting using a neural network

Who am I and where do I come from?

My background is mostly in Math. I have a Master’s degree from a French University; I started a PhD but stopped after six months because I found it too depressing, and went on to teach undergrads for seven years in Paris. I’m a self-taught coder, my father having had the good idea to put an introduction to ‘Basic’ in my hands when I was 13.

My husband got relocated to New York City, so I moved there three and a half years ago, and became half stay-at-home dad, half textbook writer for a publisher in France. When I first looked at fast.ai, I was curious about what the hype around Artificial Intelligence was all about, and I wanted to know if I could understand what seemed to only be accessible to a few geniuses.

I have to confess that I almost didn’t start the course; the claim that it could explain Deep Learning to anyone with just one year of coding experience and high school math sounded very suspicious to me, and I was wondering if it wasn’t all bogus (spoiler alert: it’s not). I did decide to go with it though; I had finished my latest books, and finding seven hours a week to work on the course while my little boys napped didn’t seem too much.

Although I started the first version of the MOOC with a clear advantage in math, I struggled a lot with what I call the geeky stuff. I’m a Windows user and I had never launched a terminal before. The setup took me the better part of a week before I was finally able to train my own dogs and cats classifier. It felt like some form of torture every time I had to run some commands in a terminal (that bit still hasn’t changed that much!)

If you’re new to the field and struggling with a part (or all) of it, remember no one had it easy. There’s always something you won’t know and that will be a challenge, but you will overcome it if you persevere. And with time, it will become easier, at least a little bit… I still need help with half my bash commands, and broke the docs and course website twice during the first lesson. Fortunately, everyone was too busy watching Jeremy to notice.

Do you need advanced math to do deep learning?

The short answer is no. The long answer is no, and anyone telling you the opposite just wants to scare you. You might need advanced math in some areas of theoretical research in Deep Learning, but there’s room for everyone at the table.

To be able to train a model in practice, you only need three things: have a sense of what a derivative is, know log and exp for the error functions, and know what a matrix product is. And you can learn all of this in a very short amount of time, with multiple resources online. In the course, Jeremy recommends Khan Academy for learning about derivatives, log, and exp, and 3 Blue 1 Brown for learning about matrix products.

In my opinion, the one mathy (it’s a bit at the intersection of math and code) thing you’ll really need to master (or at least get as comfortable with as you can) is broadcasting.

Make your own boot camp, if you’re serious about it

After I finished the first part of the course, it was clear to me that I wanted to work in this field (hence the good resolution). I contemplated various boot camps that were promising to turn me into a Data Scientist in exchange for a substantial tuition fee. I found enough testimonials online to scare me away from that, so fortunately I quickly gave up on the idea.

There are enough free (or cheap) resources online to teach you all you need, so as long as you have the self-discipline, you can make your own boot camp. The best of all being the courses from fast.ai of course (but I’m a bit biased since I work there now ;) ).

I thought I’d never be selected to join the International Fellowship for the second version of the second part of the course so I was a tad unprepared when the acceptance email came in. I booked a coworking space to distance myself from the craziness of a baby and a toddler at home, hired an army of sitters in a rush until we found a nanny, then worked from 9 to 5 each day, plus the evenings, on the course materials. I thought I’d follow other MOOCs, but with all the challenges Jeremy left on the forum, and the vibrant community there, I never got the time to look elsewhere.

Even though the course is intended for people to spend seven hours a week on homework, there is definitely enough to keep you busy for far longer, especially in the second part. If you’re serious about switching careers to Deep Learning, you should spend the seven weeks working your ass off on this course. And if you can afford it money-wise/family-wise, fly yourself to San Francisco to attend in person and go every day to the study group at USF. If you can’t, find other people in your city following the course (or start your own group). In any case, be active on the forum, not just to ask questions when you have a bug, but also to help other people with their code.

Show what you can do

I’m shy and I hate networking. Those who have met me in person know I can’t chitchat for the love of God. Fortunately, there are plenty of ways you can sell yourself to potential employers behind the safety of your computer. Here are a few things that can help:

  1. Make your own project to show what you learned. Be sure to completely polish one project before moving to another one. In my case, it was reproducing superconvergence from Leslie Smith’s paper, then the Deep painterly harmonization paper.
  2. Write a blog to explain what you learned. It doesn’t have to be complex new research articles; start with the basics of a model even if you think there are thousands of articles like this already. You’ll learn a lot just by trying to explain things you thought you understood.
  3. Contribute to a Deep Learning related open source project (like the fastai library).
  4. Get into Kaggle competitions (still on my to-do list, maybe it’s going to be my 2019 resolution ;) ).
  5. Get a Twitter account to tell people about all of the above.

I was amazed and surprised to get several job offers before the course even ended. Then, Jeremy mentioned he was going to rebuild the library entirely and I offered to help. One thing led to another, and he managed to get a sponsorship from AWS for me to be a Research Scientist at fast.ai.

Behind the mirror

In my opinion, there are always three stages in learning. First you understand something abstractly, then you can explain it, and eventually you manage to actually do it. This is why it’s so important to see if you can redo by yourself the code you see in the courses.

As far as Deep Learning is concerned, following the course was the first stage for me; writing blog posts, notebooks or answering questions on the forum was the second stage; re-building the library from scratch with Jeremy was the third one.

I have learned even more in those past few months than during the time when I was following the courses. Some of it in areas I had discarded a bit too quickly… and a lot of it by refactoring pieces of code under Jeremy’s guidance until we got to the outcome you can see today. Building a fully integrated framework means you have to implement everything, so you need to master every part of the process.

All in all, this has been one of the years in which I’ve learned the most in my life. I’ll be forever thankful to Rachel and Jeremy for creating this amazing course, and I’m very proud to add my little stone to it.

The new fast.ai research datasets collection, on AWS Open Data

In machine learning and deep learning we can’t do anything without data. So the people that create datasets for us to train our models are the (often under-appreciated) heroes. Some of the most useful and important datasets are those that become important “academic baselines”; that is, datasets that are widely studied by researchers and used to compare algorithmic changes. Some of these become household names (at least, among households that train models!), such as MNIST, CIFAR 10, and Imagenet.

We all owe a debt of gratitude to those kind folks who have made datasets available for the research community. So fast.ai and the AWS Public Dataset Program have teamed up to try to give back a little: we’ve made some of the most important of these datasets available in a single place, using standard formats, on reliable and fast infrastructure. For a full list and links see the fast.ai datasets page.

fast.ai uses these datasets in the Deep Learning for Coders courses, because they provide great examples of the kind of data that students are likely to encounter, and the academic literature has many examples of model results using these datasets which students can compare their work to. If you use any of these datasets in your research, please show your gratitude by citing the original paper (we’ve provided the appropriate citation link below for each), and if you use them as part of a commercial or educational project, consider adding a note of thanks and a link to the dataset.

Dataset example: the French/English parallel corpus

One of the lessons that gets the most “wow” feedback from fast.ai students is when we study neural machine translation. It seems like magic when we can teach a model to translate from French to English, even if we can’t speak both languages ourselves!

But it’s not magic; the key is the wonderful dataset that we leverage in this lesson: the French/English parallel text corpus prepared back in 2009 by Professor Chris Callison-Burch of the University of Pennsylvania. This dataset contains over 20 million sentence pairs in French and English. He built the dataset in a really clever way: by crawling millions of Canadian web pages (which are often multi-lingual) and then using a set of simple heuristics to transform French URLs into English URLs. The dataset is particularly important for researchers since it is used in the most important annual competition for benchmarking machine translation models.

Here are some examples of the sentence pairs that our translation models can learn from:

English: Often considered the oldest science, it was born of our amazement at the sky and our need to question
French: Souvent considérée comme la plus ancienne des sciences, elle découle de notre étonnement et de nos questionnements envers le ciel

English: Astronomy is the science of space beyond Earth’s atmosphere.
French: L’astronomie est la science qui étudie l’Univers au-delà de l’atmosphère terrestre.

English: The name is derived from the Greek root astron for star, and nomos for arrangement or law.
French: Son nom vient du grec astron, qui veut dire étoile et nomos, qui veut dire loi.

English: Astronomy is concerned with celestial objects and phenomena – like stars, planets, comets and galaxies – as well as the large-scale properties of the Universe, also known as “The Big Picture”.
French: Elle s’intéresse à des objets et des phénomènes tels que les étoiles, les planètes, les comètes, les galaxies et les propriétés de l’Univers à grande échelle.

So what’s Professor Callison-Burch doing now? When we reached out to him to check some details for his dataset, he told us he’s now preparing the University of Pennsylvania’s new AI course; and part of his preparation: watching the videos at course.fast.ai! It’s a small world indeed…

The dataset collection

The following categories are currently included in the collection:

The datasets are all stored in the same tgz format, and (where appropriate) the contents have been converted into standard formats, suitable for import into most machine learning and deep learning software. For examples of using the datasets to build practical deep learning models, keep an eye on the fast.ai blog where many tutorials will be posted soon.
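As a rough illustration, here is a minimal sketch of how you might download and extract one of these tgz archives using only the Python standard library. The URL below is illustrative; the exact links for each dataset are listed on the fast.ai datasets page.

from pathlib import Path
import tarfile, urllib.request

# Illustrative URL; check the fast.ai datasets page for the real link to each dataset
url = 'https://s3.amazonaws.com/fast-ai-imageclas/cifar10.tgz'
dest = Path('data'); dest.mkdir(exist_ok=True)
archive = dest/'cifar10.tgz'

if not archive.exists():
    urllib.request.urlretrieve(url, archive)      # download the .tgz archive
with tarfile.open(archive, 'r:gz') as f:
    f.extractall(dest)                            # unpack into data/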

fastai v1 for PyTorch: Fast and accurate neural nets using modern best practices

Note from Jeremy: Want to learn more? Listen to me discuss fastai with Sam Charrington in this in-depth interview.

Summary

Today fast.ai is releasing v1 of a new free open source library for deep learning, called fastai. The library sits on top of PyTorch v1 (released today in preview), and provides a single consistent API to the most important deep learning applications and data types. fast.ai’s recent research breakthroughs are embedded in the software, resulting in significantly improved accuracy and speed over other deep learning libraries, whilst requiring dramatically less code. You can download it today from conda, pip, or GitHub or use it on Google Cloud Platform. AWS support is coming soon.

About fast.ai

fast.ai’s mission is to make the power of state of the art deep learning available to anyone. In order to make that happen, we do three things:

  1. Research how to apply state of the art deep learning to practical problems quickly and reliably

  2. Build software to make state of the art deep learning as easy to use as possible, whilst remaining easy to customize for researchers wanting to explore hypotheses

  3. Teach courses so that as many people as possible can use the research results and software

You may well already be familiar with our courses. Hundreds of thousands of people have already taken our Practical Deep Learning for Coders course, and many alumni are now doing amazing work with their new skills, at organizations like Google Brain, OpenAI, and Github. (Many of them now actively contribute to our busy deep learning practitioner discussion forums, along with other members of the wider deep learning community.)

You may also have heard about some of our recent research breakthroughs (with help from our students and collaborators!), including breaking deep learning speed records and achieving a new state of the art in text classification.

The new fastai library

So that covers the research and teaching parts of the three listed areas—but what about software? Today we’re releasing v1.0 of our new fastai deep learning library, which has been under development for the last 18 months. fastai sits on top of PyTorch, which provides the foundation for our work. When we announced the initial development of fastai over one year ago, we described many of the advantages that PyTorch provides us. For instance, we talked about how we could “use all of the flexibility and capability of regular python code to build and train neural networks”, and “we were able to tackle a much wider range of problems”. The PyTorch team has been very supportive throughout fastai’s development, including contributing critical performance optimizations that have enabled key functionality in our software.

fastai is the first deep learning library to provide a single consistent interface to all the most commonly used deep learning applications for vision, text, tabular data, time series, and collaborative filtering. This is important for practitioners, because it means if you’ve learned to create practical computer vision models with fastai, then you can use the same approach to create natural language processing (NLP) models, or any of the other types of model we support.

Google Cloud Platform are making fastai v1 available to all their customers from today in an experimental Deep Learning image for Google Compute Engine, including ready-to-run notebooks and pre-installed datasets. To use it, simply head over to the Deep Learning images page on Google Cloud Marketplace, set up the configuration for your instance, set the framework to PyTorch 1.0RC, and click “deploy”. That’s it, you now have the VM with Jupyter Lab, PyTorch 1.0 and fastai on it! Read more about how you can use the images in this post from Google’s Viacheslav Kovalevskyi. And if you want to use fastai in a GPU-powered Jupyter Notebook, it’s now a single click away thanks to fastai support on Salamander, also released today.

Good news too from Bratin Saha, VP, Amazon Web Services: “To support fast.ai’s mission to make the power of deep learning available at scale, the fastai library will soon be available in the AWS Deep Learning AMIs and Amazon SageMaker”. And we’re very grateful for the enthusiasm from Microsoft’s AI CTO, Joseph Sirosh, who said “At Microsoft, we have an ambitious goal to make AI accessible and valuable to every organization. We are happy to see Fast.AI helping democratize deep learning at scale and leveraging the power of the cloud.”

Early users

Semantic code search at GitHub

fast.ai are enthusiastic users of Github’s collaboration tools, and many of the Github team work with fast.ai tools too - even the CEO of Github studies deep learning using our courses! Hamel Husain, a Senior Machine Learning Scientist at Github who has been studying deep learning through fast.ai for the last two years, says:

“The fast.ai course has been taken by data scientists and executives at Github alike ushering in a new age of data literacy at GitHub. It gave data scientists at GitHub the confidence to tackle state of the art problems in machine learning, which were previously believed to be only accessible to large companies or folks with PhDs.”

Husain and his colleague Ho-Hsiang Wu recently released a new experimental tool for semantic code search, which allows Github users to find useful code snippets using questions written in plain English. In a blog post announcing the tool, they describe how they switched from Google’s Tensorflow Hub to fastai, because it “gave us easy access to state of the art architectures such as AWD LSTMs, and techniques such as cyclical learning rates with random restarts”.

Screenshot from Github’s semantic code search tool

Husain has been using a pre-release version of the fastai library for the last 12 months. He told us:

“I choose fast.ai because of its modularity, high level apis that implemented state of the art techniques, and innovations that reduce the need for tons of compute but with the same performance characteristics. The semantic code search demo is only the tip of the iceberg, as folks in sales, marketing, fraud are currently leveraging the power of fastai to bring transformative change to their business areas.”

Music generation

One student that stood out in our last fast.ai deep learning course was Christine McLeavey Payne, who had already had a fascinating journey as an award-winning classical pianist with an SF Symphony chamber group, a high performance computing expert in the finance world, and a neuroscience and medical researcher at Stanford. Her journey has only gotten more interesting since, and today she is a Research Fellow at the famous OpenAI research lab. In her most recent OpenAI project, she used fastai to help her create Clara: A Neural Net Music Generator. Here is some of her generated chamber music. Christine says:

“The fastai library is an amazing resource. Even when I was just starting in deep learning, it was easy to get a fastai model up and running in only a few lines of code. At that point, I didn’t understand the state-of-the-art techniques happening under the hood, but still they worked, meaning my models trained faster, and reached significantly better accuracy.”

Christine has even created a human or computer quiz that you can try for yourself; see if you can figure which pieces were generated by her algorithm! Clara is closely based on work she did on language modeling for one of her fast.ai student projects, and leverages the fastai library’s support for recent advances in natural language processing. Christine told us:

“It’s only more recently that I appreciate just how important these details are, and how much work the fastai library saves me. It took me just under two weeks to get this music generation project up and getting great initial results. I’m certain that speed couldn’t have been possible without fastai.”

We think that Clara is a great example of the expressive power of deep learning—in this case, a model designed to generate and classify text has been used to generate music, with relatively few modifications. “I took a fastai Language Model almost exactly (very slight changes in sampling the generation) and experimented with ways to write out the music in either a “notewise” or “chordwise” encoding” she wrote on Twitter. The result was a crowd favorite, with Vanessa M Garcia, a Senior Researcher at IBM Watson, declaring it her top choice at OpenAI’s Demo Day.

Twitter comment about Christine's music generation demo

fastai for art projects

Architect and Investor Miguel Pérez Michaus has been using a pre-release version of fastai to research a system for art experiments that he calls Style Reversion. This is definitely a case where a picture tells a thousand words, so rather than try to explain what it does, I’ll let you see for yourself:

Example of Style Reversion

Pérez Michaus says he likes designing with fastai because “I know that it can get me where [Google’s Tensorflow library] Keras can not, for example, whenever something ‘not standard’ has to be achieved”. As an early adopter, he’s seen the development of the library over the last 12 months:

“I was lucky enough to see alpha version of fastai evolving, and even back then its power and flexibility was evident. Additionally, it was fully usable for people like myself, with domain knowledge but no formal Computer Science background. And it has only gotten better. My quite humble intuition about the future of deep learning is that we will need a fine grained understanding of what really goes on under the hood, and in that landscape I think fastai is going to shine.”

fastai for academic research

Entrepreneurs Piotr Czapla and Marcin Kardas are the co-founders of n-waves, a deep learning consulting company. They used fastai to develop a novel algorithm for text classification in Polish, based on ideas shown in fast.ai’s Cutting Edge Deep Learning for Coders course. Polish is challenging for NLP, since it is a morphologically rich language (e.g. number, gender, animacy, and case are all collapsed into a word’s suffix). The algorithm that Czapla and Kardas developed won first prize in the top NLP academic competition in Poland, and a paper based on this new research will be published soon. According to Czapla, the fastai library was critical to their success:

“I love that fastai works well for normal people that do not have hundreds of servers at their disposal. It supports quick development and prototyping, and has all the best deep learning practices incorporated into it.”

The course and community have also been important for them:

“fast.ai’s courses opened my eyes to deep learning, and helped me to think and develop intuitions around how deep learning really works. Most of the answers to my questions are already on the forum somewhere, just a search away. I love how the notes from the lectures are composed into Wiki topics, and that other students are creating transcriptions of the lessons so that they are easier to find.”

Example: Transfer learning in computer vision

fast.ai’s research is embedded in the fastai library, so you get the benefits of it automatically. Let’s take a look at an example of what that means…

Kaggle’s Dogs vs Cats competition has been a favorite part of our courses since the very start, and it represents an important class of problems: transfer learning of a pre-trained model. So we’ll take a look at how the fastai library performs on this task.

Before we built fastai, we did most of our research and teaching using Keras (with the Tensorflow backend), and we’re still big fans of it. Keras really led the way in showing how to make deep learning easier to use, and it’s been a big inspiration for us. Today, it is (for good reason) the most popular way to train neural networks. In this brief example we’ll compare Keras and fastai on what we think are the three most important metrics: amount of code required, accuracy, and speed.

Here’s all the code required to do 2-stage fine tuning with fastai - not only is there very little code to write, there are very few parameters to set:

data = data_from_imagefolder(Path('data/dogscats'),
    ds_tfms=get_transforms(), tfms=imagenet_norm, size=224)
learn = ConvLearner(data, tvm.resnet34, metrics=accuracy)
learn.fit_one_cycle(6)                      # stage 1: train the new final layers, body frozen
learn.unfreeze()                            # stage 2: unfreeze the whole network
learn.fit_one_cycle(4, slice(1e-5,3e-4))    # fine-tune with discriminative learning rates

Let’s compare the two libraries on this task (we’ve tried to match our Keras implementation as closely as possible, although since Keras doesn’t support all the features that fastai provides, it’s not identical):

|  | fastai resnet34* | fastai resnet50 | Keras |
|---|---|---|---|
| Lines of code (excluding imports) | 5 | 5 | 31 |
| Stage 1 error | 0.70% | 0.65% | 2.05% |
| Stage 2 error | 0.50% | 0.50% | 0.80% |
| Test time augmentation (TTA) error | 0.30% | 0.40% | N/A* |
| Stage 1 time | 4:56 | 9:30 | 8:30 |
| Stage 2 time | 6:44 | 12:48 | 17:38 |

* Keras does not provide resnet 34 or TTA

(It’s important to understand that these improved results over Keras in no way suggest that Keras isn’t an excellent piece of software. Quite the contrary! If you tried to complete this task with almost any other library, you would need to write far more code, and would be unlikely to see better speed or accuracy than Keras. That’s why we’re showing Keras in this comparison - because we’re admirers of it, and it’s the strongest benchmark we know of!)

fastai also shows similarly strong performance for NLP. The state of the art text classification algorithm is ULMFiT. Here’s the relative error of ULMFiT versus previous top ranked algorithms on the popular IMDb dataset, as shown in the ULMFiT paper:

Summary of text classification performance

fastai is currently the only library to provide this algorithm. Because the algorithm is built in to fastai, you can match the paper’s results with similar code to that shown above for Dogs vs Cats. Here’s how you train the language model for ULMFiT:

data = data_from_textcsv(LM_PATH, Tokenizer(), data_func=lm_data)
learn = RNNLearner.language_model(data, drop_mult=0.3,
    pretrained_fnames=['lstm_wt103', 'itos_wt103'])   # start from pretrained WikiText-103 weights and vocab
learn.freeze()
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))          # train only the new layers first
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7), pct_start=0.25)   # then fine-tune the whole language model

Under the hood - pytorch v1

A critical component of fastai is the extraordinary foundation provided by PyTorch, v1 (preview) of which is also being released today. fastai isn’t something that replaces and hides PyTorch’s API, but instead is designed to expand and enhance it. For instance, you can create new data augmentation methods by simply creating a function that does standard PyTorch tensor operations; here’s the entire definition of fastai’s jitter function:

def jitter(c, size, magnitude:uniform):
    return c.add_((torch.rand_like(c)-0.5)*magnitude*2)
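To make that concrete, here’s a small pure-PyTorch sketch (not part of the fastai API) showing that the body of jitter is just an in-place tensor operation: every value is shifted by uniform noise in [-magnitude, magnitude]:

import torch

def jitter_like(c, magnitude=0.1):
    # same tensor math as above: add uniform noise in [-magnitude, magnitude], in place
    return c.add_((torch.rand_like(c) - 0.5) * magnitude * 2)

coords = torch.zeros(4, 4)        # stand-in for an image coordinate grid
jitter_like(coords)               # every entry now lies in [-0.1, 0.1]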

As another example, fastai uses and extends PyTorch’s concise and expressive Dataset and DataLoader classes for accessing data. When we wanted to add support for image segmentation problems, it was as simple as defining this standard PyTorch Dataset class:

class MatchedFilesDataset(DatasetBase):
    def __init__(self, x:Collection[Path], y:Collection[Path]):
        assert len(x)==len(y)
        self.x,self.y = np.array(x),np.array(y)
    def __getitem__(self, i):
        return open_image(self.x[i]), open_mask(self.y[i])
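If you’re less familiar with the PyTorch side of this, here’s a rough sketch of the same pattern written against the plain torch.utils.data.Dataset interface; the class name and loading details below are illustrative, not fastai’s implementation:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class PairedFilesDataset(Dataset):
    # each item is an (image, mask) pair loaded from two matched lists of file paths
    def __init__(self, x, y):
        assert len(x) == len(y)
        self.x, self.y = list(x), list(y)
    def __len__(self):
        return len(self.x)
    def __getitem__(self, i):
        img  = torch.from_numpy(np.array(Image.open(self.x[i]))).float()
        mask = torch.from_numpy(np.array(Image.open(self.y[i]))).long()
        return img, mask

# any such Dataset plugs straight into a standard DataLoader:
# dl = DataLoader(PairedFilesDataset(img_paths, mask_paths), batch_size=8)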

This means that as practitioners want to dive deeper into their models, data, and training methods, they can take advantage of all the richness of the full PyTorch ecosystem. Thanks to PyTorch’s dynamic nature, programmers can easily debug their models using standard Python tools. In many areas of deep learning, PyTorch is the most common platform for researchers publishing their research; fastai makes it simple to test out these new approaches.

Under the hood - fastai

In the coming months we’ll be publishing academic papers and blog posts describing the key pieces of the fastai library, as well as releasing a new course that will walk students through how the library was developed from scratch. To give you a taste, we’ll touch on a couple of interesting pieces here, focussing on computer vision.

One thing we care a lot about is speed. That’s why we competed in Stanford’s DAWNBench competition for rapid and accurate model training, where (along with our collaborators) we have achieved first place in every category we entered. If you want to match our top single-machine CIFAR-10 result, it’s as simple as four lines of code:

tfms = ([pad(padding=4), crop(size=32, row_pct=(0,1), col_pct=(0,1)),
    flip_lr(p=0.5)], [])                       # augmentations for the training set only
data = data_from_imagefolder('data/cifar10', valid='test',
    ds_tfms=tfms, tfms=cifar_norm)             # normalize with CIFAR-10 statistics
learn = Learner(data, wrn_22(), metrics=accuracy).to_fp16()   # wide resnet, mixed precision training
learn.fit_one_cycle(25, wd=0.4)                # one-cycle training with weight decay

Much of the magic is buried underneath that to_fp16() method call. Behind the scenes, we’re following all of Nvidia’s recommendations for mixed precision training. No other library that we know of provides such an easy way to leverage Nvidia’s latest technology, which gives two to three times better performance compared to previous approaches.
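As a rough illustration of what mixed precision training involves (a simplified sketch of the general technique, not fastai’s implementation): do the forward and backward passes in fp16, scale the loss so small gradients don’t underflow, and keep an fp32 master copy of the weights for the optimizer step.

import torch

# master_params would typically be built once as:
# master_params = [p.detach().float().clone().requires_grad_() for p in model_fp16.parameters()]
# opt = torch.optim.SGD(master_params, lr=0.1)
def mixed_precision_step(model_fp16, master_params, opt, loss_fn, xb, yb,
                         loss_scale=512.0):
    # forward/backward in half precision, with the loss scaled up
    loss = loss_fn(model_fp16(xb.half()), yb)
    (loss * loss_scale).backward()
    # copy gradients onto the fp32 master weights, undoing the scale
    for p16, p32 in zip(model_fp16.parameters(), master_params):
        if p16.grad is not None:
            p32.grad = p16.grad.float() / loss_scale
    opt.step()                                   # the optimizer updates the fp32 copy
    opt.zero_grad(); model_fp16.zero_grad()
    # copy the updated master weights back into the fp16 model
    for p16, p32 in zip(model_fp16.parameters(), master_params):
        p16.data.copy_(p32.data)
    return loss.item()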

Another thing we care a lot about is accuracy. We want your models to work well not just on your training data, but on new test data as well. Therefore, we’ve built an entirely new computer vision library from scratch that makes it easy to develop and use data augmentation methods, to improve your model’s performance on unseen data. The new library uses a new approach to minimize the number of lossy transformations that your data goes through. For instance, take a look at the three images below:

Example of fastai transforms

On the left is the original low resolution image from the CIFAR-10 dataset. In the middle is the result of zooming and rotating this image using standard deep learning augmentation libraries. On the right is the same zoom and rotation, using fastai v1. As you can see, with fastai the detail is kept much better; for instance, take a look at how the pilot’s window is much crisper in the right-hand image than the middle image. This change to how data augmentation is applied means that practitioners using fastai can use far more augmentation than users of other libraries, resulting in models that generalize better.
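The core trick is the standard one of composing transforms: rather than rotating the image (one interpolation) and then zooming it (a second interpolation), the rotation and zoom matrices are combined first, so the image is resampled only once. Here’s a toy numpy sketch of the idea (illustrative only, not fastai’s code):

import numpy as np

def rotation(degrees):
    t = np.deg2rad(degrees)
    return np.array([[np.cos(t), -np.sin(t), 0.],
                     [np.sin(t),  np.cos(t), 0.],
                     [0.,         0.,        1.]])

def zoom(scale):
    return np.array([[scale, 0.,    0.],
                     [0.,    scale, 0.],
                     [0.,    0.,    1.]])

# one combined matrix means the image is interpolated only once
combined = rotation(30) @ zoom(1.2)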

These data augmentations even work automatically with non-image data such as bounding boxes. For instance, here’s an example of how fastai works with an image detection dataset, automatically tracking each bounding box through all augmentations:

Transforming bounding box data
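Conceptually this works because a bounding box is just four corner points, so the same coordinate transform applied to the image can be applied to those points, and a new axis-aligned box re-fitted around the result. A small illustrative sketch (not the fastai API):

import numpy as np

def rotation(degrees):
    t = np.deg2rad(degrees)
    return np.array([[np.cos(t), -np.sin(t), 0.],
                     [np.sin(t),  np.cos(t), 0.],
                     [0.,         0.,        1.]])

# corners of a box as (x, y, 1) in homogeneous coordinates
corners = np.array([[10., 10., 1.], [10., 50., 1.],
                    [40., 10., 1.], [40., 50., 1.]])
moved = (rotation(30) @ corners.T).T
x0, y0 = moved[:, :2].min(axis=0)   # re-fit an axis-aligned box around the moved corners
x1, y1 = moved[:, :2].max(axis=0)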

These kinds of thoughtful features can be found throughout the fastai library. Over the coming months we’ll be doing deep dives in to many of them, for those of you interested in the details of how fastai is implemented behind the scenes.

Thanks!

Many thanks to the PyTorch team. Without PyTorch, none of this would have been possible. Thanks also to Amazon Web Services, who sponsored fast.ai’s first Researcher in Residence, Sylvain Gugger, who has contributed much of the development of fastai v1. Thanks also to fast.ai alumni Fred Monroe, Andrew Shaw, and Stas Bekman, who have all made significant contributions, to Yaroslav Bulatov, who was a key contributor to our most recent DAWNBench project, to Viacheslav Kovalevskyi, who handled Google Cloud integration, and of course to all the students and collaborators who have helped make the community and software successful.

Introduction to Machine Learning for Coders: Launch

Today we’re launching our newest (and biggest!) course, Introduction to Machine Learning for Coders. The course, recorded at the University of San Francisco as part of the Masters of Science in Data Science curriculum, covers the most important practical foundations for modern machine learning. There are 12 lessons, each of which is around two hours long—a list of all the lessons along with a screenshot from each is at the end of this post. They are all taught by me (Jeremy Howard); I’ve been studying and using machine learning for over 25 years, from when I started my career as an Analytical Specialist at McKinsey & Company, through to my time as President and Chief Scientist of Kaggle and founding CEO of Enlitic.

There are some excellent machine learning courses already, most notably the wonderful Coursera course from Andrew Ng. But that course is showing its age now, particularly since it uses Matlab for coursework. This new course uses modern tools and libraries, including python, pandas, scikit-learn, and pytorch. Unlike many educational materials in the field, our approach is “code first” rather than “math first”. It’s well suited to people who are writing code every day, but perhaps aren’t practicing their math chops quite as often (although we do cover all the necessary theory when appropriate). Perhaps most importantly, this course is very opinionated—rather than being a complete survey of every type of model, we focus on those that really matter in practice.

Two main types of models are covered: decision tree based models (particularly “forests” of bagged decision trees), and gradient descent based models (particularly logistic regression and its variants). Decision tree models build structures that look like this (in practice you’ll generally use much larger trees):

(The example above is from Professor Terence Parr and Prince Grover’s excellent discussion of tree visualization techniques, and uses his new animl visualization library. Terence and I are currently writing a book based on the material from this course, and a preview is available of the first chapters. So if you’re more of a book learner than a video learner, be sure to follow that!)

Decision tree methods are extremely flexible and easy to use, and when ensembled (using bagging or boosting) are the state of the art on many practical tasks. However, they can struggle with extrapolating to data outside of what they’re trained on, and are not very accurate for data types such as images, audio, and natural language. These problems are often best solved with gradient descent methods, and we’ll look at some of the most important of these in the second half of the course, closing with a simple deep learning neural network. (If you’ve already taken our Practical Deep Learning for Coders course, you’ll have a small amount of conceptual overlap here, but taught in a very different way.)
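For a flavor of what the tree-based half of the course looks like in code, here is a minimal scikit-learn sketch of a bagged ensemble of decision trees (a random forest) on a toy dataset; the course itself works through real datasets and then rebuilds the algorithm from scratch:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# a "forest" of bagged decision trees
m = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, n_jobs=-1)
m.fit(X_train, y_train)
print(m.score(X_valid, y_valid))            # R^2 on the validation set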

You’ll learn how to create a complete decision tree forest implementation from scratch, and write your own deep learning model and train it from scratch. Along the way, you’ll learn many important skills in data preparation, model testing, and product development (including ethical issues specific to data products).

Here’s a quick overview of each lesson, along with an example screenshot (you’ll find the same details on the course site):

Lesson 1 - Introduction to Random Forests

Lesson 1 will show you how to create a “random forest™” - perhaps the most widely applicable machine learning model - to create a solution to the “Blue Book for Bulldozers” Kaggle competition, which will get you in to the top 25% on the leader-board. You’ll learn how to use a Jupyter Notebook to build and analyze models, how to download data, and other basic skills you need to get started with machine learning in practice.

Lesson 2 - Random Forest Deep Dive

Today we start by learning about metrics, loss functions, and (perhaps the most important machine learning concept) over-fitting. We discuss using validation and test sets to help us measure over-fitting.

Then we’ll learn how random forests work - first, by looking at the individual trees that make them up, then by learning about “bagging”, the simple trick that lets a random forest be much more accurate than any individual tree. Next up, we look at some helpful tricks that random forests support for making them faster, and more accurate.

Lesson 3 - Performance, Validation and Model Interpretation

Today we’ll see how to read a much larger dataset - one which may not even fit in the RAM on your machine! And we’ll also learn how to create a random forest for that dataset. We also discuss the software engineering concept of “profiling”, to learn how to speed up our code if it’s not fast enough - especially useful for these big datasets.

Next, we do a deeper dive in to validation sets, and discuss what makes a good validation set, and we use that discussion to pick a validation set for this new data.

In the second half of this lesson, we look at “model interpretation” - the critically important skill of using your model to better understand your data. Today’s focus for interpretation is the “feature importance plot”, which is perhaps the most useful model interpretation technique.

Lesson 4 - Feature Importance, Tree Interpreter

Today we do a deep dive into feature importance, including ways to make your importance plots more informative, how to use them to prune your feature space, and the use of a “dendrogram” to understand feature relationships.

In the second half of the lesson we’ll learn about two more really important interpretation techniques: partial dependence plots, and the “tree interpreter”.

Lesson 5 - Extrapolation and RF from Scratch

In today’s lesson we start by learning more about the “tree interpreter”, including the use of “waterfall charts” to analyze their output. Next up, we look into the subtle but important issue of extrapolation. This is the weak point of random forests - they can’t predict values outside the range of the input data. We study ways to identify when this problem happens, and how to deal with it.
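Here’s a tiny sketch of the extrapolation problem (illustrative, not from the course notebooks): a random forest trained on x in [0, 10] cannot predict values beyond the range it has seen in training.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

x = np.linspace(0, 10, 200)[:, None]
y = 3 * x.ravel()                                # a simple linear trend
m = RandomForestRegressor(n_estimators=40).fit(x, y)
print(m.predict([[20.]]))                        # stuck near 30, not 60: trees can't extrapolate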

In the second half of this lesson, we start writing our very own random forest from scratch!

Lesson 6 - Data Products

In the first half of today’s lesson we’ll learn about how to create “data products” using machine learning models, based on “The Drivetrain Method”, and in particular how model interpretation is an important part of this approach.

Next up, we’ll explore the issue of extrapolation more deeply, using a Live Coding approach - we’ll also take this opportunity to learn a couple of handy numpy tricks.

Lesson 7 - Random Forest from Scratch

Today we’ll finish off our “from scratch” random forest interpretation! We’ll also briefly look at the amazing “cython” library that you can use to get the same speed as C code with minimal changes to your python code.

Then we’ll start on the next stage of our journey - gradient descent based methods such as logistic regression and neural networks…

Lesson 8 - Gradient Descent and Logistic Regression

Today we start the second half of the course - we’re moving from decision tree based approaches like random forests, to gradient descent based approaches like deep learning.

Our first step in this journey will be to use Pytorch to help us implement logistic regression from scratch. We’ll be building a model for the classic MNIST dataset of hand-written digits.
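As a taste of what that looks like (a minimal sketch, not the course notebook itself), logistic regression is just a single linear layer trained with a cross-entropy loss:

import torch
from torch import nn

model = nn.Linear(28 * 28, 10)                   # one linear layer: logistic regression for MNIST
loss_fn = nn.CrossEntropyLoss()                  # softmax + negative log likelihood
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(xb, yb):                          # xb: (batch, 784) floats, yb: (batch,) digit labels
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()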

Lesson 9 - Regularization, Learning Rates and NLP

Today we continue building our logistic regression from scratch, and we add the most important feature to it: regularization. We’ll learn about L1 vs L2 regularization, and how they can be implemented. We also talk more about how learning rates work, and how to pick one for your problem.

In the second half of the lesson, we start our discussion of natural language processing (NLP). We’ll build a “bag of words” representation of the popular IMDb text dataset, using sparse matrices to ensure good performance and reasonable memory use. We’ll build a number of models from this, including naive bayes and logistic regression, and will improve these models by adding ngram features.
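A compressed sketch of that pipeline using scikit-learn (the course builds it up in much more detail, including the naive bayes variants): a sparse bag-of-words or bag-of-ngrams matrix feeding a logistic regression. The tiny documents below are stand-ins for IMDb reviews.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["a great movie", "utterly boring", "great fun", "boring and slow"]   # stand-in for IMDb reviews
labels = [1, 0, 1, 0]

vec = CountVectorizer(ngram_range=(1, 2))        # unigram + bigram features, stored as a sparse matrix
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["great and fun"])))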

Lesson 10 - More NLP, and Columnar Data

In today’s lesson we’ll further develop our NLP model by combining the strengths of naive bayes and logistic regression together, creating the hybrid “NB-SVM” model, which is a very strong baseline for text classification. To do this, we’ll create a new nn.Module class in pytorch, and look at what it’s doing behind the scenes.

In the second half of the lesson we’ll start our study of tabular and relational data using deep learning, by looking at the “Rossmann” Kaggle competition dataset. Today, we’ll start down the feature engineering path on this interesting dataset. We’ll look at continuous vs categorical variables, and what kinds of feature engineering can be done for each, with a particular focus on using embedding matrices for categorical variables.

Lesson 11 - Embeddings

Today, after a review of the math behind naive bayes, we’ll do a deep dive into embeddings - both as used for categorical variables in tabular data, and as used for words in NLP.
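The underlying object in both cases is the same: an embedding matrix that maps each category (or word) index to a learned vector. A minimal PyTorch sketch:

import torch
from torch import nn

emb = nn.Embedding(num_embeddings=7, embedding_dim=4)   # e.g. 7 categories (days of week), 4-dim vectors
days = torch.tensor([0, 6, 3])                          # a batch of category indices
print(emb(days).shape)                                  # torch.Size([3, 4]); the vectors are learned during training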

Lesson 12 - Complete Rossmann, Ethical Issues

In the first half of today’s class we’ll put everything we’ve learned together to create a complete model for the Rossmann dataset, including both categorical and continuous features, and careful feature engineering for all columns.

In the second half of the class we’ll study some ethical issues that arise when implementing machine learning models. We’ll see why they should matter to practitioners, and ways of thinking about them. Many students have told us they found this the most important part of the course!

Note from Jeremy: I’ll be teaching Deep Learning for Coders at the University of San Francisco starting in October; if you’ve got at least a year of coding experience, you can apply here.