Notes on state of the art techniques for language modeling

Edit, one day later… Much to my surprise, a lot of people shared this on Twitter, and much to my delight there were some very helpful and interesting comments from people I respect, so check out the thread here.

I cleverly trapped Smerity in a Twitter DM conversation while he was stuck on a train with nothing better to do than answer my dumb questions, and I managed to get a download of ~0.001% of what he knows about language modeling. It should be enough to keep me busy for a few months… The background to this conversation is that for “version 2” of our deep learning course at USF, we’re curating the most important best practices across a range of deep learning applications (computer vision, text, recommendation systems, and so on) and implementing them in a consistent API. Unfortunately, for text applications the best practices aren’t really collected anywhere, hence the need for the Smerity-brain-dump.

I figured I’d make my notes on the conversation into a little blog post in case other people find them useful too. I’m assuming people are familiar with the topics covered in parts 1 & 2 of our MOOC, such as RNNs, dropout, attentional models, and neural translation. I’ve spent the day scouring the internet for other resources too, and I’m incorporating some of my own research here, so if you see anything dumb it’ll almost certainly be my fault, not Smerity’s.

Pytorch code examples

Smerity pointed to two excellent repositories that seemed to contain examples of all the techniques we discussed:

Two other interesting libraries to be aware of are:

Techniques to get state of the art (SotA) results

In part 2 of the course we got pretty close to SotA in neural translation by showing how to use attentional models, dynamic teacher forcing, and of course stacked bidirectional LSTMs. But some interesting approaches have come to the fore since we developed that course, and Smerity suggested that combining the following should give the biggest wins across a range of NLP tasks without much additional complexity:
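To make one of those course techniques concrete, here’s a minimal PyTorch sketch of dynamic teacher forcing in a seq2seq decoder loop. The `Decoder` module, `decode_with_teacher_forcing`, and `teacher_forcing_ratio` names are illustrative assumptions of mine, not code from any particular repository:

```python
import random

import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Toy seq2seq decoder: embed a token, run one LSTM step, project to vocab."""

    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.LSTM(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def step(self, token, state):
        # token: (batch, 1) token indices for the current time step
        emb = self.embedding(token)           # (batch, 1, emb_size)
        output, state = self.rnn(emb, state)  # (batch, 1, hidden_size)
        return self.out(output.squeeze(1)), state  # logits: (batch, vocab_size)


def decode_with_teacher_forcing(decoder, targets, init_state,
                                teacher_forcing_ratio=0.5):
    # At each step, feed back the ground-truth token with probability
    # `teacher_forcing_ratio`; otherwise feed back the model's own prediction.
    # The "dynamic" part is annealing this ratio over training, so the model
    # is gradually weaned off the ground truth as it improves.
    _, seq_len = targets.shape
    state = init_state
    token = targets[:, :1]  # start from the first (e.g. <bos>) token
    all_logits = []
    for t in range(1, seq_len):
        logits, state = decoder.step(token, state)
        all_logits.append(logits)
        if random.random() < teacher_forcing_ratio:
            token = targets[:, t:t + 1]                 # ground truth
        else:
            token = logits.argmax(dim=1, keepdim=True)  # model's prediction
    return torch.stack(all_logits, dim=1)  # (batch, seq_len - 1, vocab_size)
```

In a translation model like the one from part 2 of the course, `init_state` would come from the encoder; during training you’d anneal `teacher_forcing_ratio` towards zero rather than hold it fixed.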

Problems to solve

Interesting problems to solve in the NLP world, with meaningful benchmarks, include:
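For the language modeling benchmarks in particular, results are reported as perplexity, which is just the exponential of the average per-token cross-entropy. Here’s a minimal sketch of an evaluation loop; the `model(inputs)` and `data_loader` interfaces are my assumptions, not any specific library’s API:

```python
import math

import torch
import torch.nn.functional as F


def evaluate_perplexity(model, data_loader, device="cpu"):
    # Perplexity = exp(mean per-token cross-entropy) over the whole dataset.
    # Assumes model(inputs) returns logits of shape (batch, seq_len, vocab)
    # and data_loader yields (inputs, targets) pairs of token-index tensors.
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # flatten to (N, vocab)
                targets.reshape(-1),                  # flatten to (N,)
                reduction="sum",
            )
            total_loss += loss.item()
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```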


To discuss this post on Hacker News, click here.