- A student report that also covers BERT: http://web.stanford.edu/class/cs224n/reports/custom/15742249.pdf
- Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
- Compressing Word Embeddings via Deep Compositional Code Learning
- The codebook trick offers an alternative approach to memory layers (a minimal sketch of the codebook idea follows after this list)
- Word2Bits - Quantized Word Vectors
- Impressive results, though they may reflect weaknesses of Word2Vec more than strengths of the quantization itself (a sketch of this kind of sign-based quantization follows after this list)
- Worth investigating whether the approach carries over to more up-to-date embeddings
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Training Quantized Nets: A Deeper Understanding
- Efficient and Effective Quantization for Sparse DNNs
- Scalable Methods for 8-bit Training of Neural Networks
- Convolutional neural network compression for natural language processing
- Improving text classification with vectors of reduced precision
- Neural Networks Compression for Language Modeling
- Natural Language Processing with Small Feed-Forward Networks
- Not quantization as such, but building small networks in the first place; such networks can then be quantized separately (see the last sketch after this list)
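
A minimal sketch of the codebook idea behind "Compressing Word Embeddings via Deep Compositional Code Learning": each word stores a handful of small integer codes, and its embedding is reconstructed by summing one vector from each shared codebook. The sizes and variable names below are illustrative assumptions, and the end-to-end learning of codes and codebooks from the paper is omitted.

```python
import numpy as np

# Illustrative sizes (not from the paper): M codebooks, each with K vectors of dimension d.
M, K, d = 8, 16, 300
vocab_size = 50_000

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, d)).astype(np.float32)          # shared, small
codes = rng.integers(0, K, size=(vocab_size, M), dtype=np.int8)    # per word: M code indices

def embed(word_id: int) -> np.ndarray:
    """Reconstruct a word embedding as the sum of one vector from each codebook."""
    return codebooks[np.arange(M), codes[word_id]].sum(axis=0)

# Storage per word: M * log2(K) bits for the codes (here 8 * 4 = 32 bits),
# versus d * 32 bits (9600 bits) for a dense float32 embedding.
vector = embed(42)
```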
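A sketch of the kind of 1-bit quantization Word2Bits describes: every dimension is mapped to a fixed positive or negative value based on its sign. The scale constant and the post-hoc application to pretrained vectors are assumptions for illustration; the paper applies the quantization function inside Word2Vec training.

```python
import numpy as np

def quantize_1bit(vectors: np.ndarray, scale: float = 1.0 / 3.0) -> np.ndarray:
    """Map every dimension to +/- scale based on its sign (1 bit per dimension)."""
    return np.where(vectors >= 0.0, scale, -scale).astype(np.float32)

# Toy example on random "embeddings"; real usage would load pretrained vectors.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300)).astype(np.float32)
quantized = quantize_1bit(embeddings)

# Only the sign needs to be stored: 300 bits per word instead of 300 * 32 bits.
signs = quantized > 0
```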
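The last entry's point can be made concrete with a sketch: a deliberately tiny bag-of-embeddings classifier, loosely in the spirit of "Natural Language Processing with Small Feed-Forward Networks" (the sizes, names, and lack of feature hashing or training are illustrative assumptions), followed by simple post-training 8-bit weight quantization to show that building a small model and quantizing it are independent steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny text classifier: average word embedding -> one small ReLU layer -> softmax.
vocab_size, emb_dim, hidden, n_classes = 20_000, 32, 64, 4
E  = rng.normal(scale=0.1, size=(vocab_size, emb_dim)).astype(np.float32)
W1 = rng.normal(scale=0.1, size=(emb_dim, hidden)).astype(np.float32)
W2 = rng.normal(scale=0.1, size=(hidden, n_classes)).astype(np.float32)

def predict(token_ids: list[int]) -> np.ndarray:
    x = E[token_ids].mean(axis=0)        # bag-of-words: average the token embeddings
    h = np.maximum(x @ W1, 0.0)          # small hidden layer
    logits = h @ W2
    logits -= logits.max()               # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum()

# Separate, post-hoc 8-bit weight quantization (symmetric, per-tensor scale).
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

E_q,  E_s  = quantize_int8(E)
W1_q, W1_s = quantize_int8(W1)
W2_q, W2_s = quantize_int8(W2)

# Toy usage: predict from a handful of token ids, then again with dequantized weights.
probs = predict([12, 405, 9981])
# int8 storage is roughly 4x smaller than float32; int8 kernels could use it directly.
```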