This repository is home to the code that accompanies Jon Krohn's:
- Deep Learning with TensorFlow LiveLessons (summary blog post here)
- Deep Learning for Natural Language Processing LiveLessons (summary blog post here)
Working through these LiveLessons will be easiest if you are familiar with Unix command-line basics. A tutorial of these fundamentals can be found here.
In addition, if you're unfamiliar with using Python for data analysis (e.g., the pandas, scikit-learn, matplotlib packages), the data analyst path of DataQuest will quickly get you up to speed -- steps one (Introduction to Python) and two (Intermediate Python and Pandas) provide the bulk of the essentials.
Step-by-step guides for running the code in this repository can be found in the installation directory.
All of the code that I cover in the LiveLessons can be found in this directory as Jupyter notebooks.
Below is the lesson-by-lesson sequence in which I covered them:
- via analogy to their biological inspirations, this section introduces Artificial Neural Networks and how they developed into the predominantly deep architectures of today
- goes over the installation directory mentioned above, discussing the options for working through my Jupyter notebooks
- details the step-by-step installation of TensorFlow on Mac OS X, a process that may be instructive for users of any Unix-like operating system
- get your hands dirty with a simple-as-possible neural network (shallow_net_in_keras.ipynb) for classifying handwritten digits (a minimal sketch in this spirit follows this list)
- introduces Jupyter notebooks and their most useful hot keys
- introduces a gentle quantity of deep learning terminology by whiteboarding through:
- the MNIST digit data set
- the preprocessing of images for analysis with a neural network
- a shallow network architecture
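For a feel of what that first notebook builds before you open it, here is a minimal sketch along the same lines; the layer size and training settings below are illustrative assumptions, not the notebook's exact values:

```python
# A shallow MNIST classifier in the spirit of shallow_net_in_keras.ipynb
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# load and preprocess MNIST: flatten the 28x28 images and scale pixels to [0, 1]
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784).astype('float32') / 255.
X_test = X_test.reshape(10000, 784).astype('float32') / 255.
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# a single hidden layer of sigmoid units feeding a softmax output layer
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=20,
          validation_data=(X_test, y_test))
```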
- talk through the function and popular applications of the predominant modern families of deep neural nets:
- Dense / Fully-Connected
- Convolutional Networks (ConvNets)
- Recurrent Neural Networks (RNNs) / Long Short-Term Memory units (LSTMs)
- Reinforcement Learning
- Generative Adversarial Networks
- the following essential deep learning concepts are explained intuitively and graphically (and sketched in NumPy code after this list):
- neural units and activation functions
- perceptron
- sigmoid (sigmoid_function.ipynb)
- tanh
- Rectified Linear Units (ReLU)
- cost functions
- quadratic
- cross-entropy (cross_entropy_cost.ipynb)
- gradient descent
- backpropagation via the chain rule
- layer types
- input
- dense / fully-connected
- softmax output (softmax_demo.ipynb)
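The activation, output, and cost functions named above are small enough to write out directly; here is a NumPy-only sketch of them for intuition (the example numbers are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))            # squashes any real z into (0, 1)

def relu(z):
    return np.maximum(0., z)                 # passes positives, zeroes out negatives

def softmax(z):
    e = np.exp(z - np.max(z))                # subtract the max for numerical stability
    return e / e.sum()                       # outputs sum to 1, like probabilities

def cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred))  # penalizes confident wrong predictions

z = np.array([2.0, -1.0, 0.5])
y_hat = softmax(z)                           # ~ [0.79, 0.04, 0.18]
y = np.array([1., 0., 0.])                   # one-hot "true" label
print(sigmoid(z), relu(z), y_hat, cross_entropy(y, y_hat))
```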
- leverage TensorFlow Playground to interactively visualize the theory from the preceding section
- overview of canonical data sets for image classification and meta-resources for data sets ideally suited to deep learning
- apply the theory learned throughout Lesson Two to create an intermediate-depth image classifier (intermediate_net_in_keras.ipynb)
- builds on, and greatly outperforms, the shallow architecture from Section 1.3
- add to our state-of-the-art deep learning toolkit by delving further into essential theory, specifically the following (several of which appear in the Keras sketch after this list):
- weight initialization
- uniform
- normal
- Xavier Glorot
- stochastic gradient descent
- learning rate
- batch size
- second-order gradient learning
- momentum
- Adam
- unstable gradients
- vanishing
- exploding
- avoiding overfitting / model generalization
- L1/L2 regularization
- dropout
- artificial data set expansion
- batch normalization
- more layers
- max-pooling
- flatten
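Several of these ideas translate into one-liners in Keras. A hedged sketch showing Glorot (Xavier) weight initialization, batch normalization, dropout, and SGD with an explicit learning rate and momentum follows; layer sizes and hyperparameter values here are illustrative, not taken from the notebooks:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, activation='relu', kernel_initializer='glorot_uniform',
                input_shape=(784,)))
model.add(BatchNormalization())      # normalize activations to stabilize training
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.2))              # randomly silence 20% of units to curb overfitting
model.add(Dense(10, activation='softmax'))

# stochastic gradient descent with an explicit learning rate and momentum
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.1, momentum=0.9),
              metrics=['accuracy'])
```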
- apply the theory learned in the previous section to create a deep, dense net for image classification (deep_net_in_keras.ipynb)
- builds on, and outperforms, the intermediate architecture from Section 2.5
- whiteboard through an intuitive explanation of what convolutional layers are and why they're so effective
- apply the theory learned in the previous section to create a deep convolutional net for image classification (lenet_in_keras.ipynb) that is inspired by the classic LeNet-5 neural network introduced in Section 1.1
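For orientation, a LeNet-flavored Keras architecture along the lines of that notebook looks roughly like this (filter counts and dense-layer sizes are illustrative assumptions):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# convolutional layers learn location-invariant visual features
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))    # downsample the feature maps
model.add(Dropout(0.25))
model.add(Flatten())                         # unroll feature maps for dense layers
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```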
- classify color images of flowers with two very deep convolutional networks inspired by contemporary prize-winning model architectures: AlexNet (alexnet_in_keras.ipynb) and VGGNet (vggnet_in_keras.ipynb)
- return to the networks from the previous section, adding code to output results to the TensorBoard deep learning results-visualization tool
- explore TensorBoard and explain how to interpret model results within it
- discuss the relative strengths, weaknesses, and common applications of the leading deep learning libraries:
- Caffe
- Torch
- Theano
- TensorFlow
- and the high-level APIs TFLearn and Keras
- conclude that, for the broadest set of applications, TensorFlow is the best option
- introduce TensorFlow graphs and related terminology (tied together in the toy graph after this list):
- ops
- tensors
- Variables
- placeholders
- feeds
- fetches
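A toy graph ties these terms together. The sketch below assumes the TensorFlow 1.x API the LiveLessons were recorded with (TensorFlow 2.x executes eagerly and drops placeholders and Sessions):

```python
import tensorflow as tf

b = tf.Variable(tf.zeros([1]))             # a Variable: mutable graph state
x = tf.placeholder(tf.float32, shape=[1])  # a placeholder: supplied at run time
y = tf.add(tf.multiply(2.0, x), b)         # ops producing new tensors

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # "fetch" the tensor y while "feeding" a value into the placeholder x
    print(sess.run(y, feed_dict={x: [3.0]}))   # -> [6.]
```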
- build simple TensorFlow graphs (first_tensorflow_graphs.ipynb)
- build neurons in TensorFlow (first_tensorflow_neurons.ipynb)
- fit a simple line in TensorFlow (a bare-bones example follows this list):
- by considering individual data points (point_by_point_intro_to_tensorflow.ipynb)
- while taking advantage of tensors (tensor-fied_intro_to_tensorflow.ipynb)
- with batches sampled from millions of data points (intro_to_tensorflow_times_a_million.ipynb)
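In the same spirit, here is a bare-bones TensorFlow 1.x line fit by mini-batch gradient descent; the synthetic data and hyperparameters are stand-ins, not the notebooks' values:

```python
import numpy as np
import tensorflow as tf

# synthetic data scattered around the line y = 2.5x - 1
xs = np.random.uniform(0., 10., 10000).astype(np.float32)
ys = 2.5 * xs - 1.0 + np.random.normal(0., 0.5, 10000).astype(np.float32)

m = tf.Variable(0.0)
b = tf.Variable(0.0)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.square(m * x + b - y))            # mean squared error
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        idx = np.random.randint(0, len(xs), 32)            # sample a mini-batch
        sess.run(train_op, feed_dict={x: xs[idx], y: ys[idx]})
    print(sess.run([m, b]))    # should approach [2.5, -1.0]
```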
- create a dense neural net (intermediate_net_in_tensorflow.ipynb) in TensorFlow with an architecture identical to the intermediate one built in Keras in Section 2.5
- create a deep convolutional neural net (lenet_in_tensorflow.ipynb) in TensorFlow with an architecture identical to the LeNet-inspired one built in Keras in Section 3.4
- detail systematic steps for improving the performance of deep neural nets, including by tuning hyperparameters
- specific steps for designing and evaluating your own deep learning project
- topics worth investing time in to become an expert deployer of deep learning models
- high-level overview of deep learning as it pertains to Natural Language Processing (NLP)
- influential examples of industrial applications of NLP
- timeline of contemporary breakthroughs that have brought Deep Learning approaches to the forefront of NLP research and development
- introduce the elements of natural language
- contrast how these elements are represented by traditional machine-learning models and emergent deep-learning models
- specify common NLP applications and bucket them into three tiers of relative complexity
- build on the step-by-step installation of TensorFlow on Mac OS X covered in the Deep Learning with TensorFlow LiveLessons to facilitate the training of deep learning models with an Nvidia GPU.
- summarise the key concepts introduced in the Deep Learning with TensorFlow LiveLessons, which serve as the foundation for the material introduced in these NLP-focused LiveLessons
- take a tantalising look ahead at the capabilities developed over the course of these LiveLessons
- leverage interactive demos to enable an intuitive understanding of vector-space embeddings of words, nuanced quantitative representations of word meaning
- key papers that led to the development of word2vec, a technique for transforming natural language into vector representations
- essential word2vec theory introduced (see the gensim sketch after this list):
- architectures:
- Skip-Gram
- Continuous Bag of Words
- training algorithms:
- hierarchical softmax
- negative sampling
- evaluation perspectives:
- intrinsic
- extrinsic
- hyperparameters:
- number of dimensions
- context-word window size
- number of iterations
- size of data set
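To see how these architectures and hyperparameters map onto code, here is a sketch using gensim's word2vec implementation; argument names assume gensim 3.x (gensim 4+ renames size to vector_size and iter to epochs), and the toy corpus is obviously not meaningful training data:

```python
from gensim.models import Word2Vec

sentences = [['the', 'quick', 'brown', 'fox'],
             ['jumped', 'over', 'the', 'lazy', 'dog']]   # toy tokenized corpus

model = Word2Vec(sentences,
                 sg=1,            # 1 = Skip-Gram, 0 = Continuous Bag of Words
                 negative=5,      # negative sampling (set hs=1 for hierarchical softmax)
                 size=64,         # number of dimensions in the word-vector space
                 window=5,        # context-word window size
                 iter=5,          # number of iterations (passes) over the corpus
                 min_count=1)     # keep rare words in this tiny toy data set

print(model.wv['fox'].shape)      # -> (64,)
```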
- contrast word2vec with its leading alternative, GloVe
- pre-trained word vectors:
- natural language data sets:
- Jon Krohn's resources page
- Zhang, Zhao and LeCun's labelled data
- Internet Movie DataBase (IMDB) reviews classified by sentiment from Andrew Maas and his Stanford colleagues (2011)
- use books from Project Gutenberg to create word vectors with word2vec
- interactively visualise the word vectors with the bokeh library (creating_word_vectors_with_word2vec.ipynb)
- in natural_language_preprocessing_best_practices.ipynb, apply the following recommended best practices to clean up a corpus of natural language data prior to modeling (one possible implementation follows this list):
- tokenize
- convert all characters to lowercase
- remove stopwords
- remove punctuation
- stem words
- handle bigram (and trigram) word collocations
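One possible implementation of those steps, using NLTK and gensim (a sketch only; the notebook's exact choices may differ, and the NLTK punkt and stopwords resources must already be downloaded):

```python
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from gensim.models.phrases import Phrases, Phraser

stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

def preprocess(doc):
    tokens = word_tokenize(doc.lower())                           # tokenize + lowercase
    tokens = [t for t in tokens if t not in stop_words]           # remove stopwords
    tokens = [t for t in tokens if t not in string.punctuation]   # remove punctuation
    return [stemmer.stem(t) for t in tokens]                      # stem words

corpus = [preprocess(d) for d in ["The dogs are running quickly!",
                                  "New York is a big city."]]

# detect frequent bigram collocations across the corpus and merge them into
# single tokens (a real corpus is needed for any pair to reach the threshold)
bigram = Phraser(Phrases(corpus, min_count=1, threshold=1))
corpus = [bigram[doc] for doc in corpus]
print(corpus)
```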
- detail the calculation and functionality of the area under the Receiver Operating Characteristic curve summary metric, which is used throughout the remainder of the LiveLessons for evaluating model performance
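In scikit-learn the metric is a single function call; the toy labels and scores below are for illustration only (0.5 corresponds to chance, 1.0 to perfect ranking):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]               # true binary sentiment labels
y_score = [0.1, 0.4, 0.35, 0.8]     # model-predicted probabilities of class 1
print(roc_auc_score(y_true, y_score))   # -> 0.75
```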
- pair vector-space embedding with the fundamentals of deep learning introduced in the Deep Learning with TensorFlow LiveLessons to create a dense neural network for classifying documents by their sentiment (dense_sentiment_classifier.ipynb)
- add convolutional layers to the deep learning architecture to improve the performance of the natural language classifying model (convolutional_sentiment_classifier.ipynb)
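Skeletally, such a convolutional sentiment classifier looks like this in Keras (vocabulary size, sequence length, and layer sizes are illustrative assumptions):

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout

model = Sequential()
# learn a 64-dimensional vector-space embedding for each of 5000 token IDs
model.add(Embedding(5000, 64, input_length=400))
# slide 250 filters of width 3 over the sequence of word vectors
model.add(Conv1D(250, 3, activation='relu'))
model.add(GlobalMaxPooling1D())            # keep each filter's strongest activation
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))  # probability the document is positive
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```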
- provide an intuitive understanding of Recurrent Neural Networks (RNNs), which permit backpropagation through time over sequential data, such as natural language and financial time series data
- incorporate simple RNN layers into a model that classifies documents by their sentiment (rnn_in_keras.ipynb)
- develop familiarity with the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) varieties of RNN, which provide markedly more productive modeling of sequential data
- straightforwardly build LSTM (vanilla_lstm_in_keras.ipynb) and GRU (gru_in_keras.ipynb) deep learning architectures through the Keras high-level API
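In Keras these recurrent layers are near drop-in replacements for one another; a sketch (sizes illustrative):

```python
from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, SimpleRNN, LSTM, GRU, Dense

model = Sequential()
model.add(Embedding(5000, 64, input_length=100))
model.add(SpatialDropout1D(0.2))     # drop entire embedding dimensions at random
model.add(LSTM(256, dropout=0.2))    # or SimpleRNN(256) / GRU(256)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```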
- Bi-directional LSTMs are an especially potent variant of the LSTM
- high-level theory on Bi-LSTMs before leveraging them in practice (bidirectional_lstm.ipynb)
- Bi-LSTMs are stacked to enable deep learning networks to model increasingly abstract representations of language (stacked_bidirectional_lstm.ipynb; ye_olde_conv_lstm_stackeroo.ipynb)
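Stacking Bi-LSTMs in Keras amounts to wrapping each LSTM in a Bidirectional layer and returning full sequences from every recurrent layer except the last; a sketch with illustrative sizes:

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential()
model.add(Embedding(5000, 64, input_length=200))
# return_sequences=True passes every timestep's hidden state to the next layer
model.add(Bidirectional(LSTM(64, dropout=0.2, return_sequences=True)))
model.add(Bidirectional(LSTM(64, dropout=0.2)))   # final layer returns one vector
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```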
- advanced data modeling capabilities are possible with non-sequential architectures, e.g., parallel convolutional layers, each with unique hyperparameters (multi_convnet_architectures.ipynb)
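With the Keras functional API, such a parallel architecture might be sketched as follows: three Conv1D branches with different filter widths, concatenated before the classification layers (all sizes here are illustrative):

```python
from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          concatenate, Dense, Dropout)

inputs = Input(shape=(400,))
embedded = Embedding(5000, 64)(inputs)

# each branch detects n-gram-like patterns of a different length
branches = []
for kernel_size in (2, 3, 4):
    conv = Conv1D(256, kernel_size, activation='relu')(embedded)
    branches.append(GlobalMaxPooling1D()(conv))

merged = concatenate(branches)               # join the parallel branches
dense = Dropout(0.2)(Dense(256, activation='relu')(merged))
outputs = Dense(1, activation='sigmoid')(dense)

model = Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```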