This repo was built as a test bed for deep NLP, using the Toxic Comment Classification dataset from Kaggle.
The other main motivation was to try out several deep NLP models; those used were:
NOTE: Check the output folder for results. It contains the fastai classification and pooled RNN results (both produce a sigmoid output, i.e. a probability per class).
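To illustrate what "sigmoid output (each class has a percentage)" means: multi-label toxicity models emit one independent logit per class and squash each through a sigmoid, so the probabilities need not sum to 1. A minimal stdlib-only sketch (the logit values are made up for illustration; only the six label names come from the Kaggle challenge):

```python
import math

# The six labels of the Kaggle Toxic Comment Classification challenge.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def sigmoid(x: float) -> float:
    """Squash a raw logit into a (0, 1) probability."""
    return 1.0 / (1.0 + math.exp(-x))

def to_class_probabilities(logits):
    """Map one logit per label to independent per-class probabilities.

    Unlike softmax, these do not sum to 1: a comment can be both
    'toxic' and 'insult' at the same time.
    """
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}

# Hypothetical logits for a single comment (illustrative values only).
probs = to_class_probabilities([2.1, -3.0, 0.4, -4.2, 1.0, -2.5])
for label, p in probs.items():
    print(f"{label}: {p:.2%}")
```

This is why each class gets its own "percentage" in the output files rather than one score shared across classes.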
```
git clone
pip install -r requirements.txt
```
Download the Toxic Comment Classification dataset from Kaggle and put it in the `data/toxic_comment` folder.
Make sure to put the fine-tuned model inside the `model` folder within the `bert` folder.
NOTE: For BERT training, check out the notebook.
```
cd bert

# Single prediction
python bert_test.py --text "You are dumb"

# Or: interactive console input
python bert_test.py --interactive
```
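A sketch of how a prediction CLI like `bert_test.py` might wire up those two flags. This is an assumption about the script's structure, not its actual code: `predict` here is a stand-in for the real BERT forward pass, and any flags beyond `--text` and `--interactive` are unknown.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI with exactly one of --text / --interactive, mirroring the README."""
    parser = argparse.ArgumentParser(description="Toxicity prediction CLI")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text", help="classify a single comment and exit")
    group.add_argument("--interactive", action="store_true",
                       help="read comments from the console in a loop")
    return parser

def predict(text: str) -> dict:
    # Stand-in for the real model call; returns a fake score.
    return {"toxic": 0.5}

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.interactive:
        while True:
            line = input("> ")
            if not line:          # empty line ends the session
                break
            print(predict(line))
    else:
        print(predict(args.text))

if __name__ == "__main__":
    main()
```

Quoting the text argument (`--text "You are dumb"`) matters: without quotes the shell splits it into multiple arguments and `argparse` rejects the extras.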
```
python train_attention.py   # train
python eval.py              # evaluate or generate the CSV output
```
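The Kaggle competition expects a submission CSV with an `id` column plus one sigmoid score per class. A stdlib-only sketch of what the CSV step in `eval.py` might look like; the file name, ids, and score values are illustrative, not taken from the repo:

```python
import csv

# The six labels of the Kaggle Toxic Comment Classification challenge.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def write_submission(path, rows):
    """Write (comment_id, [6 scores]) pairs in Kaggle submission format."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"] + LABELS)
        for comment_id, scores in rows:
            writer.writerow([comment_id] + [f"{s:.6f}" for s in scores])

# Illustrative predictions for two comments.
write_submission("submission.csv", [
    ("id_0001", [0.91, 0.02, 0.55, 0.01, 0.60, 0.03]),
    ("id_0002", [0.05, 0.01, 0.02, 0.00, 0.03, 0.01]),
])
```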
| Model | Download Link |
|---|---|
| BERT | Link |
| Pooled RNN | Link |
Trained 3 times with 2 epochs each
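"Trained 3 times with 2 epochs each" reads as three runs of a 2-epoch schedule. A minimal sketch of that loop, assuming independent restarts; `train_one_epoch` is a placeholder that returns a fake decreasing loss, not the repo's actual training code:

```python
N_RUNS, N_EPOCHS = 3, 2

def train_one_epoch(run: int, epoch: int) -> float:
    # Placeholder: the real loop would iterate batches and update weights.
    return 1.0 / (run * N_EPOCHS + epoch + 1)  # fake decreasing loss

def train_all():
    """Run the full schedule: N_RUNS restarts of N_EPOCHS epochs each."""
    history = []
    for run in range(N_RUNS):            # e.g. restart with a fresh seed
        for epoch in range(N_EPOCHS):    # 2 epochs per run
            history.append(train_one_epoch(run, epoch))
    return history

losses = train_all()   # 6 epoch losses in total
```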
- Ubuntu 18.04
- CUDA 9
- NVIDIA GTX 1080
- cuDNN 7.4
- nltk
- tensorflow-gpu=1.9
- keras=2.2.4
- pytorch=1.1.0
- fastai
- torchvision=0.3.0
- Train BERT model and test output
- Train FASTAi ULMFiT and test output
- Move from the `pytorch-pretrained-bert` package to the latest `transformers` package