Toxic comment classification

This repo was built as a test of deep NLP using the Toxic Comment Classification data from Kaggle.

Another main motivation was to test out deep NLP models. The models used were:

BERT - paper, library
ULMFiT
Pooled RNN

NOTE: Check the output folder for results. It contains the fastai classification and Pooled RNN results; both produce sigmoid outputs (a probability for each class).
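Because this is a multi-label task, "sigmoid output" means each class gets its own independent probability rather than a softmax share. The sketch below illustrates that step in plain Python. It assumes the six label columns of the Kaggle competition data and raw per-class scores (logits) as input; `to_label_probabilities` is a hypothetical helper, not a function from this repo.

```python
import math

# The six label columns of the Kaggle Toxic Comment Classification data.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_label_probabilities(logits):
    """Map one raw score per label to an independent probability.

    Unlike softmax, the probabilities do not have to sum to 1, so a comment
    can be, for example, both "toxic" and "insult" at the same time.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: sigmoid(score) for label, score in zip(LABELS, logits)}


if __name__ == "__main__":
    probs = to_label_probabilities([2.0, -3.0, 1.0, -5.0, 0.5, -4.0])
    for label, p in probs.items():
        print(f"{label:>13}: {p:.1%}")
```

A threshold (for example 0.5) can then be applied per class to turn the probabilities into yes/no labels.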

Install

git clone 

pip install -r requirements.txt

Download the Toxic Comment Classification dataset from Kaggle.

Put it in the folder data/toxic_comment.
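Before training, it can help to confirm the files landed in the right place. This is a minimal sketch assuming the Kaggle download is extracted as-is, so the folder holds the competition's `train.csv` and `test.csv`; the repo's scripts may expect other file names, and `missing_dataset_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the Kaggle archive is extracted unchanged into this folder.
DATA_DIR = Path("data") / "toxic_comment"
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_dataset_files(data_dir=DATA_DIR, expected=EXPECTED_FILES):
    """Return the expected CSV files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print(f"Missing in {DATA_DIR}: {', '.join(missing)}")
    else:
        print(f"Dataset found in {DATA_DIR}")
```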

BERT

Make sure to put the fine-tuned model inside the model folder within the bert folder.

NOTE: For BERT training, check out the notebook.

cd bert
python bert_test.py --text You are dumb # Single prediction

or 

python bert_test.py --interactive # For console input
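The two modes above (one-shot `--text` and a console loop with `--interactive`) can be modelled with `argparse`. The sketch below is hypothetical: it does not reproduce `bert_test.py`, it only shows one way such a CLI could be structured, with a placeholder where the fine-tuned BERT model would be called. `build_parser` and `comments_from_args` are illustrative names.

```python
import argparse
from typing import Callable, Iterator


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --text / --interactive modes."""
    parser = argparse.ArgumentParser(description="Classify comments for toxicity.")
    mode = parser.add_mutually_exclusive_group(required=True)
    # nargs="+" collects every word after --text, so the sentence need not be quoted.
    mode.add_argument("--text", nargs="+", help="comment to classify once")
    mode.add_argument("--interactive", action="store_true",
                      help="keep reading comments from the console")
    return parser


def comments_from_args(args: argparse.Namespace,
                       read_line: Callable[[str], str] = input) -> Iterator[str]:
    """Yield the comment(s) to classify for the selected mode."""
    if args.text is not None:
        yield " ".join(args.text)
        return
    while True:
        try:
            line = read_line("comment> ").strip()
        except EOFError:  # Ctrl-D ends the session
            return
        if line.lower() in {"quit", "exit"}:
            return
        if line:  # skip blank lines
            yield line


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    for comment in comments_from_args(cli_args):
        # Placeholder: a real script would run the fine-tuned BERT model here.
        print(f"would classify: {comment!r}")
```

Injecting `read_line` instead of calling `input` directly keeps the interactive loop easy to test without a terminal.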

Pooled RNN

python train_attention.py # train

python eval.py # Evaluate or generate CSV output
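For reference, Kaggle's `sample_submission.csv` for this competition has an `id` column followed by one probability column per label. Assuming the CSV from `eval.py` follows that layout (not confirmed here), a writer could look like the sketch below; `write_submission` is a hypothetical helper and the ids and scores in the demo are made up.

```python
import csv
import sys
from typing import Sequence, TextIO

# Column order of Kaggle's sample_submission.csv for this competition.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def write_submission(ids: Sequence[str],
                     probabilities: Sequence[Sequence[float]],
                     out: TextIO) -> None:
    """Write a header row, then one row per comment: id + one probability per label."""
    if len(ids) != len(probabilities):
        raise ValueError("ids and probabilities must have the same length")
    writer = csv.writer(out)
    writer.writerow(["id", *LABELS])
    for comment_id, probs in zip(ids, probabilities):
        if len(probs) != len(LABELS):
            raise ValueError(f"expected {len(LABELS)} probabilities for {comment_id!r}")
        writer.writerow([comment_id, *(f"{p:.6f}" for p in probs)])


if __name__ == "__main__":
    # Hypothetical ids and scores, printed to stdout instead of a file.
    write_submission(["comment_a", "comment_b"],
                     [[0.98, 0.12, 0.85, 0.01, 0.77, 0.03],
                      [0.02, 0.00, 0.01, 0.00, 0.01, 0.00]],
                     sys.stdout)
```

When writing to a real file, open it with `open(path, "w", newline="")` so the `csv` module controls line endings.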

Models

Model	Download Link
BERT	Link
Pooled RNN	Link

BERT Model Training

The model was trained for 3 cycles of 2 epochs each.
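The README does not say what happens between cycles (for example whether the learning rate is reset or a checkpoint is reloaded), so the sketch below only shows the bookkeeping of 3 cycles of 2 epochs, 6 epochs in total. `train_epoch` is a hypothetical callback standing in for the notebook's training code.

```python
from typing import Callable, List


def run_cycles(train_epoch: Callable[[int, int], float],
               num_cycles: int = 3,
               epochs_per_cycle: int = 2) -> List[List[float]]:
    """Run num_cycles cycles of epochs_per_cycle epochs and return losses per cycle.

    train_epoch(cycle, epoch) is a hypothetical callback that trains the model
    for one epoch and returns that epoch's loss.
    """
    history: List[List[float]] = []
    for cycle in range(1, num_cycles + 1):
        cycle_losses = [train_epoch(cycle, epoch)
                        for epoch in range(1, epochs_per_cycle + 1)]
        history.append(cycle_losses)
        # Between cycles is a natural point to save a checkpoint and plot the
        # cycle's loss curve (not done in this sketch).
    return history


if __name__ == "__main__":
    def fake_epoch(cycle: int, epoch: int) -> float:
        # Stand-in loss that shrinks as training progresses.
        return 1.0 / (2 * (cycle - 1) + epoch)

    for i, losses in enumerate(run_cycles(fake_epoch), start=1):
        print(f"cycle {i}: {losses}")
```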

First cycle

Second cycle

Third cycle

Environment

Ubuntu 18.04
CUDA 9
Nvidia GTX 1080
cuDNN 7.4

Dependencies

nltk
tensorflow-gpu=1.9
keras=2.2.4
pytorch=1.1.0
fastai
torchvision=0.3.0

TODO

Train BERT model and test output
Train fastai ULMFiT and test output
Move from the pytorch-pretrained-bert package to the latest transformers package

Acknowledgement

The BERT and fastai code was heavily inspired by this repo; check out the implementation here.
The Pooled RNN Keras code was also heavily inspired by the following repo; check it out here.
For EDA, the following GitHub repo served as a backbone for the project; those interested can check it out here.

edwin-19/Toxic-Comment-Classification
