We examine different approaches to improving the performance of Bidirectional Encoder Representations from Transformers (BERT) on three downstream tasks: Sentiment Analysis, Paraphrase Detection, and Semantic Textual Similarity (STS). Throughout our experimentation we leveraged a variety of fine-tuning strategies and advanced techniques, including Projected Attention Layers (PALs), multi-GPU training, unsupervised SimCSE (Simple Contrastive Learning of Sentence Embeddings), additional relational layers, hyperparameter tuning, and fine-tuning on additional datasets. We found that the combination of PALs, unsupervised SimCSE, and additional relational layers yielded the largest improvements in accuracy.
You can download the report here
You can watch the presentation here
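For context, the unsupervised SimCSE objective mentioned above treats two dropout-perturbed encodings of the same sentence as a positive pair and the other sentences in the batch as negatives. The sketch below is a minimal illustration of that objective only; the `encoder` callable, its signature, and the temperature value are illustrative assumptions rather than this repository's actual API.

```python
# Minimal sketch of the unsupervised SimCSE objective (dropout-based positives,
# in-batch negatives). `encoder` is assumed to map tokenized inputs to one
# fixed-size embedding per sentence (e.g. BERT's [CLS] vector); it is a
# placeholder, not this project's actual implementation.
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
    # Two forward passes over the same batch: dropout is active in train mode,
    # so the two views of each sentence differ slightly.
    z1 = encoder(input_ids, attention_mask)  # (batch, hidden)
    z2 = encoder(input_ids, attention_mask)  # (batch, hidden)

    # Pairwise cosine similarities between the two sets of views, scaled by the
    # temperature: entry (i, j) compares sentence i's first view with
    # sentence j's second view.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature

    # Each sentence's positive is its own second view, i.e. the diagonal.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```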
- Set up a virtual environment: `conda create -n cs224n_dfp python`
- Activate the virtual environment: `conda activate cs224n_dfp`
- Install the requirements: `pip install -r requirements.txt`
- Unzip `data.zip`, which contains the data sources used for fine-tuning and evaluation
- Download the BERT Base model weights from BERT's official repository: Repo Link || File Link
- Unzip the contents of the zip file into the `uncased_L-12_H-768_A-12` folder
- Convert the TensorFlow checkpoint to a PyTorch `pytorch_model.bin` with the command below:
  ```
  transformers-cli convert --model_type bert \
    --tf_checkpoint uncased_L-12_H-768_A-12/bert_model.ckpt \
    --config uncased_L-12_H-768_A-12/bert_config.json \
    --pytorch_dump_output uncased_L-12_H-768_A-12/pytorch_model.bin
  ```
- Fine-tune the BERT model: `python src/multitask_classifier.py --fine-tune-mode full-model --lr 1e-5`
- Running the training on a multi-GPU cluster is recommended so that it finishes faster (see the sketch below)
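As a rough sketch of single-node multi-GPU training, the model can be wrapped in `torch.nn.DataParallel` so that each batch is split across the visible GPUs. The placeholder model and batch below stand in for the multitask BERT classifier and its inputs; this is one possible approach, not necessarily the exact mechanism used by `src/multitask_classifier.py`.

```python
# Minimal DataParallel sketch; the Linear layer and random batch are placeholders
# for the multitask BERT model and its sentence embeddings.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(768, 5).to(device)  # placeholder model

if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each batch across them.
    model = nn.DataParallel(model)

batch = torch.randn(32, 768, device=device)  # placeholder batch
logits = model(batch)  # forward pass is sharded across the GPUs
```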