
Course project for Stanford University's CS224N: Application of Projected Attention Layers in BERT


a6kme/bert_pals


Abstract

We examine different approaches to improving the performance of Bidirectional Encoder Representations from Transformers (BERT) on three downstream tasks: Sentiment Analysis, Paraphrase Detection, and Semantic Textual Similarity (STS). Throughout our experiments we leveraged a variety of fine-tuning strategies and advanced techniques, including Projected Attention Layers (PALs), multi-GPU training, unsupervised contrastive learning of sentence embeddings (SimCSE), additional relational layers, hyperparameter tuning, and fine-tuning on additional datasets. We found that the combination of PALs, unsupervised SimCSE, and additional relational layers produced the largest improvements in accuracy.
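
The headline technique is the Projected Attention Layer (PAL): a small, task-specific attention block that runs in parallel with each shared BERT layer and operates in a low-dimensional projected space, so only a few extra parameters are added per task. The sketch below is a rough illustration of that idea, not the implementation used in this repository; the class name and the sizes (hidden_size=768, pal_size=204, num_heads=12) are assumptions chosen for illustration.

    # Hedged sketch of a Projected Attention Layer (PAL); names and sizes are illustrative.
    import torch.nn as nn

    class ProjectedAttentionLayer(nn.Module):
        """Low-rank, task-specific attention run in parallel with a shared BERT layer."""

        def __init__(self, hidden_size=768, pal_size=204, num_heads=12):
            super().__init__()
            self.down = nn.Linear(hidden_size, pal_size)   # project hidden states down
            self.attn = nn.MultiheadAttention(pal_size, num_heads, batch_first=True)
            self.up = nn.Linear(pal_size, hidden_size)     # project back up

        def forward(self, hidden_states):
            x = self.down(hidden_states)                   # (batch, seq, pal_size)
            x, _ = self.attn(x, x, x)                      # self-attention in the small space
            return self.up(x)                              # caller adds this to the BERT layer output

In a multi-task setup, one such layer is typically instantiated per task and its output added to the shared BERT layer's output, keeping the task-specific parameter count small.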

Report

You can download the report here

Presentation

You can watch the presentation here

Poster


Setup and Running

  1. Set up a virtual environment: conda create -n cs224n_dfp python
  2. Activate the virtual environment: conda activate cs224n_dfp
  3. Install the requirements: pip install -r requirements.txt
  4. Unzip data.zip, which contains the data sources used for fine-tuning and evaluation
  5. Download the BERT Base model weights from BERT's official repository: Repo Link || File Link
  6. Unzip the contents of the zip file into the uncased_L-12_H-768_A-12 folder
  7. Convert the checkpoint to a PyTorch bin file using the command below (a loading sanity check is sketched after this list)
    transformers-cli convert --model_type bert \
    --tf_checkpoint uncased_L-12_H-768_A-12/bert_model.ckpt \
    --config uncased_L-12_H-768_A-12/bert_config.json \
    --pytorch_dump_output uncased_L-12_H-768_A-12/pytorch_model.bin

  8. Fine-tune the BERT model: python src/multitask_classifier.py --fine-tune-mode full-model --lr 1e-5
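
After step 7, a quick way to confirm that the conversion produced a usable checkpoint is to load it and count its parameters. The snippet below is a minimal sanity-check sketch using the Hugging Face transformers API; it is not how src/multitask_classifier.py loads the weights.

    # Hedged sanity check: load the converted checkpoint with Hugging Face transformers.
    from transformers import BertConfig, BertModel

    config = BertConfig.from_json_file("uncased_L-12_H-768_A-12/bert_config.json")
    model = BertModel.from_pretrained("uncased_L-12_H-768_A-12", config=config)
    print(sum(p.numel() for p in model.parameters()))  # roughly 110M parameters for BERT Base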

Recommendations

  1. Run the training on a multi-GPU cluster so that it finishes faster (one common PyTorch approach is sketched below)
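
A minimal sketch of the multi-GPU suggestion above, assuming plain PyTorch DataParallel (the training script may distribute work differently):

    # Hedged sketch: replicate a model across all visible GPUs with torch.nn.DataParallel.
    import torch

    def to_multi_gpu(model):
        """Move the model to GPU and replicate it across all visible devices."""
        if torch.cuda.device_count() > 1:
            model = torch.nn.DataParallel(model)  # splits each input batch across the GPUs
        return model.to("cuda" if torch.cuda.is_available() else "cpu")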
