Unofficial PyTorch Reimplementation of MiniLM and MiniLMv2 (Incomplete)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (NeurIPS 2020)
- MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers (ACL 2021 Findings)
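Both papers distill knowledge from the teacher's self-attention module rather than from its hidden states. As a reference point, here is a minimal sketch of the MiniLM (v1) objective, which matches the teacher's last-layer attention distributions and value relations with KL divergence; the tensor names are hypothetical and this is not the code of this repository.

```python
import torch
import torch.nn.functional as F

def minilm_loss(t_attn, s_attn, t_value, s_value, eps=1e-12):
    """Deep self-attention distillation on the last Transformer layer.

    t_attn, s_attn:   attention probabilities, shape (batch, heads, seq, seq)
    t_value, s_value: value vectors,           shape (batch, heads, seq, head_dim)
    A sketch of the MiniLM objective, not this repository's implementation.
    """
    # Attention transfer: KL(teacher attention distributions || student's),
    # summed over positions and averaged over the batch.
    attn_loss = F.kl_div(s_attn.clamp_min(eps).log(), t_attn, reduction="batchmean")

    def value_relation(v):
        # Scaled dot-product among value vectors, turned into a distribution.
        return F.softmax(v @ v.transpose(-1, -2) / v.size(-1) ** 0.5, dim=-1)

    # Value-relation transfer: KL between teacher and student value relations.
    vr_loss = F.kl_div(value_relation(s_value).clamp_min(eps).log(),
                       value_relation(t_value), reduction="batchmean")
    return attn_loss + vr_loss
```

MiniLMv2 generalizes this idea to query-query, key-key, and value-value relations computed over a configurable number of relation heads, which removes the requirement that teacher and student share the same number of attention heads.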
- Generate the corpus
```bash
python generate_corpus.py --cache_dir /input/dataset --corpus_dir /input/osilab-nlp/wikipedia
```
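For reference, a corpus file of this shape can be produced with the HuggingFace datasets library roughly as follows; the snapshot name and output layout are assumptions, and generate_corpus.py may do this differently.

```python
from datasets import load_dataset

# Assumption: an English Wikipedia snapshot cached under --cache_dir.
wiki = load_dataset("wikipedia", "20220301.en", cache_dir="/input/dataset", split="train")

# Write one article per block, separated by blank lines.
with open("/input/osilab-nlp/wikipedia/corpus.txt", "w", encoding="utf-8") as f:
    for article in wiki:
        text = article["text"].strip()
        if text:
            f.write(text + "\n\n")
```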
- Generate the datasets
```bash
python generate_data.py \
    --train_corpus /input/osilab-nlp/wikipedia/corpus.txt \
    --bert_model ./models/bert-base-uncased \
    --output_dir ./data \
    --do_lower_case --reduce_memory
```
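Conceptually, this step tokenizes the corpus with the BERT tokenizer and packs the tokens into fixed-length training instances. The sketch below illustrates that idea; the function name and packing strategy are illustrative and not the actual generate_data.py logic.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("./models/bert-base-uncased", do_lower_case=True)
MAX_SEQ_LENGTH = 128  # must match --max_seq_length used at pretraining time

def corpus_to_instances(corpus_path, max_seq_length=MAX_SEQ_LENGTH):
    """Yield fixed-length token-id sequences built from a plain-text corpus."""
    buffer = []
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            buffer.extend(tokenizer.tokenize(line.strip()))
            # Reserve two positions for the [CLS] and [SEP] special tokens.
            while len(buffer) >= max_seq_length - 2:
                chunk, buffer = buffer[:max_seq_length - 2], buffer[max_seq_length - 2:]
                yield tokenizer.convert_tokens_to_ids(["[CLS]"] + chunk + ["[SEP]"])
```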
- Pretrain
```bash
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    run_pretrain.py \
    --pregenerated_data ./data \
    --cache_dir ./cache \
    --epochs 4 \
    --gradient_accumulation_steps 1 \
    --train_batch_size 8 \
    --learning_rate 1e-4 \
    --max_seq_length 128 \
    --student_model ./models/bert-base-uncased \
    --masked_lm_prob 0.15 \
    --do_lower_case --fp16 --scratch
```
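`--masked_lm_prob 0.15` follows BERT's convention of selecting 15% of the tokens for prediction. A standard implementation of that rule (assumed here, not taken from run_pretrain.py) replaces 80% of the selected tokens with [MASK], 10% with a random token, and leaves 10% unchanged:

```python
import random

def mask_tokens(token_ids, vocab_size, mask_token_id, masked_lm_prob=0.15):
    """BERT-style masking: returns (corrupted input ids, MLM labels with -100 = ignore)."""
    input_ids, labels = list(token_ids), [-100] * len(token_ids)
    for i in range(1, len(token_ids) - 1):            # skip [CLS] and [SEP]
        if random.random() < masked_lm_prob:
            labels[i] = token_ids[i]                  # predict the original token here
            r = random.random()
            if r < 0.8:
                input_ids[i] = mask_token_id          # 80%: replace with [MASK]
            elif r < 0.9:
                input_ids[i] = random.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return input_ids, labels
```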
- Finetune
```bash
python -m torch.distributed.launch --nproc_per_node=4 \
    run_finetune.py --model ./models/bert-base-uncased \
    --data_dir ./glue_data \
    --task_name RTE \
    --output_dir ./outputs \
    --do_lower_case --fp16 \
    --num_train_epochs 5 \
    --learning_rate 2e-05 \
    --eval_step 50 \
    --max_seq_length 128 \
    --train_batch_size 8
```
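`--eval_step 50` evaluates the model on the dev set every 50 training steps. A generic accuracy evaluation loop for a GLUE-style classification task looks roughly like the following; it assumes a HuggingFace-style model whose output exposes `.logits`, which may not match run_finetune.py exactly.

```python
import torch

@torch.no_grad()
def evaluate_accuracy(model, dataloader, device="cuda"):
    """Classification accuracy over a dev dataloader of (input_ids, attention_mask, labels)."""
    model.eval()
    correct = total = 0
    for input_ids, attention_mask, labels in dataloader:
        outputs = model(input_ids.to(device), attention_mask=attention_mask.to(device))
        preds = outputs.logits.argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    model.train()
    return 100.0 * correct / total
```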
MiniLM (BERT with 4 Layers, 312 Dims)

| Task | Accuracy (%) |
|---|---|
| RTE | 65.70 |
| SST-2 | 86.85 |
- (22.01.01) An unknown error occurred in the finetuning code in a multi-GPU setting on RTX 3090 (CUDA 11.4). (Solved)
- (22.01.04) Completed the pretraining code on a tiny dataset (a Wikipedia dataset with 100 documents; also done with 6M documents).
- (22.01.05) The learning rate shows as zero when using knowledge distillation. (Solved)
- (22.01.07) An unknown error occurred in the pretraining code with more than 3 GPUs; the code works well on a 2-GPU server. (Solved)
- (22.01.11) Without the --reduce_memory option, the code runs without errors on multiple GPUs (more than 3). (Solved)
- Generate the Wikipedia corpus and the pregenerated datasets
- Pretraining in a multi-GPU setting
- Finetuning in a multi-GPU setting