To install the environment, run:
sh ins.sh
Download the GLUE data using this repository or from the GLUE benchmark website, unpack it into the directory datas/glue, and rename the folder CoLA to COLA.
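The rename step can also be scripted. The following is a minimal sketch, assuming the GLUE archive has already been unpacked into datas/glue; the helper name normalize_cola_dir and the temporary folder name are hypothetical and not part of this repository.

```python
from pathlib import Path


def normalize_cola_dir(glue_root: Path) -> Path:
    """Rename <glue_root>/CoLA to <glue_root>/COLA and return the new path."""
    src = glue_root / "CoLA"
    dst = glue_root / "COLA"
    if src.is_dir():
        # Go through a temporary name so the rename also takes effect on
        # case-insensitive filesystems, where CoLA and COLA are the same entry.
        tmp = glue_root / "CoLA.renaming"
        src.rename(tmp)
        tmp.rename(dst)
    if not dst.is_dir():
        raise FileNotFoundError(f"no CoLA/COLA folder found under {glue_root}")
    return dst
```

For example, normalize_cola_dir(Path("datas/glue")) would be called once after unpacking the data.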
Download the SuperGLUE data from the SuperGLUE benchmark website.
Download bert_uncased_L-12_H-768_A-12 (BERT-base) and bert_uncased_L-6_H-768_A-12 as the teacher model and the student model, respectively, from this repository, and use the Hugging Face API to convert them to PyTorch checkpoints.
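One possible way to do the conversion is sketched below. It assumes a transformers 4.x release that still ships the BERT conversion module, with torch and tensorflow installed, and it assumes each downloaded folder contains bert_model.ckpt.* and bert_config.json as in the original Google BERT releases. The helper names conversion_paths and convert and the output folder are hypothetical; the repository may expect a different layout.

```python
from pathlib import Path


def conversion_paths(tf_model_dir: str, out_dir: str) -> dict:
    """Collect the file paths the converter needs for one downloaded BERT folder."""
    src = Path(tf_model_dir)
    return {
        # Checkpoint prefix without the .index / .data-* suffix.
        "tf_checkpoint_path": str(src / "bert_model.ckpt"),
        "bert_config_file": str(src / "bert_config.json"),
        "pytorch_dump_path": str(Path(out_dir) / "pytorch_model.bin"),
    }


def convert(tf_model_dir: str, out_dir: str) -> str:
    """Convert one TensorFlow BERT checkpoint to a PyTorch state dict file."""
    # Imported lazily: this needs transformers, torch and tensorflow, and the
    # module location is an assumption that depends on the installed version.
    from transformers.models.bert.convert_bert_original_tf_checkpoint_to_pytorch import (
        convert_tf_checkpoint_to_pytorch,
    )

    paths = conversion_paths(tf_model_dir, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    convert_tf_checkpoint_to_pytorch(**paths)
    return paths["pytorch_dump_path"]
```

Calling convert("bert_uncased_L-6_H-768_A-12", "student_pt") would then write the student weights to the (hypothetical) folder student_pt; the teacher is converted the same way.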
The training script for Task-specific Teacher Model Finetuning can be found in the script/teacher/ directory, where $TEACHER_PATH denotes the file path of the teacher model.
Similarly, the training script for Task-specific Student Model Distillation is located in the script/student/ directory. In this case, $STUDENT_PATH and $TEACHER_PATH represent the file paths of the student and teacher models, respectively.
To install the environment for the T0 experiments, run:
sh T0/ins.sh
To perform Task-specific Teacher Model Finetuning, run:
python3 T0/distillation_t.py --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
To perform Task-specific Student Model Distillation, run:
python3 T0/distillation.py --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
To install the environment for the GPT-Neo experiments, run:
sh GPT-Neo/ins.sh
To perform Task-specific Teacher Model Finetuning, run:
python3 GPT-Neo/distillation_t.py --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
To perform Task-specific Student Model Distillation, run:
python3 GPT-Neo/distillation.py --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
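The T0 and GPT-Neo commands above share the same flags and differ only in the script path. When launching several tasks, the argument list can be assembled programmatically. This is a minimal sketch: build_command is a hypothetical helper, not part of the repository, and the dataset configuration, template name, and model directory must still be supplied by the user.

```python
import shlex


def build_command(family: str, stage: str, dataset_config: str,
                  template_name: str, model_dir: str,
                  output_dir: str = "./debug") -> list:
    """Return the argv list for one teacher fine-tuning or student distillation run."""
    # Script names are taken from the commands listed in this README.
    scripts = {"teacher": "distillation_t.py", "student": "distillation.py"}
    if family not in ("T0", "GPT-Neo"):
        raise ValueError(f"unknown model family: {family}")
    if stage not in scripts:
        raise ValueError(f"unknown stage: {stage}")
    return [
        "python3", f"{family}/{scripts[stage]}",
        "--dataset_name", "super_glue",
        "--dataset_config_name", dataset_config,
        "--template_name", template_name,
        "--model_name_or_path", model_dir,
        "--output_dir", output_dir,
        "--parallelize",
    ]
```

The resulting list can be passed to subprocess.run, or printed with shlex.join(build_command("T0", "teacher", "rte", "TEMPLATE_NAME", "MODEL_DIR")) to inspect the command first. Here "rte" is only an example SuperGLUE configuration, and TEMPLATE_NAME and MODEL_DIR remain placeholders.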
The distilled student model for each task reported in the paper can be downloaded using the following link: https://drive.google.com/drive/folders/1BsA0VHKSa_-Bp5I7dQ2Ftk2q7cIyPrdC
The teacher model for each task reported in the paper can be downloaded using the following link: https://drive.google.com/file/d/1sBi35Dk8VJ7TU0warB6BL9QKx-in9Ww6/view?usp=drive_link
@article{cui2024sinkhorn,
title={Sinkhorn Distance Minimization for Knowledge Distillation},
author={Cui, Xiao and Qin, Yulei and Gao, Yuting and Zhang, Enwei and Xu, Zihan and Wu, Tong and Li, Ke and Sun, Xing and Zhou, Wengang and Li, Houqiang},
journal={arXiv preprint arXiv:2402.17110},
year={2024}
}
@article{cui:hal-04803835,
title={{SinKD: Sinkhorn Distance Minimization for Knowledge Distillation}},
author={Cui, Xiao and Qin, Yulei and Gao, Yuting and Zhang, Enwei and Xu, Zihan and Wu, Tong and Li, Ke and Sun, Xing and Zhou, Wengang and Li, Houqiang},
url={https://hal.science/hal-04803835},
journal={IEEE Transactions on Neural Networks and Learning Systems},
publisher={IEEE},
year={2024},
month=nov,
keywords={Large Language Model ; Knowledge Distillation ; Wasserstein Distance ; Sinkhorn Distance},
pdf={https://hal.science/hal-04803835v1/file/TNNLS-preview.pdf},
hal_id={hal-04803835},
hal_version={v1}
}