To install the environment, run:
Download the GLUE data using this repository or from the GLUE benchmark website, unpack it into the directory datas/glue, and rename the folder CoLA
Download the SuperGLUE data from the SuperGLUE benchmark website.
Download bert_uncased_L-12_H-768_A-12 (BERT-base) and bert_uncased_L-6_H-768_A-12 from this repository for the teacher model and student model, respectively, and use the Hugging Face API to convert them to PyTorch checkpoints.
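The conversion step can be sketched as follows. This is a minimal sketch, not the repository's own procedure: it assumes a transformers 4.x release that still ships `load_tf_weights_in_bert`, with TensorFlow installed, and that the unpacked archive has the usual `bert_config.json` and `bert_model.ckpt.*` files. The helper names `tf_checkpoint_paths` and `convert_to_pytorch` are hypothetical.

```python
from pathlib import Path


def tf_checkpoint_paths(model_dir):
    """Return the config path and checkpoint prefix inside an unpacked BERT archive."""
    root = Path(model_dir)
    # Assumed layout: bert_config.json, vocab.txt and bert_model.ckpt.* side by side.
    return root / "bert_config.json", root / "bert_model.ckpt"


def convert_to_pytorch(model_dir, output_dir):
    """Load TensorFlow BERT weights and save them as a PyTorch checkpoint."""
    # Imported lazily so the path helper works without these packages installed.
    # Assumption: transformers 4.x; newer releases may not provide this loader.
    from transformers import BertConfig, BertForPreTraining
    from transformers.models.bert.modeling_bert import load_tf_weights_in_bert

    config_path, ckpt_prefix = tf_checkpoint_paths(model_dir)
    config = BertConfig.from_json_file(str(config_path))
    model = BertForPreTraining(config)
    load_tf_weights_in_bert(model, config, str(ckpt_prefix))
    model.save_pretrained(output_dir)  # writes the weights and config.json
    return output_dir
```

For example, `convert_to_pytorch("bert_uncased_L-6_H-768_A-12", "student_pt")` would write the student checkpoint to a hypothetical `student_pt` directory. The sketch does not copy `vocab.txt`, so the tokenizer vocabulary has to be placed next to the converted weights separately.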
The training script for Task-specific Teacher Model Finetuning can be found in the script/teacher/ directory, where $TEACHER_PATH denotes the file path of the teacher model.
Similarly, the training script for Task-specific Student Model Distillation is located in the script/student/ directory. In this case, $STUDENT_PATH and $TEACHER_PATH represent the file paths of the student and teacher models, respectively.
To install the environment, run:
sh T0/
To perform Task-specific Teacher Model Finetuning, run:
python3 T0/ --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
To perform Task-specific Student Model Distillation, run:
python3 T0/ --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
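When running several SuperGLUE tasks, it may help to assemble the command programmatically. The sketch below only reuses the flags shown in the commands above; `build_command` is a hypothetical helper, and `SCRIPT_PATH`, `TEMPLATE_NAME` and `MODEL_DIR` are placeholders to be replaced with real values. `rte` is used as an example SuperGLUE configuration name.

```python
import shlex


def build_command(script, dataset_config, template_name, model_dir, output_dir="./debug"):
    """Assemble the argument list for one SuperGLUE run from the flags shown above."""
    return [
        "python3", script,
        "--dataset_name", "super_glue",
        "--dataset_config_name", dataset_config,
        "--template_name", template_name,
        "--model_name_or_path", model_dir,
        "--output_dir", output_dir,
        "--parallelize",
    ]


if __name__ == "__main__":
    # Placeholder values only: substitute the real script path, template and model directory.
    cmd = build_command("SCRIPT_PATH", "rte", "TEMPLATE_NAME", "MODEL_DIR")
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with template names that contain spaces.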
To install the environment, run:
sh GPT-Neo/
To perform Task-specific Teacher Model Finetuning, run:
python3 GPT-Neo/ --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
To perform Task-specific Student Model Distillation, run:
python3 GPT-Neo/ --dataset_name super_glue --dataset_config_name DATASET_NAME --template_name "TEMPLATE_NAME" --model_name_or_path MODEL_DIR --output_dir ./debug --parallelize
The distilled student model for each task reported in the paper can be downloaded using the following link:
The teacher model for each task reported in the paper can be downloaded using the following link:
title={Sinkhorn Distance Minimization for Knowledge Distillation},
author={Cui, Xiao and Qin, Yulei and Gao, Yuting and Zhang, Enwei and Xu, Zihan and Wu, Tong and Li, Ke and Sun, Xing and Zhou, Wengang and Li, Houqiang},
journal={arXiv preprint arXiv:2402.17110},
TITLE = {{SinKD: Sinkhorn Distance Minimization for Knowledge Distillation}},
AUTHOR = {Cui, Xiao and Qin, Yulei and Gao, Yuting and Zhang, Enwei and Xu, Zihan and Wu, Tong and Li, Ke and Sun, Xing and Zhou, Wengang and Li, Houqiang},
URL = {},
JOURNAL = {{IEEE Transactions on Neural Networks and Learning Systems}},
YEAR = {2024},
MONTH = Nov,
KEYWORDS = {Large Language Model ; Knowledge Distillation ; Wasserstein Distance ; Sinkhorn Distance},
PDF = {},
HAL_ID = {hal-04803835},