The code of the paper *Revealing the Parallel Multilingual Learning within Large Language Models* consists of two parts. The first part counts activated neurons in the multi-layer perceptrons (MLPs) of transformer models. The second part covers fine-tuning and inference.
- LLaMA-Factory >= 0.3.2
To do fine-tuning and inference on the multilingual LLMs used in our work, you may need to create an environment satisfying the requirements of LLaMA-Factory.
The parallel multilingual data translated by GPTs is released in gpt_translated_data/specific dataset. The datasets used in our experiments are detailed in the table below. Note that all samples are randomly selected to guarantee the effectiveness of the evaluations. Except for the FLORES-200 and XNLI development sets, which are already parallel across multiple languages, the other datasets were translated by GPTs (a minimal translation sketch follows the table).
Task | Evaluation Dataset | Training Set | Test Set | Translation System |
---|---|---|---|---|
Translation | WMT | FLORES-200 development set | WMT22 (de2en, zh2en, de2fr, en2de, en2zh) and WMT21 (is2en) | GPT4 |
Natural Language Inference | RTE | same data as below | RTE development set | ChatGPT |
Natural Language Inference | XNLI | XNLI development set | 1000 samples of the XNLI test set for each language (fr, de, ru, zh) | ChatGPT |
Reading Comprehension | BoolQ | 1000 samples of the BoolQ training set | 1000 samples of the BoolQ development set | ChatGPT |
Text Simplification | Wiki-Auto | same data as above | 1000 samples of the Wiki-Auto development set | ChatGPT |
Abstractive Summarization | XLSum | 300 samples of the XLSum development set for each language | 500 samples of the XLSum test set for each language (fr, ru, es) | ChatGPT |
Mathematical Reasoning | GSM8K | - | GSM8K test set | GPT4 |
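The translation pipeline itself is not included in this repository. As a rough reference, the snippet below is a minimal sketch of how one dataset could be translated with the OpenAI API; the model name, prompt wording, and file names are illustrative assumptions, not our exact settings.

```python
# Minimal translation sketch with the OpenAI Python SDK (openai>=1.0).
# Model name, prompt wording, and file names are placeholders, not our exact settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def translate(text: str, target_lang: str, model: str = "gpt-4") -> str:
    """Translate a single sample into the target language."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a professional translator."},
            {"role": "user", "content": f"Translate the following text into {target_lang}:\n{text}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    with open("boolq_train_1000.txt", encoding="utf-8") as fin, \
         open("boolq_train_1000.de.txt", "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(translate(line.strip(), "German") + "\n")
```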
To determine the number of activated neurons, we need to analyze the intermediate results generated by MLPs during inference. This process involves modifying the modeling files so that the MLPs record their intermediate status (Section 3.1), running inference and writing the status to files (Section 3.2), and finally reading the result files for analysis (Sections 3.3-3.5). Note that during this process, the Qwen models we used are fine-tuned (Section 4), while the Bloomz models are not. Since the MLPs of Qwen share the same architecture as those of LLaMA-2, you can apply the modifications of modeling_qwen.py to modeling_llama.py and count LLaMA's activated neurons as well.
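For intuition, the sketch below shows the kind of counting the modified modeling files perform, implemented here with generic forward hooks instead of the exact edits in ./counting_activated_neurons/modeling/. The model path, the positive-value activation criterion, and the assumption that the MLP activation is exposed as an nn.SiLU or nn.GELU module (which depends on the model and transformers version) are all illustrative.

```python
# Sketch only: hook every MLP activation module and measure, per layer, the
# proportion of neurons whose post-activation value is positive. This approximates
# what the modified modeling files record; it is not the exact code.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/model"  # placeholder model path
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
model.eval()

layer_proportions = {}


def make_hook(name):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size) after the activation function;
        # counting "output > 0" as activated is an assumption, not the paper's exact rule.
        layer_proportions[name] = (output > 0).float().mean().item()
    return hook


# SiLU for LLaMA/Qwen-style gated MLPs, GELU for Bloom-style MLPs; some models use
# custom activation classes, in which case the isinstance check must be adjusted.
for name, module in model.named_modules():
    if isinstance(module, (nn.SiLU, nn.GELU)):
        module.register_forward_hook(make_hook(name))

inputs = tokenizer("Guten Morgen", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for name, proportion in layer_proportions.items():
    print(f"{name}: {proportion:.3f} activated")
```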
To obtain the intermediate result files, we replace the original modeling files with the modified ones. The modified modeling files we used for the Bloomz and Qwen models are provided under ./counting_activated_neurons/modeling/. The script below is an example of such a replacement:
# Replace Bloomz modeling file
mv transformers/src/transformers/models/bloom/modeling_bloom.py transformers/src/transformers/models/bloom/modeling_bloom_ori.py # Back up the original modeling file
cp ./counting_activated_neurons/modeling/modeling_bloom.py transformers/src/transformers/models/bloom/
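After the replacement, a quick sanity check (a hypothetical snippet, not part of the repository) is to confirm that your installed transformers actually imports the modified module; this assumes transformers is installed from the local source checkout.

```python
# Verify that the modified modeling_bloom.py is the one transformers loads.
import transformers.models.bloom.modeling_bloom as modeling_bloom

print(modeling_bloom.__file__)  # should point into your local transformers/src/... checkout
```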
Run the script ./counting_activated_neurons/activate/infer_activate.py to do inference. Three output files are produced by this process: the model generation result file, the per-layer activated neuron proportion file, and the per-position activated neuron status file.
Description of parameters in the script:
os.environ['CUDA_VISIBLE_DEVICES']="" # specify GPUs to be used
checkpoint = "" # path of model
file_name = "" # name of input file(not include file suffix)
input_file = f"{file_name}.txt" # path of input file
activate_proportion_file = f"{file_name}-proportion-{checkpoint}.jsonl" # path of the per-layer activated neuron proportion file
activate_index_file = f"{file_name}-index-{checkpoint}.jsonl" # path of the per-position activated neuron status file
output_file = f"{file_name}-result-{checkpoint}.txt" # path of the model generation result file
sys_prompt_flag = False # additional prompt setting: True to use sys_prompt, False to use no additional prompt
sys_prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|custom|><|im_end|>\n<|im_start|>assistant\n" # <|custom|> is replaced with the input text (illustrated below)
interactivate_flag = False # interaction setting: True for interactive mode, False to read inputs from the input file
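As a concrete illustration of the sys_prompt substitution described above (a sketch of the behavior, not the exact code in infer_activate.py):

```python
# Sketch: when sys_prompt_flag is True, each input line replaces the <|custom|>
# placeholder in the template before tokenization.
sys_prompt_flag = True
sys_prompt = ("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
              "<|im_start|>user\n<|custom|><|im_end|>\n<|im_start|>assistant\n")

line = "Translate the following sentence into English: Guten Morgen."
prompt = sys_prompt.replace("<|custom|>", line) if sys_prompt_flag else line
print(prompt)
```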
Run:
python ./counting_activated_neurons/activate/infer_activate.py
Run the ./counting_activated_neurons/activate/Proportion_analysis.py script to analyze the per-layer activated neuron proportion file and obtain the final results. The results include the proportion of inhibited neurons in each layer and the overall average inhibition proportion.
Description of script parameters:
input_file = '' # the per-layer activated neuron proportion file generated during inference
Run:
python ./counting_activated_neurons/activate/Proportion_analysis.py
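The exact JSONL schema is defined by the modified modeling files; as a rough sketch of the averaging this analysis performs (assuming, purely for illustration, that each JSONL line maps a layer index to its activated proportion for one decoding step), it boils down to:

```python
# Rough sketch of per-layer averaging; the JSONL schema assumed here
# (one dict per line mapping layer index -> activated proportion) is illustrative.
import json
from collections import defaultdict

input_file = "input-proportion-checkpoint.jsonl"  # placeholder file name

sums = defaultdict(float)
counts = defaultdict(int)
with open(input_file, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        for layer, proportion in record.items():
            sums[layer] += float(proportion)
            counts[layer] += 1

per_layer_activated = {layer: sums[layer] / counts[layer] for layer in sums}
avg_activated = sum(per_layer_activated.values()) / len(per_layer_activated)
print("per-layer activated proportion:", per_layer_activated)
print("overall average inhibited proportion:", 1 - avg_activated)
```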
Run the ./counting_activated_neurons/HeatMap_analysis.py script to visualize the distribution of activated neurons with a heatmap.
Description of script parameters:
file = "" # the activated neurons status of each position file generated during the inference process
Run:
python ./counting_activated_neurons/activate/HeatMap_analysis.py
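For a sense of what the visualization amounts to, here is a minimal matplotlib heatmap sketch; the counts matrix is placeholder data standing in for what the script aggregates from the status file.

```python
# Minimal heatmap sketch: rows are layers, columns are neuron indices, and values
# are activation counts. Random placeholder data is used instead of the real file.
import numpy as np
import matplotlib.pyplot as plt

num_layers, num_neurons = 32, 256
counts = np.random.randint(0, 100, size=(num_layers, num_neurons))  # placeholder

plt.figure(figsize=(10, 4))
plt.imshow(counts, aspect="auto", cmap="viridis")
plt.colorbar(label="activation count")
plt.xlabel("neuron index")
plt.ylabel("layer")
plt.title("Activated-neuron distribution (placeholder data)")
plt.savefig("heatmap.png", dpi=200)
```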
Results:
Run the ./counting_activated_neurons/Peak_analysis.py script to draw line graphs of the number and distribution of high-frequency activated neurons in each layer.
Description of script parameters:
file_list = [] # list of input files; the list may contain a single file
select_proportion = 0.2 # proportion of neurons to select; for example, 0.2 selects the top 20% of neurons by activation count
Run:
python ./counting_activated_neurons/activate/Peak_analysis.py
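As a rough illustration of the selection step (placeholder data and a quantile-based threshold are assumptions, not the repository's implementation):

```python
# Sketch: keep the top `select_proportion` of neurons by activation count and plot,
# for each layer, how many of these high-frequency neurons it contains.
import numpy as np
import matplotlib.pyplot as plt

select_proportion = 0.2
counts = np.random.randint(0, 100, size=(32, 256))  # placeholder (layers x neurons)

threshold = np.quantile(counts, 1 - select_proportion)  # top 20% by activation count
high_freq_per_layer = (counts >= threshold).sum(axis=1)

plt.plot(range(counts.shape[0]), high_freq_per_layer, marker="o")
plt.xlabel("layer")
plt.ylabel("# high-frequency activated neurons")
plt.savefig("peak.png", dpi=200)
```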
Results:
We use LLaMA-Factory and the models' official code to do fine-tuning and inference, as detailed in the table below. The setup of fine-tuning and inference is provided in the appendix of our paper.
Model | LoRA Tuning (Y/N) | Tool for Fine-tuning and Inference |
---|---|---|
ChatGPT | N | OpenAI's API |
Qwen-7B | Y | LLaMA-Factory |
Qwen-14B | Y | LLaMA-Factory |
Qwen-72B | Y | LLaMA-Factory |
ALMA-13B | Y | LLaMA-Factory |
mT0-13B | N | official code |
Yi-34B | Y | LLaMA-Factory |
Bloomz-176B | N | official code |
The code we used for constructing the fine-tuning and inference data is provided in fine-tuning_and_inference.
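Since the paper's central idea is feeding the model the same input in several languages at once, the following hypothetical sketch shows how such a parallel multilingual prompt could be assembled from the translated data; the template wording and the helper function are illustrative assumptions, not the exact prompts in fine-tuning_and_inference.

```python
# Hypothetical sketch: stack the same input in several languages before the task
# instruction. The template wording is an assumption, not our exact prompt format.
def build_parallel_prompt(parallel_inputs: dict, instruction: str) -> str:
    lines = [f"{lang}: {text}" for lang, text in parallel_inputs.items()]
    return "\n".join(lines) + "\n" + instruction


prompt = build_parallel_prompt(
    {
        "English": "The movie was surprisingly good.",
        "German": "Der Film war überraschend gut.",
        "Chinese": "这部电影出乎意料地好。",
    },
    "Question: Is the sentiment of the text positive? Answer yes or no.",
)
print(prompt)
```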
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{mu2024large,
title={Large Language Models are Parallel Multilingual Learners},
author={Mu, Yongyu and Feng, Peinan and Cao, Zhiquan and Wu, Yuzhang and Li, Bei and Wang, Chenglong and Xiao, Tong and Song, Kai and Liu, Tongran and Zhang, Chunliang and others},
journal={arXiv preprint arXiv:2403.09073},
year={2024}
}