Authors: Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai
We conducted a comprehensive investigation into personality control for LLMs using typical influence methods. Based on our findings, we propose Prompt Induction post Supervised Fine-tuning (PISF), which emerges as the most effective and robust strategy for controlling LLMs' personality, with high efficacy, high success rates, and high robustness.
All code and scripts for Continual Pre-training, Supervised Fine-tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF) across all traits and personalities are available:
├── persona
│   ├── data_construction       # Code for constructing the dataset.
│   ├── mbti_llms               # Code for training.
│   │   ├── codes
│   │   ├── evaluate_datasets   # Download evaluate_datasets and put it here.
│   │   ├── ...
│   │   └── train_datasets      # Download train_datasets and put it here.
│   ├── models
│   └── performance
The data volume of our datasets:
The summary statistics of our datasets:
All training and evaluating datasets constructed in our work are available via the following link:
To access the data, you need to request permission. We will grant data access once the preprint has been submitted.
Our investigation revealed a hierarchy of control effectiveness: Prompt > SFT > RLHF > Continual Pre-training. Notably, SFT achieves a higher control success rate than prompt induction.
While prompts prove highly effective, we found that prompt-induced personalities are less robust than trained ones, making them more prone to exhibiting conflicting personalities under reverse personality prompt induction.
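To make the robustness comparison concrete, here is a minimal sketch of how a reverse-induction check could be scored: count how often the model's questionnaire answers still reflect the originally induced trait after it is prompted with the opposite personality. The function name and the simple match-rate metric are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative RPPI-style robustness score (assumption: a simple match rate,
# not necessarily the metric used in the paper).

def rppi_robustness(induced_trait: str, answers: list[str]) -> float:
    """Fraction of questionnaire answers that still match the induced trait
    after the model is prompted with the opposite personality."""
    if not answers:
        return 0.0
    kept = sum(1 for a in answers if a == induced_trait)
    return kept / len(answers)

# Example: a model trained toward Extroversion ("E") is then prompted to act
# introverted; 4 of 5 trait-keyed answers still come out extroverted.
score = rppi_robustness("E", ["E", "E", "I", "E", "E"])
print(score)  # 0.8
```

A more robust (trained-in) personality would keep this score high under reverse prompting, while a purely prompt-induced one tends to flip.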
Reverse Personality Prompt Induction (RPPI) task performance:
Harnessing the strengths of both SFT and prompting, we proposed Prompt Induction post Supervised Fine-tuning (PISF).
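At inference time, PISF amounts to prepending a personality-induction prompt to queries sent to a model that has already been fine-tuned toward the same trait. The sketch below shows one way such an input could be assembled; the prompt wording and the chat-message schema are illustrative assumptions, not the exact format used in this repository.

```python
# Sketch of assembling a PISF-style input: the model is assumed to be already
# SFT-ed toward a trait, and a matching induction prompt is layered on top.
# The system-prompt text and message schema below are illustrative.

def build_pisf_messages(trait_prompt: str, user_query: str) -> list[dict]:
    """Prepend a personality-induction system prompt to the user query,
    in the chat-message format commonly consumed by SFT-ed chat models."""
    return [
        {"role": "system", "content": trait_prompt},
        {"role": "user", "content": user_query},
    ]

messages = build_pisf_messages(
    "You are an extroverted assistant: outgoing, talkative, and energetic.",
    "How would you spend a free weekend?",
)
print(messages[0]["role"])  # system
```

Because the induction prompt agrees with the fine-tuned personality rather than fighting it, the combined control is both highly effective and robust to reverse prompting.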
Control Success Rate & Efficacy:
Control Robustness:
If you find our work useful for your research and applications, please cite using this BibTeX:
@misc{chen2024extroversion,
title={Extroversion or Introversion? Controlling The Personality of Your Large Language Models},
author={Yanquan Chen and Zhen Wu and Junjie Guo and Shujian Huang and Xinyu Dai},
year={2024},
eprint={2406.04583},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We adapted DeepSpeed-Chat for the RLHF training phase.
The data and code are intended and licensed for research use only. They are further restricted to uses that comply with the license agreements of LLaMA and GPT-3.5. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset must not be used outside of research purposes.