Authors: Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai
We conducted a comprehensive investigation into personality control for LLMs using typical influence methods. Based on our findings, we propose Prompt Induction post Supervised Fine-tuning (PISF), which emerges as the most effective and robust strategy for controlling LLMs' personality, with high efficacy, high success rates, and high robustness.
All code and scripts for Continual Pre-training, Supervised Fine-tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF) across all traits and personalities are available:
├── persona
│   ├── data_construction       # Code for constructing the dataset.
│   ├── mbti_llms               # Code for training.
│   │   ├── codes
│   │   ├── evaluate_datasets   # Download evaluate_datasets and put it here.
│   │   ├── ...
│   │   └── train_datasets      # Download train_datasets and put it here.
│   ├── models
│   └── performance
The data volume of our datasets:
The summary statistics of our datasets:
All training and evaluating datasets constructed in our work are available via the following link:
To access the data, you need to request permission. We will grant data access once the preprint has been submitted.
Our investigation revealed a hierarchy of control effectiveness: Prompt > SFT > RLHF > Continual Pre-training. Notably, SFT achieves a higher control success rate than prompt induction.
While prompts prove highly effective, we found that prompt-induced personalities are less robust than trained ones, making them more prone to exhibiting conflicting personalities under reverse personality prompt induction.
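To make the robustness comparison concrete, here is a minimal sketch of how a reverse-induction check could be scored: count how often the model's questionnaire answers still reflect the originally induced trait after it is prompted with the opposite personality. The function name and the simple match-rate metric are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative RPPI-style robustness score (assumption: a simple match rate,
# not necessarily the metric used in the paper).

def rppi_robustness(induced_trait: str, answers: list[str]) -> float:
    """Fraction of questionnaire answers that still match the induced trait
    after the model is prompted with the opposite personality."""
    if not answers:
        return 0.0
    kept = sum(1 for a in answers if a == induced_trait)
    return kept / len(answers)

# Example: a model trained toward Extroversion ("E") is then prompted to act
# introverted; 4 of 5 trait-keyed answers still come out extroverted.
score = rppi_robustness("E", ["E", "E", "I", "E", "E"])
print(score)  # 0.8
```

A more robust (trained-in) personality would keep this score high under reverse prompting, while a purely prompt-induced one tends to flip.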
Reverse Personality Prompt Induction (RPPI) task performance:
Harnessing the strengths of both SFT and prompting, we proposed Prompt Induction post Supervised Fine-tuning (PISF).
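At inference time, PISF amounts to prepending a personality-induction prompt to queries sent to a model that has already been fine-tuned toward the same trait. The sketch below shows one way such an input could be assembled; the prompt wording and the chat-message schema are illustrative assumptions, not the exact format used in this repository.

```python
# Sketch of assembling a PISF-style input: the model is assumed to be already
# SFT-ed toward a trait, and a matching induction prompt is layered on top.
# The system-prompt text and message schema below are illustrative.

def build_pisf_messages(trait_prompt: str, user_query: str) -> list[dict]:
    """Prepend a personality-induction system prompt to the user query,
    in the chat-message format commonly consumed by SFT-ed chat models."""
    return [
        {"role": "system", "content": trait_prompt},
        {"role": "user", "content": user_query},
    ]

messages = build_pisf_messages(
    "You are an extroverted assistant: outgoing, talkative, and energetic.",
    "How would you spend a free weekend?",
)
print(messages[0]["role"])  # system
```

Because the induction prompt agrees with the fine-tuned personality rather than fighting it, the combined control is both highly effective and robust to reverse prompting.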
Control Success Rate & Efficacy:
Control Robustness:
If you find our work useful for your research and applications, please cite using this BibTeX:
@misc{chen2024extroversion,
title={Extroversion or Introversion? Controlling The Personality of Your Large Language Models},
author={Yanquan Chen and Zhen Wu and Junjie Guo and Shujian Huang and Xinyu Dai},
year={2024},
eprint={2406.04583},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We adapted DeepSpeed-Chat for the RLHF training phase.
The data and code are intended and licensed for research use only. They are further restricted to uses that comply with the license agreements of LLaMA and GPT-3.5. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset must not be used outside of research purposes.