Lexical substitution (LS) aims at finding appropriate substitutes for a target word in a sentence. Recently, LS methods based on pretrained language models have made remarkable progress, generating potential substitutes for a target word through analysis of its contextual surroundings. However, these methods tend to overlook the preservation of the sentence's meaning when generating the substitutes. This study explores how to generate substitute candidates from a paraphraser, because the paraphrases it generates vary in word choice while preserving the sentence's meaning. Since we cannot directly generate the substitutes via commonly used decoding strategies, we propose two simple decoding strategies that focus on the variations of the target word during decoding. Experimental results show that our methods outperform state-of-the-art LS methods based on pretrained language models on three benchmarks.
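To make the idea of "focusing on the variations of the target word during decoding" more concrete, here is a toy sketch. It is one possible reading of the idea, not the decoding code in this repo: the prefix of the source sentence is copied as-is, and the candidates are the most probable next tokens at the target position other than the target word itself. The function names substitute_candidates and toy_paraphraser, and all the probabilities, are made up for illustration.

```python
def substitute_candidates(tokens, target_index, next_token_probs, top_k=3):
    """Return up to top_k likely replacements for tokens[target_index].

    next_token_probs is a hypothetical callable standing in for a
    paraphraser's decoder: it maps the tokens decoded so far to a
    {token: probability} dictionary for the next position.
    """
    prefix = tokens[:target_index]  # copy the source sentence up to the target
    target = tokens[target_index]
    probs = next_token_probs(prefix)
    # Rank the next-token distribution from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # Drop the target word itself and keep the best remaining tokens.
    return [token for token, _ in ranked if token != target][:top_k]


def toy_paraphraser(prefix):
    """Hand-written distribution for illustration only; not model output."""
    if list(prefix) == ["the", "movie", "was"]:
        return {
            "great": 0.40,
            "excellent": 0.25,
            "wonderful": 0.20,
            "long": 0.10,
            "the": 0.05,
        }
    return {}


sentence = ["the", "movie", "was", "great", "."]
print(substitute_candidates(sentence, 3, toy_paraphraser))
```

In a real system the toy function would be replaced by a trained paraphraser, and the candidates would still need to be ranked for meaning preservation.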
- Our code is mainly based on Fairseq (version 0.10.2) with customized modifications to its scripts. To start, clone this repo and install fairseq first using pip install -e .
- PyTorch version = 1.7.1
- Python version >= 3.7
- Other dependencies: pip install -r requirements.txt
- For training new models, you'll also need an NVIDIA GPU and NCCL
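Before running anything, it can help to confirm that the environment matches the requirements above. The following is a minimal sketch, assuming that torch and fairseq expose a __version__ attribute when imported; the helper names parse_version, installed_version, and check_environment are illustrative and are not part of this repo.

```python
import importlib
import sys


def parse_version(text):
    """Turn a version string such as '1.7.1+cu110' into the tuple (1, 7, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python >= 3.7 is required")
    torch_version = installed_version("torch")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version)[:3] != (1, 7, 1):
        problems.append("PyTorch 1.7.1 expected, found " + torch_version)
    if installed_version("fairseq") is None:
        problems.append("fairseq is not importable; run pip install -e . in this repo")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The script only reports mismatches; it does not install or change anything.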
You need to download the paraphraser (Transformer) from here and the paraphraser (BART) from here, and put them into the folders "checkpoints/para/transformer/" and "checkpoints/para/bart/" respectively. For candidate ranking, we use BLEURT and BARTScore (https://github.com/neulab/BARTScore).
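The folder layout described above can be prepared and checked with a few lines of Python. This is a small sketch under assumptions: the two folder names are taken from this README, while the function name prepare_checkpoint_dirs and the repo_root argument are illustrative. Since the checkpoint file names are not listed here, the check only reports whether each folder contains anything at all.

```python
from pathlib import Path

# Folder names as given in this README, relative to the repository root.
CHECKPOINT_DIRS = {
    "transformer": Path("checkpoints/para/transformer"),
    "bart": Path("checkpoints/para/bart"),
}


def prepare_checkpoint_dirs(repo_root="."):
    """Create the checkpoint folders if missing and report which ones hold files.

    Returns a dict such as {"transformer": False, "bart": True}, where True
    means the folder is non-empty.
    """
    status = {}
    for name, relative in CHECKPOINT_DIRS.items():
        folder = Path(repo_root) / relative
        folder.mkdir(parents=True, exist_ok=True)
        status[name] = any(folder.iterdir())
    return status
```

Calling prepare_checkpoint_dirs() from the repository root creates both folders; any entry that is still False marks a paraphraser that has not been downloaded yet.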
(1) Run ParaLS on the lexical substitution dataset LS07
sh run_LS_Paraphraser.multi.ls07.sh # Transformer
sh run_LS_Paraphraser.multi.ls07.bart.sh # BART
(2) Run ParaLS on the lexical substitution dataset LS14 (default: BART)
sh run_LS_Paraphraser.multi.ls14.sh # Transformer
sh run_LS_Paraphraser.multi.ls14.bart.sh # BART
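The four scripts above follow one naming pattern, so they can also be driven from Python. The sketch below is a convenience wrapper, not something shipped with this repo: script_name and run_all are illustrative names, and the wrapper assumes it is started from the folder that contains the scripts. It defaults to a dry run that only prints the commands.

```python
import subprocess

DATASETS = ("ls07", "ls14")
MODELS = ("transformer", "bart")


def script_name(dataset, model):
    """Map a dataset/model pair to the matching script name from this README."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: " + dataset)
    if model not in MODELS:
        raise ValueError("unknown model: " + model)
    suffix = ".bart" if model == "bart" else ""
    return "run_LS_Paraphraser.multi.{}{}.sh".format(dataset, suffix)


def run_all(dry_run=True):
    """Print (dry run) or execute every dataset/model combination in order."""
    commands = [["sh", script_name(d, m)] for d in DATASETS for m in MODELS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first script that exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Use run_all(dry_run=False) to actually launch the experiments one after another.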
Jipeng Qiang and Kang Liu contributed the code. Please cite as:
@inproceedings{qiang-etal-2023-ParaLS,
  title = "ParaLS: Lexical Substitution via Pretrained Paraphraser",
  author = "Qiang, Jipeng and Liu, Kang and Li, Yun and Yuan, Yunhao and Zhu, Yi",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
  year = "2023"
}
If you have any questions about how to run the code, please contact [email protected].