Install the environment from environment.yml
conda env create -f environment.yml
Then activate the environment.
Each line of the dataset should look like the following:
i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
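For reference, the sketch below shows one way such a line might be parsed; it assumes each line holds two WordPiece-tokenized segments separated by a tab, with tokens separated by single spaces. The function name parse_line is illustrative and is not part of this repo.

```python
# Minimal sketch: parse one dataset line into two token lists.
# Assumes the format shown above: two segments separated by a tab ("\t"),
# tokens separated by single spaces. parse_line is a hypothetical helper.
def parse_line(line):
    line = line.rstrip("\n")
    segment_a, segment_b = line.split("\t")
    return segment_a.split(), segment_b.split()

tokens_a, tokens_b = parse_line(
    "i am about to s ##cre ##am ma ##dly in the office / especially\t"
    "when they bring more papers to pi ##le higher on my des ##k .\n"
)
print(tokens_a[:5])  # ['i', 'am', 'about', 'to', 's']
```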
You can download the raw dataset from Wiki Dataset and put it under the data directory.
Then run dataset/create_dataset.py to generate the dataset, or use your own dataset.
The tokenization.py script is taken from BERT-Official.
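As a rough illustration, the sketch below shows how WordPiece tokens like those in the dataset example could be produced with that module, assuming it keeps the FullTokenizer interface of the official BERT release. The vocab file path is a placeholder, not a file shipped with this repo.

```python
# Minimal sketch of producing WordPiece tokens with tokenization.py,
# assuming the FullTokenizer interface of the official BERT release.
import tokenization

tokenizer = tokenization.FullTokenizer(
    vocab_file="data/vocab.txt",  # placeholder path, adjust to your vocab
    do_lower_case=True,
)

tokens = tokenizer.tokenize("I am about to scream madly in the office.")
print(tokens)
# e.g. ['i', 'am', 'about', 'to', 's', '##cre', '##am', 'ma', '##dly', 'in', 'the', 'office', '.']
ids = tokenizer.convert_tokens_to_ids(tokens)
```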
Run dataset/create_dataset.py
Run main.py
| | Loss | Accuracy |
|---|---|---|
| Train | 7.804 | 82.319 |
| Test | 7.823 | 80.426 |
If you get better results on this dataset or have any questions, feel free to open an issue.