BERT implementation with PyTorch

1. Install the environment

Install the environment from environment.yml

conda env create -f environment.yml

Then activate your environment.

2. Prepare the dataset

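Each line of the dataset should look like the example below: WordPiece tokens separated by spaces, a tab (\t) between the two text segments, and a newline (\n) at the end of the line.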

i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
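To make the line format concrete, here is a minimal sketch of reading one such line back into two token lists and merging the `##` continuation pieces into whole words. The helper names `parse_pair` and `merge_wordpieces` are hypothetical and are not part of this repository.

```python
from typing import List, Tuple


def parse_pair(line: str) -> Tuple[List[str], List[str]]:
    """Split one dataset line into two lists of WordPiece tokens."""
    # The two segments are separated by a single tab; the line ends with "\n".
    first, second = line.rstrip("\n").split("\t")
    return first.split(), second.split()


def merge_wordpieces(tokens: List[str]) -> List[str]:
    """Glue '##' continuation pieces back onto the preceding token."""
    words: List[str] = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return words


sample = (
    "i am about to s ##cre ##am ma ##dly in the office / especially \t "
    "when they bring more papers to pi ##le higher on my des ##k . \n"
)
first_tokens, second_tokens = parse_pair(sample)
print(merge_wordpieces(first_tokens))
print(merge_wordpieces(second_tokens))
```

A line that does not contain exactly one tab makes `parse_pair` raise a `ValueError`, which can be a convenient way to catch malformed lines in a custom dataset.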

You can download the raw dataset from Wiki Dataset and put it under the data directory.
Then run dataset/create_dataset.py to generate the dataset, or use your own dataset.

tokenization.py is based on the one in BERT-Official.
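For readers unfamiliar with WordPiece, the official BERT tokenizer splits each word with a greedy longest-match-first search over the vocabulary. The following is a simplified sketch of that idea, not the code in tokenization.py; the function name `wordpiece` and the toy vocabulary are made up for illustration.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of a single word into WordPieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a non-initial piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, assumed only for this example.
toy_vocab = {"s", "##cre", "##am", "ma", "##dly", "des", "##k"}
print(wordpiece("scream", toy_vocab))
print(wordpiece("desk", toy_vocab))
```

With this toy vocabulary, "scream" is split the same way as in the sample dataset line above (`s ##cre ##am`).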

3. Generate the vocab file

Run dataset/create_dataset.py
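If you bring your own dataset, a vocabulary can be derived from the tokenized lines. The sketch below is only an illustration under stated assumptions: the function name `build_vocab`, the `min_freq` parameter, and the ordering are hypothetical, and the special tokens are the ones used by the original BERT vocabulary. What dataset/create_dataset.py actually does may differ.

```python
from collections import Counter
from typing import Iterable, List

# Special tokens from the original BERT vocabulary (assumed to apply here).
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(lines: Iterable[str], min_freq: int = 1) -> List[str]:
    """Return special tokens followed by dataset tokens, most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        # str.split() treats the tab separator and trailing newline as whitespace.
        counts.update(line.split())
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    tokens = [tok for tok, freq in ordered if freq >= min_freq]
    return SPECIAL_TOKENS + tokens


example_lines = ["the cat \t the dog \n", "the cat \t sat \n"]
vocab = build_vocab(example_lines)
token_to_id = {tok: idx for idx, tok in enumerate(vocab)}
print(vocab)
```

Writing `vocab` to a text file with one token per line gives the same layout as the vocab.txt files shipped with the official BERT checkpoints.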

4. Pretrain your BERT

Run main.py
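As background on what pretraining involves, the BERT paper trains a masked language model: about 15% of the input tokens are selected, and of those 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The sketch below illustrates that rule in plain Python. It is not taken from main.py, and the function name `mask_tokens` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Apply BERT-style masking; labels hold the original token or None."""
    if rng is None:
        rng = random.Random()
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append("[MASK]")           # 80%: mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep the original
        else:
            inputs.append(tok)
            labels.append(None)  # position is ignored by the loss
    return inputs, labels


tokens = "i am about to s ##cre ##am".split()
toy_vocab = ["the", "office", "papers", "des", "##k"]
masked, targets = mask_tokens(tokens, toy_vocab, rng=random.Random(0))
print(masked)
print(targets)
```

In a PyTorch training loop the `None` labels would typically be mapped to an ignore index so that only the selected positions contribute to the loss.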

RESULT

|       | Loss  | Accuracy |
|-------|-------|----------|
| Train | 7.804 | 82.319   |
| Test  | 7.823 | 80.426   |

Contributing

If you get better results on this dataset or have any questions, feel free to open an issue.

Reference

[BERT-pytorch]
[BERT-Official]