BERT implementation with PyTorch

1. Install the environment

Create the conda environment from environment.yml:

conda env create -f environment.yml

Then activate the environment. conda takes its name from the name field in environment.yml.
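
After activating, a small check script can confirm that the environment provides PyTorch. This is a minimal sketch: missing_modules is a hypothetical helper, and only torch is checked because the README does not list the rest of environment.yml's contents.

```python
import importlib.util

# Hypothetical list of modules to verify; only "torch" is implied by this project.
REQUIRED_MODULES = ["torch"]


def missing_modules(modules):
    """Return the names of modules that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch  # safe: find_spec confirmed the package is installed
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

If torch is reported missing, the environment is probably not activated in the current shell.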

2. Prepare the dataset

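Each line of the dataset should hold one sentence pair: two WordPiece-tokenized sentences separated by a tab (\t) and terminated by a newline (\n). For example: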

i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
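
Under the reading of the example above (tab between the two sentences, spaces between tokens), one line can be split into two token lists as follows. parse_pair and read_pairs are hypothetical helper names, not functions from this repository.

```python
def parse_pair(line):
    """Split one dataset line into (sentence_a_tokens, sentence_b_tokens).

    Assumes the two sentences are separated by the first tab character and
    that tokens are separated by whitespace.
    """
    first, second = line.rstrip("\n").split("\t", 1)
    return first.split(), second.split()


def read_pairs(path):
    """Yield token-list pairs from a dataset file, skipping lines without a tab."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "\t" in line:
                yield parse_pair(line)
```

Because str.split() with no argument also strips surrounding spaces, the spaces around the tab in the example line do not produce empty tokens.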

Download the raw dataset from Wiki Dataset and place it in the data directory.
Then run dataset/create_dataset.py to generate the training data, or use your own dataset in the same format.
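
To use your own dataset, the lines only need to follow the format above. The sketch below pairs consecutive sentences within each document; that pairing rule is an assumption and not necessarily what dataset/create_dataset.py does, and make_pair_lines and write_dataset are hypothetical names.

```python
def make_pair_lines(documents):
    """Build dataset lines of the form "sentence_a\tsentence_b\n".

    documents: list of documents, each a list of sentences, where each sentence
    is a string of space-separated, already tokenized tokens.
    Consecutive sentences inside one document become one pair (assumed rule).
    """
    lines = []
    for sentences in documents:
        for a, b in zip(sentences, sentences[1:]):
            lines.append(f"{a}\t{b}\n")
    return lines


def write_dataset(documents, path):
    """Write the generated lines to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(make_pair_lines(documents))
```

Documents with a single sentence produce no pairs, since there is no following sentence to pair with.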

tokenization.py is adapted from BERT-Official.
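
The "##" pieces in the example line come from WordPiece, which splits each word by greedy longest-match-first lookup against the vocabulary. The function below is a simplified sketch of that idea with a hypothetical name, not a copy of tokenization.py.

```python
def wordpiece(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of a single word.

    Pieces after the first carry the "##" continuation prefix.
    If any remainder cannot be matched, the whole word becomes unk_token.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces
```

With a vocabulary containing "s", "##cre" and "##am", the word "scream" splits into ["s", "##cre", "##am"], matching the example dataset line.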

3. Generate the vocab file

Run dataset/create_dataset.py
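
The README does not document the vocab file format that main.py expects. A common convention, used by BERT-Official's vocab.txt, is one token per line with the line index as the token id. The sketch below builds such a list from the tokenized dataset; the special-token set and ordering are assumptions, and build_vocab and save_vocab are hypothetical names.

```python
from collections import Counter

# Assumed special tokens; the README does not say which ones main.py uses.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(lines, min_freq=1):
    """Count whitespace-separated tokens and return the vocabulary as a list.

    Special tokens come first, then tokens by descending frequency, with ties
    broken alphabetically so the output is deterministic.
    """
    counts = Counter(tok for line in lines for tok in line.split())
    tokens = [t for t, c in counts.items() if c >= min_freq and t not in SPECIAL_TOKENS]
    tokens.sort(key=lambda t: (-counts[t], t))
    return SPECIAL_TOKENS + tokens


def save_vocab(vocab, path):
    """Write one token per line; a token's line index serves as its id."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
```

Because the dataset lines are already WordPiece-tokenized, counting whitespace-separated tokens also counts the "##" pieces.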

4. Pretrain your BERT

Run main.py
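
The README does not describe what main.py does internally. For orientation, standard BERT pretraining (per the original BERT paper) corrupts inputs for the masked-LM objective by selecting about 15% of tokens, then replacing a selected token with [MASK] 80% of the time, a random token 10% of the time, and leaving it unchanged 10% of the time. The pure-Python sketch below illustrates that rule; mask_tokens is a hypothetical name and this is not the code in main.py.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None, mask_token="[MASK]",
                special=("[CLS]", "[SEP]", "[PAD]")):
    """Apply BERT-style masked-LM corruption to a list of tokens.

    Returns (corrupted_tokens, labels). labels holds the original token at each
    selected position and None elsewhere (positions the loss should ignore).
    Special tokens are never selected.
    """
    rng = rng or random.Random()
    out, labels = list(tokens), [None] * len(tokens)
    # Pool for random replacement: ordinary vocabulary tokens only.
    candidates = [t for t in vocab if t not in special and t != mask_token]
    for i, tok in enumerate(tokens):
        if tok in special or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            out[i] = mask_token  # 80%: replace with [MASK]
        elif r < 0.9:
            out[i] = rng.choice(candidates)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return out, labels
```

In a PyTorch training loop, the None labels would typically be mapped to the ignore_index of torch.nn.CrossEntropyLoss so that only selected positions contribute to the loss.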

Results

Split	Loss	Accuracy
Train	7.804	82.319
Test	7.823	80.426

Contributing

If you get better results on this dataset or have any questions, feel free to open an issue.

Reference

[BERT-pytorch]
[BERT-Official]
