solitude-alive/bert-pytorch

BERT implementation with PyTorch

1. Install the environment

Install the environment from environment.yml

conda env create -f environment.yml

Then activate your environment.

2. Prepare dataset

Each line of the dataset should contain two WordPiece-tokenized sentences separated by a tab, like this:

i am about to s ##cre ##am ma ##dly in the office / especially \t when they bring more papers to pi ##le higher on my des ##k . \n
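As a minimal sketch of how such a line could be read, the snippet below splits it into its two sentences and rejoins the `##` continuation pieces. It assumes that `\t` and `\n` in the example stand for a real tab and newline character; `parse_pair` and `merge_wordpieces` are hypothetical helper names and are not part of this repository.

```python
from typing import List, Tuple


def parse_pair(line: str) -> Tuple[List[str], List[str]]:
    """Split one dataset line into two lists of WordPiece tokens."""
    # Assumption: the two sentences are separated by exactly one tab.
    first, second = line.rstrip("\n").split("\t")
    return first.split(), second.split()


def merge_wordpieces(tokens: List[str]) -> List[str]:
    """Rejoin '##' continuation pieces onto the preceding token."""
    words: List[str] = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]  # drop the '##' marker and append
        else:
            words.append(token)
    return words


line = (
    "i am about to s ##cre ##am ma ##dly in the office / especially"
    "\twhen they bring more papers to pi ##le higher on my des ##k .\n"
)
sent_a, sent_b = parse_pair(line)
print(merge_wordpieces(sent_a))
print(merge_wordpieces(sent_b))
```

Merging the pieces is only useful for inspecting the data; training consumes the WordPiece tokens as they are.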

You can download the raw dataset from Wiki Dataset and put it under the data directory.
Then run dataset/create_dataset.py to generate the dataset, or use your own dataset in the same format.

tokenization.py is based on the one in BERT-Official.

3. Generate the vocab file

Run dataset/create_dataset.py
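For readers preparing their own data, the following is a rough sketch of what building a vocabulary from tokenized lines can look like. It is not the repository's script: `build_vocab`, `min_count`, and the frequency-based ordering are assumptions made for illustration. The special tokens listed are the ones conventionally found in BERT vocabularies.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Special tokens conventionally used by BERT vocabularies.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(lines: Iterable[str], min_count: int = 1) -> List[str]:
    """Collect tokens from tokenized lines, most frequent first (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        # str.split() with no argument splits on spaces, tabs and newlines.
        counts.update(line.split())
    kept = [tok for tok, n in counts.items() if n >= min_count]
    # Sort by descending frequency, then alphabetically for a stable order.
    kept.sort(key=lambda tok: (-counts[tok], tok))
    return SPECIAL_TOKENS + [tok for tok in kept if tok not in SPECIAL_TOKENS]


lines = [
    "pi ##le higher\ton my des ##k .\n",
    "on my des ##k\tpi ##le .\n",
]
vocab = build_vocab(lines)
token_to_id: Dict[str, int] = {tok: idx for idx, tok in enumerate(vocab)}
print(vocab)
```

Placing the special tokens first keeps their ids fixed regardless of the corpus, which makes it easier to reuse the same padding and masking ids across runs.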

4. Pretrain your BERT

Run main.py

RESULT

| | Loss | Accuracy |
| --- | --- | --- |
| Train | 7.804 | 82.319 |
| Test | 7.823 | 80.426 |

Contributing

If you get better results on this dataset or have any questions, feel free to open an issue.

Reference

[BERT-pytorch]
[BERT-Official]
