commoncrawl/web-languages-code

web-languages-code

This repo holds the code, templates, and data associated with the web-languages dataset.

Theory

Installing, etc.

make install

License

The code in this repo is licensed under the Apache 2.0 license.

The templates are licensed CC0.

Data files (*.tsv) from mOSCAR and Wikipedia remain under the copyright of their respective sources.

About

The code used to generate templates for the web-languages repo https://github.com/commoncrawl/web-languages
