Preprocessing and modeling scripts for Hungarian Language Modeling
The package can be installed with either of
pip install .
python setup.py install
(though the former is preferred over the latter).
These commands install all packages required by the preprocessing scripts. In
order to use the RNN models, tensorflow and numpy must be installed separately:
# For NVIDIA GPUs -- strongly recommended
pip install -r requirements_gpu.txt
# In every other case
pip install -r requirements.txt
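After installing, it can be useful to confirm that the two extra packages are visible to the interpreter before running the RNN models. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the check only tests that each package can be located on the import path, not that the GPU build works.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path.

    Hypothetical helper for illustration; it is not provided by this package.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two packages the RNN models need in addition to the base install.
    missing = missing_modules(["tensorflow", "numpy"])
    if missing:
        print("Missing packages for the RNN models:", ", ".join(missing))
    else:
        print("tensorflow and numpy are both importable.")
```

If either package is reported missing, rerun the matching `pip install -r` command above in the same environment.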
The emLam corpus, a specially prepared version of the Hungarian Webcorpus, is available from http://hlt.bme.hu/en/resources/emLam.
If you use the repository or the corpus in your project, please cite the following paper:
Dávid Márk Nemeskey. 2017. emLam – a Hungarian Language Modeling baseline. In
Proceedings of the 13th Conference on Hungarian Computational Linguistics
(MSZNY 2017).