zy4kamu/Coda: Samsung Natural Language Processing Pipeline (mainly for the Russian language): morphology, dependency parsing and much more

DESCRIPTION

Python and C++ implementation of an NLP stack for the Russian and English languages. It includes:

Sentence splitter
Tokenizer
Morphological disambiguation
Dependency parser
Stresser
Inflector
Named entity recognizer
...

REQUIREMENTS

Ubuntu 14.04 or higher
cmake version >= 2.8
gcc version >= 4.8
python version 2.7
graphviz (to create .dot files for trees)
okular (to view the PDF files of the trees)
language-pack-ru-base (for Russian language support)
cffi version >= 1.9
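
The minimum versions above can be compared against what is installed before starting a build. The following is a minimal sketch, assuming version strings are plain dot-separated integers; the helper names and the "installed" values are hypothetical and are not part of Coda.

```python
from __future__ import print_function


def parse_version(text):
    """Turn a dotted version string such as '4.8.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True if version `found` is at least `minimum`."""
    found_v, min_v = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2.8' and '2.8.0' compare as equal.
    width = max(len(found_v), len(min_v))
    found_v += (0,) * (width - len(found_v))
    min_v += (0,) * (width - len(min_v))
    return found_v >= min_v


# Minimum versions taken from the requirements list.
MINIMUMS = {"cmake": "2.8", "gcc": "4.8", "cffi": "1.9"}

# Hypothetical installed versions, for illustration only.
installed = {"cmake": "3.5.1", "gcc": "4.7.3", "cffi": "1.9"}

for tool in sorted(MINIMUMS):
    ok = meets_minimum(installed[tool], MINIMUMS[tool])
    print(tool, installed[tool], "ok" if ok else "too old")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10" would sort before "1.9".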

INSTALLATION

sudo bash build_cpp.sh
- Builds the C++ part of the application and puts the libraries, executables and configs in /opt/coda.
sudo bash build_cpp.sh -d
- Optional. Builds the C++ part of the application and puts the libraries and executables in /opt/coda/debug.
bash install_python.sh
- Installs the Python wrapper over the C++ code (an .egg) into the corresponding site-packages folder.
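
After running the scripts, it can help to confirm that the install locations exist and that the Python wrapper is importable. This is a rough sketch, assuming the default /opt/coda prefix described above and the `coda` package name used in the usage example; `check_install` is a hypothetical helper, not something shipped with the repository.

```python
from __future__ import print_function
import os


def check_install(prefix="/opt/coda", module_name="coda"):
    """Report which parts of the installation appear to be present."""
    report = {
        # Release build output directory.
        "release_dir": os.path.isdir(prefix),
        # Optional debug build output directory.
        "debug_dir": os.path.isdir(os.path.join(prefix, "debug")),
    }
    try:
        __import__(module_name)
        report["python_wrapper"] = True
    except ImportError:
        report["python_wrapper"] = False
    return report


for name, present in sorted(check_install().items()):
    print(name, "found" if present else "missing")
```

A missing debug_dir is expected if the optional `-d` build was skipped.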

USAGE EXAMPLE

Try: python Coda/python/usage_example.py
Contents of usage_example.py:

# -*- coding: utf-8 -*-
from coda.tokenizer import Tokenizer
from coda.disambiguator import Disambiguator
from coda.syntax_parser import SyntaxParser

if __name__ == '__main__':
    tokenizer = Tokenizer("RU")
    disambiguator = Disambiguator("RU")
    syntax_parser = SyntaxParser("RU")

    sentence = u'МИД пригрозил ограничить поездки американских дипломатов по России.'
    tokens = tokenizer.tokenize(sentence)
    disambiguated = disambiguator.disambiguate(tokens)
    tree = syntax_parser.parse(disambiguated)

    print tree.to_string()
    tree.draw(dot_file="/tmp/tree1.dot", show=True)

Expected output:

0 1 МИД S@ЕД@МУЖ@ИМ@НЕОД мид
1 -1 пригрозил V@СОВ@ИЗЪЯВ@ПРОШ@ЕД@МУЖ пригрозить
2 1 ограничить V@СОВ@ИНФ ограничить
3 2 поездки S@МН@ЖЕН@ВИН@НЕОД поездка
4 5 американских A@МН@РОД американский
5 3 дипломатов S@МН@МУЖ@РОД@ОД дипломат
6 3 по PR по
7 6 России S@ЕД@ЖЕН@ДАТ@НЕОД россия
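
The printed tree can be read back into a simple structure for further processing. The sketch below assumes that the five columns are token index, head index (with -1 marking the root), word form, morphological tag and lemma. This reading is inferred from the sample output and is not confirmed by the Coda sources; the `Node` type and both helper functions are hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
from collections import namedtuple

# Field names are assumptions inferred from the sample output above.
Node = namedtuple("Node", ["index", "head", "form", "tag", "lemma"])


def parse_tree_lines(text):
    """Parse whitespace-separated output lines into Node records."""
    nodes = []
    for line in text.strip().splitlines():
        index, head, form, tag, lemma = line.split()
        nodes.append(Node(int(index), int(head), form, tag, lemma))
    return nodes


def children_by_head(nodes):
    """Map each head index to the list of indices that depend on it."""
    children = {}
    for node in nodes:
        children.setdefault(node.head, []).append(node.index)
    return children


SAMPLE = u"""
0 1 МИД S@ЕД@МУЖ@ИМ@НЕОД мид
1 -1 пригрозил V@СОВ@ИЗЪЯВ@ПРОШ@ЕД@МУЖ пригрозить
2 1 ограничить V@СОВ@ИНФ ограничить
3 2 поездки S@МН@ЖЕН@ВИН@НЕОД поездка
4 5 американских A@МН@РОД американский
5 3 дипломатов S@МН@МУЖ@РОД@ОД дипломат
6 3 по PR по
7 6 России S@ЕД@ЖЕН@ДАТ@НЕОД россия
"""

nodes = parse_tree_lines(SAMPLE)
children = children_by_head(nodes)
root = children[-1][0]      # the node whose head index is -1
print(nodes[root].form)     # the root word form
print(children[root])       # indices of the root's direct dependents
```

The tag column can be split further with `tag.split("@")` if the individual grammatical features are needed.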

