A Python package (using a Docker image under the hood) to lemmatize German texts.
Built upon:
- IWNLP: uses the crowd-generated token tables on de.wiktionary.
- GermaLemma: Looks up lemmas in the TIGER Corpus and uses Pattern as a fallback for some rule-based lemmatizations.
It works as follows. First, spaCy tags each token with its part of speech. Then German Lemmatizer looks up lemmas in IWNLP and GermaLemma. If both tools find a lemma but disagree, the one from IWNLP is chosen; if they agree, or only one tool finds a lemma, that result is taken. The casing of the original token is preserved where possible.
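The selection rule above can be sketched as follows. This is a minimal illustration, not the package's actual code: `choose_lemma` and `match_casing` are hypothetical helpers, and the per-tool lookups are assumed to happen elsewhere.

```python
# Sketch of the lemma-selection logic described above (illustrative only).

def match_casing(lemma, token):
    # Try to preserve the casing of the original token.
    if token.islower():
        return lemma.lower()
    if token[:1].isupper():
        return lemma[:1].upper() + lemma[1:]
    return lemma

def choose_lemma(token, iwnlp_lemma, germalemma_lemma):
    # Prefer IWNLP when both tools return a lemma (covers disagreement);
    # otherwise take whichever result is available.
    lemma = iwnlp_lemma or germalemma_lemma
    if lemma is None:
        return token  # neither tool found a lemma: keep the token as-is
    return match_casing(lemma, token)

print(choose_lemma('war', 'sein', 'werden'))   # IWNLP wins on disagreement
print(choose_lemma('Lieder', 'Lied', None))    # only one tool found a lemma
```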
You may want to use the underlying Docker image directly: german-lemmatizer-docker
- Install Docker.
- `pip install german-lemmatizer`
- Read and accept the license terms of the TIGER Corpus (free to use for non-commercial purposes).
- Make sure the Docker daemon is running.
- Write some Python code:
```python
from german_lemmatizer import lemmatize

lemmatize(
    ['Johannes war ein guter Schüler', 'Sabiene sang zahlreiche Lieder'],
    working_dir='*',
    chunk_size=10000,
    n_jobs=1,
    escape=False,
    remove_stop=False)
```
The list of texts is split into chunks (`chunk_size`) and processed in parallel (`n_jobs`). Enable the `escape` parameter if your texts contain newlines. `remove_stop` removes stopwords as defined by spaCy.
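To give a rough idea of what chunking and parallelism mean here, the following sketch splits a list of texts into chunks and processes them with a worker pool. It is illustrative only, not the package's implementation: `chunked` and `process_chunk` are made-up names, and `ThreadPoolExecutor` stands in for however the package actually parallelizes work.

```python
# Illustrative sketch of chunked, parallel processing (not the package's code).
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_chunk(chunk):
    # Placeholder for the real per-chunk lemmatization work.
    return [text.lower() for text in chunk]

texts = ['Johannes war ein guter Schüler', 'Sabiene sang zahlreiche Lieder']
chunks = chunked(texts, 1)  # analogous to chunk_size=1
with ThreadPoolExecutor(max_workers=2) as pool:  # analogous to n_jobs=2
    results = [t for part in pool.map(process_chunk, chunks) for t in part]
```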
License: MIT.
This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.