This repository has been archived by the owner on Mar 16, 2021. It is now read-only.


Status: incomplete. Don't use yet.

You probably don't need a lemmatizer, but if you do, trefwurd's got you covered.

Trefwurd is:

  • fast (20k unique tokens/s)
  • lightweight (pure Python, zero dependencies)
  • low memory footprint
  • robust
  • overridable, with custom exception lists
  • easy to train
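Two of the bullets above, custom exception lists and easy training, can be illustrated with a toy model. The sketch below is a hypothetical illustration, not trefwurd's actual algorithm, which this README doesn't describe. It learns suffix-rewrite rules per part-of-speech tag from (word, POS, lemma) examples and lets a user-supplied exception list override those rules. The names SuffixLemmatizer, derive_rule, train and the exceptions argument are made up for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Rule = Tuple[str, str]  # (suffix to strip, string to append)


def derive_rule(word: str, lemma: str) -> Rule:
    """Return the (strip, append) suffix rule that turns word into lemma."""
    # Length of the longest common prefix of word and lemma.
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


class SuffixLemmatizer:
    """Hypothetical toy lemmatizer: learned suffix rules plus exceptions."""

    def __init__(self, exceptions: Optional[Dict[Tuple[str, str], str]] = None):
        self.rules: Dict[str, List[Rule]] = {}
        # (word, pos) -> lemma pairs that always win over learned rules.
        self.exceptions = dict(exceptions or {})

    def train(self, examples: Iterable[Tuple[str, str, str]]) -> None:
        """Learn suffix rules per POS tag from (word, pos, lemma) triples."""
        counts: Dict[str, Counter] = defaultdict(Counter)
        for word, pos, lemma in examples:
            counts[pos][derive_rule(word, lemma)] += 1
        # Try longer (more specific) suffixes first, then more frequent ones.
        for pos, counter in counts.items():
            self.rules[pos] = sorted(
                counter, key=lambda rule: (-len(rule[0]), -counter[rule])
            )

    def lemmatize(self, word: str, pos: str) -> str:
        # The exception list overrides everything the model learned.
        if (word, pos) in self.exceptions:
            return self.exceptions[(word, pos)]
        # Apply the first rule whose suffix matches and leaves a non-empty stem.
        for strip, append in self.rules.get(pos, []):
            if word.endswith(strip) and len(word) > len(strip):
                return word[: len(word) - len(strip)] + append
        # No rule applies: return the word unchanged.
        return word


lemmatizer = SuffixLemmatizer(exceptions={("steden", "NOUN"): "stad"})
lemmatizer.train([
    ("honden", "NOUN", "hond"),  # learns ("en", "")
    ("boeken", "NOUN", "boek"),  # learns ("en", "")
    ("katten", "NOUN", "kat"),   # learns ("ten", "")
    ("huizen", "NOUN", "huis"),  # learns ("zen", "s")
])
lemmatizer.lemmatize("muizen", "NOUN")  # → "muis"
lemmatizer.lemmatize("ratten", "NOUN")  # → "rat"
lemmatizer.lemmatize("steden", "NOUN")  # → "stad" (rules alone give "sted")
```

The exception list is what makes a rule-based model overridable: irregular forms like steden/stad are fixed without retraining.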

What's a lemmatizer?
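A lemmatizer maps an inflected word form to its dictionary form, or lemma: Dutch honden ("dogs") becomes hond. Because one surface form can belong to different words, the part-of-speech tag matters: English leaves is leaf as a noun but leave as a verb. The toy lookup below only illustrates that input/output contract. It is a hypothetical sketch, not how trefwurd works (trefwurd uses pretrained models), and a fixed table like this can't handle words it has never seen. LEXICON and lookup_lemma are made-up names.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: (lowercased surface form, POS tag) -> lemma.
LEXICON: Dict[Tuple[str, str], str] = {
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
    ("saw", "NOUN"): "saw",
    ("saw", "VERB"): "see",
    ("honden", "NOUN"): "hond",
}


def lookup_lemma(word: str, pos: str) -> str:
    """Return the lemma for (word, pos), or the word itself if unknown."""
    return LEXICON.get((word.lower(), pos), word)


lookup_lemma("leaves", "NOUN")  # → "leaf"
lookup_lemma("leaves", "VERB")  # → "leave"
```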

Installing

Trefwurd is compatible with Python 3.6 and up, because type annotations and f-strings are beautiful.

$ pip install trefwurd

Then download a pretrained lemmatization model for your language, identified by its ISO language code (e.g. nl for Dutch):

$ python3 -m trefwurd download {iso-lang-code}

Simple example

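import trefwurd
lemmatizer = trefwurd.load("nl")  # load the pretrained Dutch ("nl") model
# A single token with its part-of-speech tag
lemmatizer.lemmatize("honden", "NOUN")
# A list of (token, part-of-speech) pairs
lemmatizer.lemmatize([("honden", "NOUN"), ("eten", "VERB"), ("alles", "NOUN")])
# A list of bare tokens, without part-of-speech tags
lemmatizer.lemmatize(["honden", "eten", "alles"])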
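The three lemmatize calls above pass a single token with a tag, a list of (token, tag) pairs, and a list of bare tokens. This README doesn't say what each call returns or how untagged tokens are handled. As a hypothetical sketch only (not trefwurd's code), an API like this might first normalize all three forms into (word, tag) pairs, using None where no tag was given. The normalize function is a made-up name.

```python
from typing import List, Optional, Sequence, Tuple, Union

Token = Union[str, Tuple[str, str]]


def normalize(
    words: Union[str, Sequence[Token]], pos: Optional[str] = None
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical: turn the three call forms into (word, tag-or-None) pairs."""
    # Form 1: a single token plus its tag.
    if isinstance(words, str):
        return [(words, pos)]
    pairs: List[Tuple[str, Optional[str]]] = []
    for item in words:
        if isinstance(item, tuple):
            # Form 2: an explicit (token, tag) pair.
            word, tag = item
            pairs.append((word, tag))
        else:
            # Form 3: a bare token; no tag was supplied.
            pairs.append((item, None))
    return pairs


normalize(["honden", "eten"])  # → [("honden", None), ("eten", None)]
```

Normalizing up front means the core lemmatization code only ever deals with one shape of input.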

Documentation

TODO: make table.

Contributing

Tests

TODO: Um... Add tests.