Releases: mideind/Tokenizer

Version 1.0.6

24 Aug 11:18

The tokenizer now automatically combines the Unicode COMBINING ACUTE ACCENT and COMBINING DIAERESIS characters with vowels to form single code points for the Icelandic letters á, é, í, ó, ú, ý and ö (in both lower and upper case).

Version 1.0.5

23 Jul 15:58

Date/time and amount tokens are now coalesced to a further extent.

Version 1.0.4

08 Jun 17:54

Added TOK.DATEABS, TOK.TIMESTAMPABS and TOK.MEASUREMENT.

Version 1.0.0

20 Apr 12:45

Fixed a bug in the abbreviation handling code; upgraded development status from Beta to Stable.

Version 0.1.3

01 Oct 18:31

More thorough tests; added more exports to __init__.py; cut away unused code in tokenizer.py.