Releases: mideind/Tokenizer
Version 1.0.6
The tokenizer now automatically combines Unicode COMBINING ACUTE ACCENT and COMBINING DIAERESIS glyphs with vowels to form single code points for the Icelandic letters á, é, í, ó, ú, ý and ö (in both lower and upper case).
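As an illustration of what this combining step means, here is a minimal sketch in plain Python. It is not the tokenizer's own code: the helper name `compose_icelandic` and the lookup table are hypothetical, and the sketch assumes only the letters listed above should be affected, using the standard library's NFC normalization to find each precomposed code point.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Hypothetical table: (base vowel + combining mark) -> single precomposed code point.
# Only the Icelandic letters named in the release note are covered.
_PAIRS = [(v, ACUTE) for v in "aeiouyAEIOUY"] + [(v, DIAERESIS) for v in "oO"]
COMPOSE = {
    base + mark: unicodedata.normalize("NFC", base + mark)
    for base, mark in _PAIRS
}


def compose_icelandic(text: str) -> str:
    """Replace each listed vowel + combining mark pair with one code point."""
    for decomposed, composed in COMPOSE.items():
        text = text.replace(decomposed, composed)
    return text


if __name__ == "__main__":
    sample = "a\u0301 o\u0308"          # two-code-point forms of "á" and "ö"
    fixed = compose_icelandic(sample)
    print(len(sample), len(fixed))
```

Other combining sequences (for example `n` followed by COMBINING TILDE) are left untouched in this sketch, since the release note mentions only the acute accent and the diaeresis.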
Version 1.0.5
Date/time and amount tokens are now coalesced to a greater extent
Version 1.0.4
Added TOK.DATEABS, TOK.TIMESTAMPABS, TOK.MEASUREMENT
Version 1.0.0
Fix in abbreviation handling code; upgraded development status from Beta to Stable
Version 0.1.3
More thorough tests; added more exports to __init__.py; cut away unused code in tokenizer.py.