-
Notifications
You must be signed in to change notification settings - Fork 5
/
NEWS
92 lines (73 loc) · 3.02 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
0.11 26-04-2024
[Ko van der Sloot]
* allow NON_SPACING_MARKERs inside NUMBERs in all languages
[Maarten van Gompel]
* README.md: README: removed LaMachine reference
0.10 02-04-2024
[Ko van der Sloot]
* modernized configuration step
* French list of known abbreviations was not used at all!
added that
* in English, for shouldn't be handled as an abbreviation, unless
with a trailing dot.
0.9.1 21-07-2022
[Maarten van Gompel]
- New English twitter config wasn't installed properly yet
0.9 21-07-2022
[Ko van der Sloot]
- fix for PREFIX rules in french and italian
- small fix to prevent loosing a character in the PREFIX rule. (see https://github.com/LanguageMachines/ucto/issues/87 ) This doesn't fix the unwanted splits though.
- added SYMBOL, PICTOGRAM and EMOTICON to setdefinitions
- relaxed the e-mail rule a bit.
[Piroska Lendvai]
- Suggestions for German abbreviations
[Antal van den Bosch]
- New config file for English Twitter data. Recognizes and retains #hastags and @mentions.
0.8 29-11-2018
[Ko van der Sloot]
- separated .abr files from there main files for all Languages
- updated italian data (thanks to Kilian Evang)
[Iris Hendricks]
- updated abbrev files for Portuguese Turkish and French based on
https://en.wiktionary.org/wiki/Category:Portuguese_abbreviations and
https://en.wiktionary.org/wiki/Category:Turkish_initialisms.
- added full list of French abbreviations.
- added 'aub' to Dutch list
0.7.1 17-05-2018
[Ko van der Sloot]
Bug fix release:
* install some datafiles originally provided by 'ucto'
0.7 16-05-2018
[Ko vd Sloot]
- tokconfig-nld-historical: typo in rule
- updated all languages with new ABBREVIATION and NUMBER-ORDINAL rules:
= accomodate ABBREVIATIONS within brackets.
= avoid needles backtracking in NUMBER-ORDINAL
[Maarten van Gompel]
- Apparent bug in Italian config
0.6 [Ko vd Sloot] 04-04-2018
* several fixes for problems addressed in
https://github.com/LanguageMachines/ucto/issues/46
Notes:
- the suffix problems were already addressed in Release 0.5
- the colon problem is not addressed. Do we need REVERSE-SMILEY?
0.5 [Ko vd Sloot] 18-10-2017
* adding slightly adapted tokenizer configuration for historical dutch
(a bit more conservative on splitting).
INL/nederlab-linguistic-enrichment/#7
0.4 [Maarten van Gompel] 23-01-2017
* Data files are moved to share/ instead of etc/
(incompatible with ucto < 0.9.6)
* DATE pattern expanded (LanguageMachines/ucto#16)
0.3.1 [Ko van der Sloot] 06-01-2017
Bug fix release:
* fixed install problem in debian packaging using DESTDIR
* cleaned all rules from 'empty' entries (which lead to warnings)
0.3 [Ko vd Sloot] 05-01-2017
* new direcory structure an filenames based on ISO 693-3 language codes.
0.2 [Ko vd Sloot] 28-09-2016
* New implementation of rules. Needs a recent ucto that supports recursive
application of rules
* lots of bug fixes. Most notably in CURRENCY, ABBREV, SUFFIX and PREFIX rules
0.1 [Ko vd Sloot] 30-06-2016
* first version of a separate data package for ucto