2.20 2024-09-12
[Ko van der Sloot]
* require C++17 now
* small refactorings
* cleanup in GitHub CI file
2.19 2024-05-27
[Ko van der Sloot]
* again bumped the .so version, as we break the ABI
* Refactored the Class hierarchy for clearer code
- introducing an AbstractFeature class as base for all derived Features
* improved exception and error handling, including line numbers in messages
when possible
* added code to detect mismatch between annotators and processors (were not
detected until now)
* cleaner and better C++ code, const correctness and such...
2.18 2024-04-26
[Ko van der Sloot]
* bumping the .so version due to ABI breaks
* fix for --canonical option in folialint
* fix for https://github.com/LanguageMachines/libfolia/issues/56
* improved checking for empty <t> nodes
* several code improvements. const correctness etc.
* fix for https://github.com/LanguageMachines/libfolia/issues/55
* better check for illegal Corrections:
https://github.com/proycon/folia/issues/77
* added Doxygen config
* better handling of XML comment nodes
2.17 2023-10-21
[Ko van der Sloot]
* assume ticcutils >= 0.34 to force NFC normalization
* refactored str() and unicode() text extraction functions.
* a lot of work on code quality
2.16 2023-09-21
[Ko van der Sloot]
* fix for https://github.com/LanguageMachines/libfolia/issues/54
* Clearer error messages (adding filename, if present)
* some code cleaned/clarified
* added code to parse, store and output XML's Processing Instruction nodes
* added Etymology annotation
2.15 2023-05-08
[Ko van der Sloot]
* fixed a terrible typo/bug in subclasses.cxx:
using el-referable(), where el->referable() was meant
* plugged a small potential memory-leak
* fixed some offset problems in text handling
* fixed https://github.com/LanguageMachines/libfolia/issues/52
* foliadiff script now returns better message on failure
* switching to C++14
* code polishing
* updated GitHub action
2.14 2023-02-20
[Ko van der Sloot]
* implemented an ADD_FORMATTING TextPolicy to extract otherwise hidden
text from <t-hspace> and <t-hyph> Markup elements
(see https://github.com/proycon/foliapy/issues/25)
* fix for https://github.com/LanguageMachines/libfolia/issues/51
* General improvements:
- include a filename when throwing during Document processing
- added setutest() member for XmlText class
- general code quality
2.13 2023-01-23
[Ko van der Sloot]
* removed dependency on libtar
* quick fix for ignoring text inside <t-hbr> https://github.com/proycon/foliapy/issues/25
[Maarten van Gompel]
* updated minimum required libxml2 version
2.12 2023-01-02
[Ko van der Sloot]
* fix for https://github.com/LanguageMachines/libfolia/issues/49
* ABI breached, so bumped the .so file version
* cleaner C++ code, more C++11 now, removing CppCheck warnings
* using more recent TiccUtils (for enum_flags.h)
* several small improvements
* improved GitHub action
2.11 2022-07-22
[Ko vd Sloot]
* Significant refactoring, code cleaning, code reduction, and extra comments
* fixed memory leaks in the test (and also tests destroy() function now)
* Added some safeguards against multiple setnames for text_annotation. This is a limitation discussed in https://github.com/proycon/folia/issues/104
* added code to handle text extraction for "empty" rows.
* implemented a fix for empty cells. https://github.com/proycon/foliatools/issues/41
* added a fix for text offsets in embedded elements in a structure that may NOT carry text itself. Like cell inside a table.
[Maarten van Gompel]
* codemeta.json: updated metadata according to (proposed) CLARIAH requirements
2.10 2021-12-15
[Ko vd Sloot]
* several code improvements, suggested by CPPcheck and scan-build
* start using TextPolicy::debug
[Maarten van Gompel]
* implemented implicitspace logic for whitespace issue proycon/folia#101
2.9 2021-07-12
[Ko vd Sloot]
* Reworked the FoliaElement class hierarchy. Much clearer now
* re-arranged file structure. Separating some files into smaller files
* text extraction:
- numerous changes and additions to handle spaces better.
- refactored the code, using a new TextPolicy class for clarity
- added code for handling 'tag' attributes using callbacks
* improved handling of Correction
* numerous code refactorings for clarity and speed
* adapted and improved documentation
2.8.1 2021-04-07
[Ko vd Sloot]
* re-added the ltrim() function for backward compatibility
2.8 2021-04-07
* implements FoLiA v2.5, with a new 'model' for whitespaces in texts.
* bumped the .so version to 17
[Maarten van Gompel]
* added <t-lang> TextMarkup tag
* added <t-hspace> TextMarkup tag
* added tag attribute
* fix for proycon/folia#88, proycon/folia#92, proycon/folia#93, proycon/folia#94
* added text normalization functions to support the new text model,
maintaining backward compatibility.
[Ko vd Sloot]
* parse and preserve the xml:space attribute.
* added a 'space' normalizer. ALL exotic spaces (like em-space and en-space)
are replaced by the standard ASCII space
* fixed https://github.com/LanguageMachines/libfolia/issues/48
* code cleanup/refactoring
* ditched TravisCI and implemented a GitHub action
2.7 2021-01-07
* implemented a more relaxed MetaData scheme, allowing mixing 'foreign'
and 'native' MetaData
* bumped the .so version to 15
* features may be present in <t> and <t-*> nodes now
2.6.1 2020-12-11
[Maarten van Gompel]
* Updated for FoLiA v2.4.1: strip leading/trailing whitespace in text content (proycon/folia#88)
[Ko vd Sloot]
* Fixed problem with text-consistency errors for <t-str> within <t>
2.6.0 2020-11-16
[Maarten van Gompel]
* Updated for FoLiA v2.4
* Revised external implementation
* Implemented Modality annotation
[Ko vd Sloot]
* cleanup and extra sanity tests
* Implemented an 'explicit' mode for Document (FoLiA v2.3) and in folialint
2.5.1 2020-09-15
[Maarten van Gompel]
* Bugfix: Fixed handling of control characters, strip control characters by default
[Ko vd Sloot]
* fix in date handling (lookup table for month -> integer conversion)
* minor refactoring
* some documentation
2.5 2020-09-02
[Maarten van Gompel]
* Adapted to FoLiA v2.3
* Support parsing of the new explicit form
[Ko vd Sloot]
* folialint: updated usage() and man page
* minor refactoring
2.4 2020-04-15
[Ko van der Sloot]
* comments in Doxygen format added
* bumped the library version to 14
* fix for https://github.com/proycon/folia/issues/82
* fix for https://github.com/proycon/folia/issues/42
* fixed problem with using new tag names on pre 1.6 documents
* better checks in folia_engine on text inconsistencies and such
(https://github.com/LanguageMachines/libfolia/issues/43)
* confidence output is more consistent now
* removed the folia_builder (was not used)
* code refactorings and cleanup, removing unused functions
2.3.2 2020-01-13
[Ko van der Sloot]
Bug fix release
* fix for https://github.com/LanguageMachines/foliautils/issues/37
* fix for https://github.com/LanguageMachines/foliautils/issues/38
* fixes in Correction handling.
* fixed a Multi-Threading problem with the static reverse_old map
2.3.1 2019-10-21
[Maarten van Gompel]
* Bug fix release for gcc 9.1
It stumbles upon some inline functions
[Ko van der Sloot]
* replaced call to unsafe 'tmpnam()' by 'TiCC::tempname()'
2.3 2019-08-29
[Ko vd Sloot]
new features:
* autodeclare mode introduced (as in FoLiApy)
* folialint by default doesn't autodeclare. use -a or --autodeclare to use it
* Better detecting of declaration errors in general
* the select function can now also search recursively
up to the first matching sibling.
* bumped library version to 13
other changes:
* some exceptions are changed.
* fewer exceptions are thrown. An empty result is returned instead.
* folialint now accepts both -d and --debug
* real fix for issue 70
* small bug fixes and refactorings
* accept empty Comment and Description nodes
2.2.1 2019-07-22
[Ko vd Sloot]
Bug fix release:
* There were some problems handling NO setname vs. EMPTY setname, during
incremental parsing in folia::Engine. This has been sorted out now:
https://github.com/proycon/folia/issues/74
This is related to some ucto and frog issues too:
https://github.com/LanguageMachines/ucto/issues/70
https://github.com/LanguageMachines/frog/issues/72
2.2 2019-07-15
[Ko vd Sloot]
Bug fix release.
* Folia::Engine choked on some complex FoLiA. Solved by refactoring and in fact
simplifying some code. (Frog issue #77 revealed this)
* added flush() on document output to streams. (frog issue #72)
* improved output in debugging mode
2.1 2019-06-19
[Ko vd Sloot]
Bug fixes and enhancements:
* provenance:
- added 'generate_id' attribute with 'auto()' and 'next()' values
- some code improvements
* bugs:
- When using the FoLiA-engine, we have to save the ORIGINAL annotationdefaults,
and use these when parsing.
2.0 2019-05-22
[Ko vd Sloot]
Major release.
* Supports the new FoLiA 2.0 features:
- provenance support
- stricter checking on annotation declarations
- added the new TextMarkupReference class
- supports Hidden Words.
- All structure elements can have the 'space' attribute
- support for groupannotations
- many more.
* API and ABI breaches:
- library version bumped to version 10
- many functions are renamed
- the text() functions have an ENUM parameter now to select for STRICT,
RETAIN or HIDDEN
* bug fixes
- support for xlink: improved
- there was a rare mixup between <text> nodes and <text-annotation> nodes in
the folia::Engine
- all <wref> nodes get a 't' attribute now on serializing.
- reading External FoLiA could get into an endless loop
* code refactoring and cleanup
1.16 2019-05-15
[Ko van der Sloot]
stabilizing release for FoLiA 1.5. Next release will support the new FoLiA 2.0
Changes:
* renamed folia::Processor to folia::Engine
* extended and improved Engine code a lot
* avoid spurious newline on Document output
* Will read and ignore some FoLiA 2.0 additions
* numerous small additions and fixes
* make sure that the XmlParser uses the HUGE model everywhere
1.15 2018-11-29
[Ko van der Sloot]
* added (still experimental) code for a FoLiA Builder, Processor and
TextProcessor class.
Use with care. The API may change unannounced!
* a foliadiff script (using folialint) is installed now
* several refactorings, to make the code more clear.
* the 'ref' attribute was not serialized for TextContent
* several small bug fixes
* the .so version is bumped to 9 because of a lot of API/ABI changes
1.14 never released
1.13 2018-05-16
[Ko van der Sloot]
* disabled WordReference test. It was incomplete, and hard to do
* use icu:: namespace
[Maarten van Gompel]
* added codemeta.json
* fix spelling errors in error messages
1.12 2018-02-19
* configuration cleanup. MacOSX is better supported now.
* folialint now supports --fixtext (handle with care!)
* library version bumped to 8.0, due to changes in the API
* regenerated FoLiA properties (to FoLiA version 1.5.1)
* several small bug fixes
1.11 2017-12-04
Bug fix release:
* handling of <comment> tags within <t> nodes
* better handling of <wref> tags. Forbid forward references
1.10.1 2017-11-06
Minor fix
* bumped the .so version to 7.0
1.10 2017-10-17
Major Release, implementing FoLiA spec 1.5
* added text checking for all 1.5 documents and up
* added offset and ref checking for Text in all 1.5 documents and up
* 'empty' text inside TextContent, PhonContent and Textmarkup is significant
* better version checking
* text checking can be dis/enabled using FOLIA_TEXT_CHECK environment variable
* added submetadata mechanism
* implemented aliases for annotation setnames
* added an xmlstring() serializer for Document
* bug fixes:
- in LineBreak serializing
- XmlComment is textless.
- miscellaneous small fixes
1.9 2017-08-30
Bug fix release
* accept ICU 50 too (was 52) to make CentOS happy
* XmlComment INSIDE <t> led to crashes. Fixed.
* code changes in code that is only executed for documents in folia 1.5 format
(that shouldn't exist in the wild)
1.8 2017-00-16
Implements FoLiA spec 1.4.3
* adding textclass attribute
API changed. Bumped library version to 6.2.0
[Ko van der Sloot]
* added experimental textchecking code. only working for FoLiA documents
according to spec 1.5. NOT RELEASED YET!
Work in Progress
* fix in generate_id. AUTO_GENERATE_ID property was ignored.
* numerous bug fixes
1.7 2017-04-04
API changed so bumped library version to 6.1.0
[Ko van der Sloot]
* textcontent() and phoncontent() return const pointers, and also
work for TextContent and PhonContent elements now
* some refactoring, as suggested by CPPCHECK
* typos
* added dangerous functions to manipulate the class of a TextContent
* added reference counting on annotations.
This allows to remove unneeded declarations.
* small bug fixes:
- str() should never throw.
- avoid memory leak
[Maarten van Gompel]
* fixes in folia_properties for FoLiA spec 1.4.1
1.6 2017-01-05
* We now implement FoLiA spec 1.4
* ABI breakage. .so name bumped to 6.0.0
reason:
- new properties added
- implementation of generateId() is changed
* enhancements to folialint. Saving a document with --strip also
implies canonical output
* some bug fixes
1.5 2016-11-14
[Ko van der Sloot]
* Bumped the .so name. Should have been done in 1.4!
* addition: text() member for Document
* minor bug fixes:
- isNCname test now conforms to XML definition
- improved an error message in Document
- check empty attributes in Feature() construction
1.4 2016-10-18
[Ko van der Sloot]
* Now fully implements FoLiA spec 1.3.2
- multiple ForeignData nodes
- added more Feature nodes, like Polarity, Strength
- Source, Target, Relation, Predicate, Sentiment, Statement,
Observation Annotations and Layers.
- Comment node
- better version checking (and a bit relaxed too)
* some bug fixes and code improvement.
- str() works more as expected
- fixup ref 'id' vs. 'xml:id'
- improved sanity check to better test errors in the specs.
* added language getter and setter.
1.3 2016-07-11
[Ko van der Sloot]
Maintenance release:
- long options --help and --version added
- fix in LineBreak: text() generates a newline
1.2 2016-05-23
[Ko van der Sloot]
* now based on FoLiA spec 1.2
- ForeignData node added
- Foreign metadata in document
- relaxed aref and ref type, implementing full xlink syntax for
'simple' and 'locator' type.
- Linebreak now supports 'linenr', 'pagenr' and 'newpage' attributes
- enhanced folialint:
- added a '--strip' option to strip 'unstable'
information like dates and version numbers.
- added '--output' option to specify a file
- added '--nooutput' option to suppress output
- document outputs annotations in same order as read in.
- we no longer output the set for AnnotationLayers
- alien attributes are totally ignored now.
- small bug fixes
1.0 2016-03-09
[Ko van der Sloot]
* totally reworked the implementation:
it is based on code generated from generic definitions that
assure that the Python and C++ versions are always in line
(Most of that work done by Maarten van Gompel)
* simplified and overhauled the class API.
* a lot of bug fixes for corner cases
0.13 2016-01-14
[Ko van der Sloot]
* repository moved to GitHub
* added Travis support
* much smaller memory footprint
* Document deletion was very slow due to a brainfart
* no more initialization problems in MT cases
* use XML_PARSE_HUGE to be able to handle VERY LARGE documents
* text() functions in line with the Python version
* added new "tags" to keep in sync with the Python version
* a lot of small code updates and refactorings
0.12 2014-09-23
[Ko van der Sloot]
* release 0.12
* library version bumped to 3.0.0 because ABI and API have heavily changed
0.11 2014-09-16
* now implements nodes like: Ref, Note, External
* Correction may be added to a lot more nodes
* several bug fixes
* better MT safety
* major API and ABI changes: Added a true virtual base,
using virtual inheritance.
0.10
* some XML stuff is moved to ticcutils
* also use some other goodies from ticcutils
* now implements new nodes like Metric, Coreferences and Semroles
(following the FoLiA 0.9 specs)
* a lot of code improvement too, including some bug fixes
0.9
* lost in transition
0.8 2012-03-29
* reworked and improved handling of (default) annotation sets.
We are more strict now.
* some refactoring to get a more uniform handling of folia::classes
0.7 2012-02-13
* some bugs fixed in annotation declaration handling
* added GAP annotation to gap
0.6 2012-01-09
* fixed a problem with escaping in arguments, "att='\\'" failed
0.5 2011-12-21
* 0.4 is lost in space
* rather extensive rework. API and ABI changed.
More to do, but releasing now is desirable
0.3 2011-11-02
[Ko van der Sloot]
* Get ready for first (beta) release as a package