
feat(web): import the generator for the pred-text wordbreaker's Unicode-property data-table ⚡ #10690

Merged: jahorton merged 27 commits into master from feat/web/wordbreaker-property-data-gen on Aug 27, 2024


jahorton (Contributor) commented Feb 13, 2024

Fixes #7224.

Note: approximately 2,800 of the changed lines are externally defined Unicode property data, imported for use in-repo.

This PR imports https://github.com/eddieantonio/unicode-default-word-boundary/tree/master/libexec (MIT licensed!) for direct inclusion and use within our repository. After some reorganization and tweaks, the generator now outputs a data table that exactly matches the one in our current data.ts, which the wordbreaker references: as of commit 58c7575, git diff shows no differences between the existing table and the one built by the newly included generator.
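
For readers unfamiliar with what such a generator does, here is a minimal sketch, assuming input in the UCD WordBreakProperty.txt line format ("0041..005A ; ALetter # comment"). It parses each entry into an inclusive code-point range, sorts the ranges, and merges contiguous ranges that share a property value. The PropertyRange type and the parseWordBreakProperty and mergeAdjacent functions are hypothetical illustrations; this is not the imported generator's code or its output format.

```typescript
// Hypothetical sketch: names and types are illustrative, not the imported generator's API.

interface PropertyRange {
  start: number;    // first code point in the range (inclusive)
  end: number;      // last code point in the range (inclusive)
  property: string; // Word_Break value, e.g. "ALetter", "Numeric", "Extend"
}

// Parse lines like "0041..005A    ; ALetter # L& [26] ..." into sorted, merged ranges.
function parseWordBreakProperty(text: string): PropertyRange[] {
  const ranges: PropertyRange[] = [];
  for (const rawLine of text.split("\n")) {
    // Drop the trailing comment, then surrounding whitespace; skip blank/comment-only lines.
    const line = rawLine.split("#")[0].trim();
    if (line === "") continue;

    const [codePoints, property] = line.split(";").map((s) => s.trim());
    const [startHex, endHex] = codePoints.split("..");
    const start = parseInt(startHex, 16);
    // A single code point ("00AD") has no ".." part.
    const end = endHex === undefined ? start : parseInt(endHex, 16);
    ranges.push({ start, end, property });
  }
  // The upstream file groups entries by property value, so order them by code point.
  ranges.sort((a, b) => a.start - b.start);
  return mergeAdjacent(ranges);
}

// Merge ranges that are directly contiguous and share the same property value.
function mergeAdjacent(sorted: PropertyRange[]): PropertyRange[] {
  const merged: PropertyRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.property === r.property && last.end + 1 === r.start) {
      last.end = r.end;
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}
```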

To avoid accidental shifts in behavior at unexpected times, I've opted to require manual updates of the underlying data tables. To perform an update, first change the Unicode version specified in resources/build/minimum-versions.inc.sh, then run /resources/standards-data/unicode-character-database/build.sh configure. Refer to #12103 for related setup. This build script is new with this PR and streamlines the update process whenever we decide to trigger it.
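
A small sketch of the idea that one pinned version string determines which upstream data gets fetched. The wordBreakPropertyUrl helper is hypothetical and does not reproduce what build.sh configure actually does; only the unicode.org directory layout (Public/<version>/ucd/auxiliary/) is real.

```typescript
// Hypothetical helper: illustrates a pinned version driving the UCD source URL.
// It is not taken from build.sh; only the unicode.org path layout is real.

const VERSION_PATTERN = /^\d+\.\d+\.\d+$/;

function wordBreakPropertyUrl(unicodeVersion: string): string {
  // Reject partial or malformed versions so a typo cannot silently fetch the wrong data.
  if (!VERSION_PATTERN.test(unicodeVersion)) {
    throw new Error(`Expected a version like "15.1.0", got "${unicodeVersion}"`);
  }
  return `https://www.unicode.org/Public/${unicodeVersion}/ucd/auxiliary/WordBreakProperty.txt`;
}
```

Because nothing changes until someone edits the pinned version, data shifts happen only as deliberate, reviewable commits.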

Also note: the wordbreaker was previously operating on Unicode 13.0.0 data; this PR updates it to 15.1.0, the version already used throughout our other platforms.
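
To illustrate why a data-only update can shift wordbreaking results, here is a sketch assuming the table is a sorted list of non-overlapping code-point ranges consulted by binary search, with unlisted code points falling back to Other, the default Word_Break value in UAX #29. Moving from 13.0.0 to 15.1.0 data adds entries and may change existing ones, so lookups change even though the lookup code does not. The table shape, SAMPLE_TABLE contents, and the lookupWordBreak and classify functions are assumptions for illustration, not the actual contents of data.ts.

```typescript
// Hypothetical sketch: the table shape and names are assumptions, not data.ts itself.

interface PropertyRange {
  start: number;    // inclusive
  end: number;      // inclusive
  property: string;
}

// A tiny excerpt in the sorted, non-overlapping form a generated table might take.
const SAMPLE_TABLE: readonly PropertyRange[] = [
  { start: 0x0030, end: 0x0039, property: "Numeric" },
  { start: 0x0041, end: 0x005a, property: "ALetter" },
  { start: 0x0061, end: 0x007a, property: "ALetter" },
];

// Binary search for the range containing codePoint; unlisted code points are "Other".
function lookupWordBreak(table: readonly PropertyRange[], codePoint: number): string {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const range = table[mid];
    if (codePoint < range.start) {
      hi = mid - 1;
    } else if (codePoint > range.end) {
      lo = mid + 1;
    } else {
      return range.property;
    }
  }
  return "Other";
}

// Classify each code point (not each UTF-16 code unit): Array.from iterates by code point.
function classify(text: string, table: readonly PropertyRange[]): string[] {
  return Array.from(text, (ch) => lookupWordBreak(table, ch.codePointAt(0)!));
}
```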

@keymanapp-test-bot skip

jahorton added this to the 18.0 milestone Feb 13, 2024
keymanapp-test-bot (bot) commented Feb 13, 2024

github-actions bot added web/ and removed web/ labels Feb 16, 2024
jahorton changed the base branch from master to fix/web/lm-worker-test August 6, 2024 07:47
github-actions bot added the web/ label Aug 21, 2024
jahorton merged commit 3e89ff8 into master Aug 27, 2024
27 of 28 checks passed
jahorton deleted the feat/web/wordbreaker-property-data-gen branch August 27, 2024 03:19
keyman-server (Collaborator)

Changes in this pull request will be available for download in Keyman version 18.0.99-alpha

Development

Successfully merging this pull request may close these issues.

feat(common/models): update wordbreaker data
4 participants