Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): support stacked markers, prep for marker segments 🙀 #10326

Merged
merged 6 commits into from
Jan 11, 2024

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jan 6, 2024

  • renamed the single-segment functions to normalize_nfd_markers_segment() since they should only be run on a single segment
  • commented out NFC for now, dead code
  • add some additional checks/DebugLog around normalization
  • add stacked markers tests. Some old tests had to change also.

#10320

@keymanapp-test-bot skip

srl295 added 3 commits January 5, 2024 18:52
- we can finally support single markers with the new structure
- renamed the single-segment functions to normalize_nfd_markers_segment() since they should only be run on a single segment
- commented out NFC for now, dead code
- add some additional checks/DebugLog around normalization

#10320
@srl295 srl295 self-assigned this Jan 6, 2024
@srl295 srl295 added this to the A17S30 milestone Jan 6, 2024
@keymanapp-test-bot
Copy link

keymanapp-test-bot bot commented Jan 6, 2024

@github-actions github-actions bot added core/ Keyman Core feat labels Jan 6, 2024
{
marker_map map;
std::cout << __FILE__ << ":" << __LINE__ << " - dup-char test" << std::endl;
const std::u32string src = U"a\uFFFF\u0008\u0001\u0300e\uFFFF\u0008\u0002\u0300";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it says, normalize_nfd_markers() would get this wrong, because U+0300 appears twice. But at least the segment code handles it properly.

Comment on lines -78 to +79
<transform from="8e\u{0300}\u{0320}\m{stampy}\m{lgtm}" to="FAIL" /> <!-- denormalized, nomatch -->
<transform from="8e\u{0320}\u{0300}\m{stampy}\m{lgtm}" to="YES8d" /> <!-- NFD -->
<transform from="8e\u{0300}\u{0320}\m{stampy}\m{lgtm}" to="YES8c" /> <!-- denormalized, but matches -->
<transform from="8f\u{0320}\u{0300}\m{stampy}\m{lgtm}" to="YES8d" /> <!-- NFD -->
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one with FAIL didn't use to match because of two markers. But now, it can match either way! 💯

@srl295
Copy link
Member Author

srl295 commented Jan 8, 2024

mac failure is just the agreement issue

@mcdurdin
Copy link
Member

mcdurdin commented Jan 9, 2024

mac failure is just the agreement issue

agreement issue only affects iOS. mac build failures appear to be flaky Apple timestamp service. I thought I'd added a workaround to retry for that though (#10243) which doesn't appear to be working.

@mcdurdin
Copy link
Member

mcdurdin commented Jan 9, 2024

@jahorton can you figure out why the Test: Language Modeling Layer (Common) build failed?
https://build.palaso.org/viewLog.html?buildId=433516&buildTypeId=Keyman_Common_LMLayer_TestPullRequests

@mcdurdin
Copy link
Member

mcdurdin commented Jan 9, 2024

agreement issue only affects iOS. mac build failures appear to be flaky Apple timestamp service. I thought I'd added a workaround to retry for that though (#10243) which doesn't appear to be working.

My bad. I conflated two different PR builds. This one was indeed the agreement issue. The timestamp failure was in #10327 (https://build.palaso.org/buildConfiguration/Keyman_KeymanMac_PullRequests/433525?expandBuildDeploymentsSection=false&hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildProblemsSection=true&expandBuildChangesSection=true). Apologies.

@jahorton
Copy link
Contributor

jahorton commented Jan 9, 2024

@jahorton can you figure out why the Test: Language Modeling Layer (Common) build failed? https://build.palaso.org/viewLog.html?buildId=433516&buildTypeId=Keyman_Common_LMLayer_TestPullRequests

01:52:46
npm ERR! npm ERR! errno ECONNRESET

01:52:46
npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2flogger: aborted

01:52:46
npm ERR! npm ERR! network This is a problem related to network connectivity.

01:52:46
npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.

This has been happening from time to time for the past few months. Not necessarily on that specific package - just the loss of connectivity when trying to retrieve a package.

@srl295 srl295 requested a review from jahorton January 9, 2024 01:46
@mcdurdin
Copy link
Member

mcdurdin commented Jan 10, 2024

This has been happening from time to time for the past few months. Not necessarily on that specific package - just the loss of connectivity when trying to retrieve a package.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +319 to +325
{
const auto normalize_ok = ldml::normalize_nfd_markers(ctxtstr);
assert(normalize_ok);
if(!normalize_ok) {
DebugLog("ldml_processor::process_output: failed ldml::normalize_nfd_markers(ctxtstr)");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to abort processing in release builds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it fails the string is just unchanged and so the match may not work properly or wrong output. But there's no inherent danger in proceeding.

Is it worth trying to pop up an error code to the engine? I thought the answer before was, no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No danger is what I was looking for. There's no way we can pop an error, so 'bad output' is the best error we can do. (And no, don't even think about emitting an error as key events "YOUR KEYBOARD HAS GONE WRONG" emitted every time you press a key?!)

Comment on lines +331 to +338
{
const auto normalize_ok = ldml::normalize_nfd_markers(ctxtstr_cleanedup);
assert(normalize_ok);
if(!normalize_ok) {
DebugLog("ldml_processor::process_output: failed ldml::normalize_nfd_markers(ctxtstr_cleanedup)");
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, do we need to abort processing?

core/src/ldml/ldml_transforms.cpp Show resolved Hide resolved
core/src/ldml/ldml_transforms.hpp Show resolved Hide resolved
core/src/ldml/ldml_transforms.hpp Show resolved Hide resolved
- cleanup NFC code later if it's entirely unneeded.

Co-authored-by: Marc Durdin <[email protected]>
@srl295 srl295 merged commit f68e9f3 into master Jan 11, 2024
17 checks passed
@srl295 srl295 deleted the feat/core/10320-stack-markers-epic-ldml branch January 11, 2024 15:40
@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 17.0.242-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(core): Normalization needs to support multiple markers 🙀
4 participants