Does stringi export something like u_hasBinaryProperty(c, UCHAR_ALPHABETIC)? #515

Open
dmurdoch opened this issue Feb 1, 2025 · 3 comments

Comments

@dmurdoch

dmurdoch commented Feb 1, 2025

I am writing a parser for LaTeX code, and I'm hoping to support UTF-8 input. TeX and LaTeX categorize each input character, and one of the categories is whether it is a letter or not. I'm not sure how the Unicode-supporting versions of LaTeX handle this, but one thing I wanted to try was to use the ICU test u_hasBinaryProperty(c, UCHAR_ALPHABETIC). That's the only ICU function I need, so linking ICU into my package is possible but seems like overkill.

Does stringi provide this kind of categorization of the characters in a string? Ideally it would be something I could call from C, but if it's only available from R that would be very helpful too. I couldn't spot it in the reference docs, but maybe I just missed it.

@gagolews
Owner

gagolews commented Feb 3, 2025

As per Sec. 5.4.3 of Writing R Extensions, I've made this function available via R_GetCCallable (in the current development version of stringi). It's declared as

int stric_u_hasBinaryProperty(int c, int which);

See https://github.com/gagolews/stringi/blob/master/src/stri_callables.cpp

Let me know if that works for you.
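For anyone following along, here is a minimal sketch of how a client package might resolve and cache the routine on the C side. It assumes the routine is registered under the same name as the declaration above (`stric_u_hasBinaryProperty`), which the linked source file should confirm. The macro `BUILDING_IN_R_PACKAGE`, the wrapper names, and the ASCII stub are hypothetical and only exist so the sketch also compiles outside R.

```c
#include <stddef.h>

#ifdef BUILDING_IN_R_PACKAGE          /* hypothetical macro, e.g. set from Makevars */
#include <R_ext/Rdynload.h>           /* R_GetCCallable, DL_FUNC */
#endif

/* Matches the declaration quoted above. */
typedef int (*has_binary_property_fn)(int c, int which);

/* Cached pointer so the lookup happens at most once. */
static has_binary_property_fn cached_fn = NULL;

/* Lets a test (or a build without R) plug in a replacement. */
void set_has_binary_property(has_binary_property_fn fn) { cached_fn = fn; }

/* ASCII-only stand-in, NOT equivalent to ICU; only for trying the wrapper without R. */
int ascii_alphabetic_stub(int c, int which)
{
    (void) which;
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Returns the resolved routine's result, or -1 if nothing has been resolved. */
int has_binary_property(int c, int which)
{
#ifdef BUILDING_IN_R_PACKAGE
    if (cached_fn == NULL) {
        /* Assumption: registered under the same name as the declaration. */
        cached_fn = (has_binary_property_fn)
            R_GetCCallable("stringi", "stric_u_hasBinaryProperty");
    }
#endif
    if (cached_fn == NULL) return -1;
    return cached_fn(c, which);
}
```

R_GetCCallable only works once stringi has been loaded, so the client package would typically list stringi under Imports and import something from it in NAMESPACE.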

@gagolews
Owner

gagolews commented Feb 3, 2025

UCHAR_ALPHABETIC is 0 (see https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/uchar.h); this value is very unlikely to change in the future.
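Since the value is stable, a client that does not want to include ICU headers could mirror the constant locally and decode UTF-8 input into code points before querying the property. The sketch below is one way to do that; `PROP_ALPHABETIC` and `utf8_decode` are hypothetical names, and it assumes the `c` argument is a Unicode code point, as with ICU's `u_hasBinaryProperty`. The decoder is deliberately minimal: it checks lead and continuation bytes but does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Local mirror of ICU's UCHAR_ALPHABETIC, per the comment above. */
enum { PROP_ALPHABETIC = 0 };

/* Decode one UTF-8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the input is empty, truncated, or malformed. */
size_t utf8_decode(const unsigned char *s, size_t n, int *cp)
{
    if (n == 0) return 0;

    unsigned char b0 = s[0];
    size_t len;
    int value;

    if (b0 < 0x80)                { len = 1; value = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; value = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; }
    else return 0;                /* stray continuation or invalid lead byte */

    if (len > n) return 0;        /* sequence is cut off */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        value = (value << 6) | (s[i] & 0x3F);
    }

    *cp = value;
    return len;
}
```

Each decoded `cp` could then be passed, together with `PROP_ALPHABETIC`, to the function pointer obtained from stringi to decide whether the character counts as a letter.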

@dmurdoch
Author

dmurdoch commented Feb 3, 2025

Thanks! I'll give it a try.
