Does stringi export something like `u_hasBinaryProperty(c, UCHAR_ALPHABETIC)`? #515

dmurdoch · 2025-02-01T17:48:32Z

I am writing a parser for LaTeX code, and I'm hoping to support UTF-8 input. TeX and LaTeX categorize each input character, and one of the categories is whether it is a letter or not. I'm not sure how the Unicode-supporting versions of LaTeX handle this, but one approach I wanted to try was ICU's test `u_hasBinaryProperty(c, UCHAR_ALPHABETIC)`. That's the only ICU function I need, so linking ICU into my package is possible but seems like overkill.

Does stringi provide this kind of categorization of the characters in a string? Ideally it would be something I could call from C, but if it's only available from R that would be very helpful too. I couldn't spot it in the reference docs, but maybe I just missed it.
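Whichever predicate is used, it takes a single code point, so a C parser first has to decode its UTF-8 input. A minimal sketch of such a decoder follows. `utf8_next` is a hypothetical helper, not part of stringi or R. It assumes the caller only calls it while `*pos < n`, and it reports malformed bytes (invalid lead bytes, truncated or overlong sequences, surrogates, values above U+10FFFF) as -1, skipping one byte at a time.

```c
#include <stddef.h>

/* Hypothetical helper: decode one code point from the UTF-8 buffer s
   (length n), starting at *pos. Requires *pos < n.
   On success, advances *pos past the sequence and returns the code point.
   On malformed input, advances *pos by one byte and returns -1. */
static long utf8_next(const unsigned char *s, size_t n, size_t *pos)
{
    size_t i = *pos;
    unsigned char b = s[i];
    long cp;
    size_t len;

    if (b < 0x80) { *pos = i + 1; return b; }            /* ASCII */
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
    else { *pos = i + 1; return -1; }                     /* invalid lead byte */

    if (i + len > n) { *pos = i + 1; return -1; }         /* truncated sequence */
    for (size_t k = 1; k < len; k++) {
        if ((s[i + k] & 0xC0) != 0x80) { *pos = i + 1; return -1; }
        cp = (cp << 6) | (s[i + k] & 0x3F);              /* append 6 payload bits */
    }

    /* Reject overlong encodings, UTF-16 surrogates, and out-of-range values. */
    static const long min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        *pos = i + 1;
        return -1;
    }
    *pos = i + len;
    return cp;
}
```

A parser loop would call it repeatedly until `pos == n`, passing each non-negative result to the letter test.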


gagolews · 2025-02-03T13:42:39Z

As per Sec. 5.4.3 of *Writing R Extensions*, I've made this function available via `R_GetCCallable` (in the current development version of stringi). It's declared as

`int stric_u_hasBinaryProperty(int c, int which);`

See https://github.com/gagolews/stringi/blob/master/src/stri_callables.cpp
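Because the function is reached through `R_GetCCallable` rather than a header, client code typically declares a matching function-pointer type and looks the pointer up once. A minimal sketch, assuming the name stringi registers is `"stric_u_hasBinaryProperty"` (check `stri_callables.cpp`) and that stringi is already loaded, for example through an `Imports` dependency. `set_property_fn` and `is_alphabetic` are hypothetical names, and the ASCII fallback exists only so the code can be exercised without R.

```c
#include <stddef.h>

/* Function-pointer type matching the declaration
   int stric_u_hasBinaryProperty(int c, int which); */
typedef int (*has_binary_property_fn)(int c, int which);

/* ICU's UCHAR_ALPHABETIC property selector (value 0 in uchar.h). */
#define PROP_ALPHABETIC 0

/* Cached pointer; NULL until set_property_fn() has been called. */
static has_binary_property_fn has_binary_property = NULL;

/* In package code this would be called once stringi is loaded, e.g.
     set_property_fn((has_binary_property_fn)
         R_GetCCallable("stringi", "stric_u_hasBinaryProperty"));
   R_GetCCallable is declared in <R_ext/Rdynload.h>; the registered name
   used here is an assumption. */
static void set_property_fn(has_binary_property_fn f)
{
    has_binary_property = f;
}

/* Nonzero if code point c has the Alphabetic property.
   Without a pointer, falls back to ASCII letters only (testing aid). */
static int is_alphabetic(int c)
{
    if (has_binary_property != NULL)
        return has_binary_property(c, PROP_ALPHABETIC);
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

Caching the pointer avoids a symbol lookup per character, which matters when the test runs on every code point of the input.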

Let me know if that works for you.

gagolews · 2025-02-03T13:44:08Z

`UCHAR_ALPHABETIC` is 0 (https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/uchar.h); this is very unlikely to change in the future.
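Because the selector is a plain integer, a parser can hard-code 0 instead of including ICU's headers. A sketch of a category-code lookup built on that, assuming the usual plain TeX/LaTeX defaults for a subset of ASCII special characters and classing every other code point as a letter (11) if it is Alphabetic, otherwise as "other" (12). Whether this matches what XeTeX or LuaTeX actually assign is an assumption, not something established in this thread. `tex_catcode` and the ASCII-only stand-in `ascii_only_property` are hypothetical.

```c
/* Same signature as stric_u_hasBinaryProperty. */
typedef int (*has_binary_property_fn)(int c, int which);

/* UCHAR_ALPHABETIC from ICU's uchar.h, hard-coded to avoid ICU headers. */
enum { ALPHABETIC_PROPERTY = 0 };

/* TeX category codes. */
enum {
    CAT_ESCAPE = 0, CAT_BEGIN_GROUP = 1, CAT_END_GROUP = 2, CAT_MATH_SHIFT = 3,
    CAT_ALIGN_TAB = 4, CAT_END_OF_LINE = 5, CAT_PARAMETER = 6, CAT_SUPERSCRIPT = 7,
    CAT_SUBSCRIPT = 8, CAT_IGNORED = 9, CAT_SPACE = 10, CAT_LETTER = 11,
    CAT_OTHER = 12, CAT_ACTIVE = 13, CAT_COMMENT = 14, CAT_INVALID = 15
};

/* Catcode for code point c: fixed defaults for some ASCII specials,
   then the Alphabetic property decides between letter and other. */
static int tex_catcode(int c, has_binary_property_fn has_prop)
{
    switch (c) {
    case '\\': return CAT_ESCAPE;
    case '{':  return CAT_BEGIN_GROUP;
    case '}':  return CAT_END_GROUP;
    case '$':  return CAT_MATH_SHIFT;
    case '&':  return CAT_ALIGN_TAB;
    case '\r': return CAT_END_OF_LINE;
    case '#':  return CAT_PARAMETER;
    case '^':  return CAT_SUPERSCRIPT;
    case '_':  return CAT_SUBSCRIPT;
    case '\0': return CAT_IGNORED;
    case ' ':
    case '\t': return CAT_SPACE;
    case '~':  return CAT_ACTIVE;
    case '%':  return CAT_COMMENT;
    case 0x7F: return CAT_INVALID;
    default:   break;
    }
    return has_prop(c, ALPHABETIC_PROPERTY) ? CAT_LETTER : CAT_OTHER;
}

/* Hypothetical ASCII-only stand-in for testing without R/stringi;
   in a package, pass the pointer obtained from stringi instead. */
static int ascii_only_property(int c, int which)
{
    return which == ALPHABETIC_PROPERTY &&
           ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}
```

With the real stringi pointer, a code point such as U+00E9 would be classed as a letter; the ASCII stand-in classes it as "other".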

dmurdoch · 2025-02-03T13:56:08Z

Thanks! I'll give it a try.
