Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_ascii() is wrong on musl-based systems #81

Closed
bastistician opened this issue Jun 18, 2023 · 2 comments
Closed

is_ascii() is wrong on musl-based systems #81

bastistician opened this issue Jun 18, 2023 · 2 comments

Comments

@bastistician
Copy link

On my Alpine Linux check system, I see

* checking tests ...
  Running ‘test-ci.R’ [0s/0s]
  Running ‘test-cran.R’ [1s/1s]
 [2s/2s] ERROR
Running the tests in 'tests/test-cran.R' failed.
Complete output:
  > testit::test_pkg('xfun', 'test-cran')
  
  Attaching package: 'xfun'
  
  The following objects are masked from 'package:base':
  
      attr, isFALSE
  
  assertion failed: is_ascii() can identify ascii strings
  Error from assert("is_ascii() can identify ascii strings", { ...  at test-encoding.R#18 
  Error: (!is_ascii(latin_str)) is not TRUE
  Execution halted

A minimal reproducible example for the failing test follows:

latin_str = 'fa\u00e7ile'
iconv(latin_str, to = "ascii")
#> [1] "fa*ile"

library(xfun)
(!is_ascii(latin_str))
#> [1] FALSE

So on musl the iconv call does not give NA as expected by is_ascii.
This behaviour is mentioned in help(iconv):

Some Linux distributions use 'musl' as their C runtime. This is
less comprehensive than 'glibc': it does not support '//TRANSLIT'
but does inexact conversions (currently using '*').
[...]
Note that you cannot rely on invalid inputs being detected,
especially for 'to = "ASCII"' where some implementations allow
8-bit characters and pass them through unchanged or with
transliteration or substitution.

This means that testing for an NA result of this conversion is generally insufficient.
is_ascii should probably also test whether the converted string differs from the input.

@yihui yihui closed this as completed in 3db41df Jun 21, 2023
@yihui
Copy link
Owner

yihui commented Jun 21, 2023

is_ascii should probably also test whether the converted string differs from the input.

Good idea. Done. Thank you!

@bastistician
Copy link
Author

Thanks, I can confirm that the package now passes the tests on my "--extra-arch" check system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants