Special character treatment #36
Comments
already included: |
Review this issue again with regard to encoding once this topic is covered in the new edition of Advanced R. |
Basics about encoding:
- ASCII: characters are numbered 32-127 (space = 32, A = 65, etc.), so 7 bits are enough (2^7 = 128), although most machines were using 8-bit bytes.
- 1 byte = 8 bits = 256 possible values (0-255), so 128-255 were still free.
- ANSI standard: codes 0-127 have the same meaning in every encoding that follows this standard; only 128-255 differ.
-> Unicode!!!
Encodings (to store Unicode)
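The numbers above can be checked directly in base R. This is only a small sketch; the variable name eszett is made up for illustration, and the byte values assume the string is stored as UTF-8.

```r
# ASCII code points mentioned above
utf8ToInt(" ")   # 32
utf8ToInt("A")   # 65

# Number of values that fit in 7 bits and in one 8-bit byte
2^7              # 128
2^8              # 256

# A character outside ASCII: the code point is a single number,
# but UTF-8 needs two bytes to store it.
eszett <- "\u00df"            # hypothetical name; U+00DF
utf8ToInt(eszett)             # 223
charToRaw(enc2utf8(eszett))   # c3 9f
```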
|
See also here: maybe it is best to set the encoding at the beginning via Encoding(x) <- "UTF-8". Afterwards Encoding(x) reports "UTF-8" for words with special characters and "unknown" for words without special characters (ASCII 0-127). |
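A minimal sketch of that behaviour in base R. The input is built from raw bytes on the assumption that it arrives without an encoding mark (for example, read from a file); note that Encoding<- only declares the encoding and does not convert any bytes.

```r
# UTF-8 bytes of "gro\u00df", built from raw values so that no encoding
# mark is set yet, plus a pure-ASCII word.
x <- c(rawToChar(as.raw(c(0x67, 0x72, 0x6f, 0xc3, 0x9f))), "gross")
Encoding(x)              # both elements start out as "unknown"

Encoding(x) <- "UTF-8"   # declare (not convert) the encoding
Encoding(x)              # "UTF-8" for the non-ASCII word, "unknown" for ASCII
```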
or maybe better |
added |
probably |
Something like what is mentioned here: https://unix.stackexchange.com/questions/171832/converting-a-utf-8-file-to-ascii-best-effort |
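A rough R counterpart of a best-effort conversion could look like the sketch below. It assumes the platform's iconv supports the "//TRANSLIT" suffix; the replacements it picks (or whether it returns NA) depend on the platform, so no output is shown.

```r
x <- c("gro\u00df", "M\u00fcller")

# "//TRANSLIT" asks iconv to approximate characters that have no ASCII
# equivalent. The result is platform-dependent.
ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
ascii

# Fallback without transliteration: non-convertible bytes are
# substituted with the string given in 'sub'.
iconv(x, from = "UTF-8", to = "ASCII", sub = "?")
```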
Need to check this out. |
|
Or maybe just wrap the stri_trans_general function and supply the id as an argument to to_any_case, e.g. stri_trans_general("gro\u00df", "latin-ascii") |
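Such a wrapper might look roughly like this. The function name transliterate and its default id are assumptions for illustration only, not the package's actual interface; the only real call used is stringi::stri_trans_general(str, id).

```r
library(stringi)

# Hypothetical wrapper: passes the transliterator id straight through
# to ICU, so callers can choose any id that ICU knows.
transliterate <- function(string, id = "Latin-ASCII") {
  stri_trans_general(string, id)
}

transliterate("gro\u00df")                  # "gross"
transliterate("gro\u00df", "latin-ascii")   # ids are matched case-insensitively
```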
Need to look for a language- and symbol-agnostic transliteration for Unicode... ICU doesn't seem to work for everything (Latin-ASCII has some problems, and ü to ue is not possible with any ICU conversion). At most it could be added as an extra feature... |
Use |
Implemented now. Will update when I find a good source for additional transliteration dictionaries.
Systematic addition of cases to support, similar to German umlauts. Also decide whether and how to replace them in the replace_special_characters argument.
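One way to add such cases systematically is a plain lookup dictionary per language. The sketch below uses base R only; the names german and replace_special are hypothetical and not taken from the package.

```r
# Hypothetical dictionary: names are the characters to replace,
# values are their ASCII transliterations.
german <- c(
  "\u00e4" = "ae", "\u00f6" = "oe", "\u00fc" = "ue",
  "\u00c4" = "Ae", "\u00d6" = "Oe", "\u00dc" = "Ue",
  "\u00df" = "ss"
)

# Apply every dictionary entry in turn as a literal (fixed) replacement.
replace_special <- function(x, dict) {
  for (i in seq_along(dict)) {
    x <- gsub(names(dict)[i], dict[[i]], x, fixed = TRUE)
  }
  x
}

replace_special(c("M\u00fcller", "Stra\u00dfe"), german)   # "Mueller" "Strasse"
```

Further languages could then be supported by adding more named vectors of the same shape.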