Separate text into graphemes, codepoints, UTF-16 code units, and UTF-8 code units. Delete individual codepoints to see how graphemes boil and shift, bending to your will.
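The four views above can be sketched with standard JavaScript APIs. This is a minimal illustration, not necessarily how this project does it: the function names `separate` and `deleteCodepoint` are hypothetical, and it assumes a runtime that provides `Intl.Segmenter` and `TextEncoder` (current browsers and recent Node.js).

```javascript
// Hypothetical helper: break a string into the four views named above.
function separate(text) {
  // Graphemes: Intl.Segmenter applies the Unicode grapheme cluster rules.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);

  // Codepoints: iterating a string walks it by codepoint, not by UTF-16 code unit.
  const codepoints = Array.from(text, (ch) => ch.codePointAt(0));

  // UTF-16 code units: what charCodeAt and .length see.
  const utf16 = [];
  for (let i = 0; i < text.length; i++) utf16.push(text.charCodeAt(i));

  // UTF-8 code units: the bytes TextEncoder produces.
  const utf8 = Array.from(new TextEncoder().encode(text));

  return { graphemes, codepoints, utf16, utf8 };
}

// Hypothetical helper: remove the codepoint at the given codepoint index.
function deleteCodepoint(text, index) {
  const codepoints = Array.from(text); // one string per codepoint
  codepoints.splice(index, 1);
  return codepoints.join("");
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT
const accented = separate("e\u0301");

// WOMAN, ZERO WIDTH JOINER, GIRL: one grapheme until the joiner is deleted
const family = "\u{1F469}\u200D\u{1F467}";
const familyBefore = separate(family).graphemes.length;
const familyAfter = separate(deleteCodepoint(family, 1)).graphemes.length;
```

Here `accented` holds one grapheme, two codepoints, two UTF-16 code units, and three UTF-8 code units. Deleting the joiner from `family` splits one grapheme into two, which is the kind of shift the page lets you watch.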
To store codepoint names, I wrote my own binary format: https://github.com/LiterallyVoid/unicode-character-names-binary
This project (index.html and the src directory) is available under the MIT license (see LICENSE).
data/ucd.bin is derived from Unicode's ucd.all.flat.xml, retrieved from https://www.unicode.org/Public/UCD/latest/ucdxml/ on 2024-05-03, and as such is under the Unicode License V3 (see data/LICENSE).