EntityParser can't handle encoded emoji #67

Open
aduth opened this issue Nov 15, 2018 · 2 comments · May be fixed by #70

aduth commented Nov 15, 2018

While the tokenizer will gracefully decode most encoded characters:

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&amp;' )[ 0 ].chars === '&'
true

It doesn't handle characters whose code points don't fit in 16 bits, i.e. those above U+FFFF (e.g. emoji):

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&#x1F605;' )[ 0 ].chars === '😅'
false

Perhaps EntityParser should use String.fromCodePoint in place of String.fromCharCode, or an equivalent polyfill?
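
To illustrate the difference, here is a minimal sketch. `decodeNumericEntity` is a hypothetical helper written only for this example; it is not the library's actual EntityParser code. It decodes the same numeric character reference twice, once with each built-in:

```javascript
// Hypothetical helper (an assumption for illustration, not part of
// simple-html-tokenizer): decodes a numeric character reference such as
// "&#128517;" or "&#x1F605;" using the supplied code-to-string function.
function decodeNumericEntity(entity, fromCode) {
  const match = /^&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));$/.exec(entity);
  if (!match) {
    return undefined; // not a numeric character reference
  }
  const code = match[1] !== undefined
    ? parseInt(match[1], 16) // hexadecimal form: &#x...;
    : parseInt(match[2], 10); // decimal form: &#...;
  return fromCode(code);
}

// String.fromCharCode keeps only the low 16 bits of its argument,
// so 0x1F605 is truncated to the single code unit 0xF605.
const viaCharCode = decodeNumericEntity('&#x1F605;', (n) => String.fromCharCode(n));

// String.fromCodePoint emits a surrogate pair for code points above U+FFFF.
const viaCodePoint = decodeNumericEntity('&#x1F605;', (n) => String.fromCodePoint(n));

console.log(viaCharCode === '\uF605'); // true
console.log(viaCodePoint === '😅'); // true
```

One caveat with this approach: String.fromCodePoint throws a RangeError for values above 0x10FFFF, so a real fix would probably need to guard against out-of-range references.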

Related:

rwjblue commented Nov 20, 2018

Seems reasonable to me...

@krisselden what do you think?

krisselden commented

Yes

CvX linked a pull request on Jul 1, 2019 that will close this issue