tmbundle: Make tmLanguage compatible with PCRE2 #19

Open
lildude opened this issue Sep 2, 2022 · 3 comments

lildude commented Sep 2, 2022

👋 I'm the lead maintainer of the https://github.com/github/linguist library, which is used for language detection and syntax highlighting on GitHub.com, and we use this grammar.

Our grammar compiler has found several problems with your grammar which I thought I'd let you know about.

These regexes have quite a few problems as you can see in the regex101 link after each:

"end": "(?=[,;\\](]|/>|(?<=[^=])>|(?<!(?:^|[!~*%&^|?:]|[!~*%&^|?/<>+=-]=|=>|>{2,}|[^.]\\.|[^-]-|^\\s*\\+\\+|[^\\+]\\+{2}*\\+|[a-zA-Z0-9%).<\\]}]\\s*/|\\b(?<![.]\\s*)(?:await|async|class|function|keyof|new|typeof|void))\\s*)(?:\\n|[ \\t]+(?![\\n{+!~*%&^|?:]|[<>/=-]=|=>|>{2,}|\\.[^.]|-[^-]|/[^>]|(?:in|instanceof|as|extends)\\s+[^:=/,;>])))",

https://regex101.com/r/pSG73T/1

"end": "(?=[,;\\]]|/>|(?<=[^=])>|(?<!(?:^|[!~*%&^|?:]|[!~*%&^|?/<>+=-]=|=>|>{2,}|[^.]\\.|[^-]-|^\\s*\\+\\+|[^\\+]\\+{2}*\\+|[a-zA-Z0-9%).<\\]}]\\s*/|\\b(?<![.]\\s*)(?:await|async|class|function|keyof|new|typeof|void))\\s*)(?:\\n|[ \\t]+(?![\\n{(+!~*%&^|?:]|[<>/=-]=|=>|>{2,}|\\.[^.]|-[^-]|/[^>]|(?:in|instanceof|as|extends)\\s+[^:=/,;>])))",

... and repeated again at:

"end": "(?=[,;\\]]|/>|(?<=[^=])>|(?<!(?:^|[!~*%&^|?:]|[!~*%&^|?/<>+=-]=|=>|>{2,}|[^.]\\.|[^-]-|^\\s*\\+\\+|[^\\+]\\+{2}*\\+|[a-zA-Z0-9%).<\\]}]\\s*/|\\b(?<![.]\\s*)(?:await|async|class|function|keyof|new|typeof|void))\\s*)(?:\\n|[ \\t]+(?![\\n{(+!~*%&^|?:]|[<>/=-]=|=>|>{2,}|\\.[^.]|-[^-]|/[^>]|(?:in|instanceof|as|extends)\\s+[^:=/,;>])))",

https://regex101.com/r/NlVs41/1

These are the errors our compiler reported:

  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\](]|/>|(?<=[^=])>|(?<!(?:...": nothing to repeat (at offset 105))
  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104))
  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104))
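
The reported offsets (105 and 104) appear to land on the `*` that directly follows `\+{2}` in each pattern, i.e. a quantifier stacked on another quantifier. The sketch below isolates that fragment. It uses Python's `re` module purely as a stand-in, on the assumption that it rejects stacked quantifiers much as PCRE does; it is not the linguist grammar compiler. The `GROUPED` rewrite is a hypothetical fix, not one taken from this thread, and assumes the intended meaning is "zero or more pairs of `+`".

```python
import re

# Fragment copied from the reported "end" patterns (JSON escaping removed).
STACKED = r"[^\+]\+{2}*\+"

# Hypothetical rewrite (an assumption, not from the thread): wrap the counted
# repeat in a non-capturing group so the `*` has an explicit item to repeat.
GROUPED = r"[^\+](?:\+{2})*\+"


def compile_error(pattern):
    """Return the engine's error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as err:
        return err.msg
    return None


print(compile_error(STACKED))  # an error message; CPython reports "multiple repeat"
print(compile_error(GROUPED))  # None
```

Whether the grouped form behaves exactly like the original under Oniguruma would still need to be checked against the grammar's own tests.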

DylanPiercey (Contributor) commented

Will look shortly. Thanks for the report!

DylanPiercey (Contributor) commented

@lildude from what I can tell, the regex functions correctly. The validator you are using runs PCRE2; however, my understanding was that the regexes in tmgrammars are intended to be handled by Oniguruma. Is this not the case for linguist?


lildude commented Sep 3, 2022

> Is this not the case for linguist?

No. GitHub uses PCRE for grammar parsing for performance reasons.

mlrawlings changed the title from "Several invalid regexes in grammar" to "Make" on Nov 10, 2022
mlrawlings changed the title from "Make" to "Make tmLanguage compatible with PCRE2" on Nov 10, 2022
mlrawlings changed the title from "Make tmLanguage compatible with PCRE2" to "tmbundle: Make tmLanguage compatible with PCRE2" on Feb 3, 2023
DylanPiercey moved this from Todo to Done in The Everything Project on Dec 8, 2023