Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang due to recursive regex in TextMate highlighting #97238

Closed
sheaf opened this issue May 8, 2020 · 8 comments
Closed

Hang due to recursive regex in TextMate highlighting #97238

sheaf opened this issue May 8, 2020 · 8 comments
Assignees
Labels
info-needed Issue requires more information from poster

Comments

@sheaf
Copy link

sheaf commented May 8, 2020

VS Code: versions 1.44 and 1.45
OS: Windows 10 x64


In working on the Haskell syntax highlighting extension I unfortunately had to resort to some rather complex regular expressions (I wish there was a better system for syntax highlighting but oh well).

One in particular seems to cause VS Code to hang.

I've managed to minimise the problem to the following. Enable TextMate highlighting with the following single pattern:

  "patterns": [
    {
      "begin": "=",
      "end": "(?<paren>\\((?:[^\\(\\)]*|\\g<paren>)*\\))\\s*`[\\p{Lu}\\p{Lt}][\\p{Ll}_\\p{Lu}\\p{Lt}\\p{Nd}]*`"
    }
  ],

With syntax highlighting enabled and using the above TextMate rule, paste the following into a VS Code window:

= ( T b c ( d e ) f ) `C` ( A b )

This should not cause any particular issues yet (although the syntax highlighting might be a bit slow to appear, not that you can tell as no TextMate scopes are present in this minimal repro).
Now try to remove the closing parenthesis after f. This should cause VS Code to hang for an unreasonable amount of time (and it is significantly worse with the non-simplified regular expression).

Note that using the same regular expression

(?<paren>\((?:[^\(\)]*|\g<paren>)*\))\s*`[\p{Lu}\p{Lt}][\p{Ll}_\p{Lu}\p{Lt}\p{Nd}]*`

in an online editor such as Regex101 causes no issues, and the highlighting updates very responsively.

This is with VS Code 1.44 (on Windows 10 x64).
The situation seems to have improved slightly with the update 1.45 as the hang is noticeably smaller, but I still feel like the delay is unreasonable for a simple recursive expression (used to parse nested parentheses), given that other highlighters seem to be much faster.

@effleurager
Copy link
Contributor

effleurager commented May 20, 2020

A regex search of a file with the following also causes a hang:

smallest_valid_png = %w[89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 00 01 00 00 00 01 08 06 00 00 00 1f 15 c4 89 00 00 00 0a 49 44 41 54 78 9c 63 00 01 00 00 05 00 01 0d 0a 2d b4 00 00 00 00 49 45 4e 44 ae 42 60 82]

regex:

(?:smallest_valid_png = %w\[)(?:.*( ))*\]

Version: 1.45.1
Commit: 5763d90
Date: 2020-05-14T08:33:47.663Z
Electron: 7.2.4
Chrome: 78.0.3904.130
Node.js: 12.8.1
V8: 7.8.279.23-electron.0
OS: Darwin x64 19.4.0

@alexr00
Copy link
Member

alexr00 commented Jul 7, 2020

@sheaf Can you try again with 1.46 or 1.47 insiders? There have been some textmate related changes.

@alexr00 alexr00 added the info-needed Issue requires more information from poster label Jul 7, 2020
@effleurager
Copy link
Contributor

No change in regex thrashing behaviour.


Version: 1.47.0-insider
Commit: e95f9e6
Date: 2020-07-07T07:46:45.605Z
Electron: 7.3.2
Chrome: 78.0.3904.130
Node.js: 12.8.1
V8: 7.8.279.23-electron.0
OS: Darwin x64 19.5.0

@sheaf
Copy link
Author

sheaf commented Jul 7, 2020

Hey @alexr00, thanks for the follow-up. Unfortunately I haven't noticed any improvements on 1.46.1 compared to 1.45 (whereas 1.45 had been a significant improvement on 1.44).

@alexr00
Copy link
Member

alexr00 commented Sep 29, 2020

@alexdima is there anything that vscode textmate can do about this?

@alexdima
Copy link
Member

alexdima commented Oct 1, 2020

vscode-textmate uses oniguruma as the regex library, so it might have some missed optimization opportunity compared to regex101 for the regex:

(?<paren>\((?:[^\(\)]*|\g<paren>)*\))\s*`[\p{Lu}\p{Lt}][\p{Ll}_\p{Lu}\p{Lt}\p{Nd}]*`

Regarding #97238 (comment) , that looks to be mentioning a search in file, where we use the built-in JavaScript regex, so these two issues are AFAICT unrelated.

@alexdima alexdima self-assigned this Oct 1, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Dec 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
info-needed Issue requires more information from poster
Projects
None yet
Development

No branches or pull requests

6 participants
@sheaf @alexdima @aeschli @effleurager @alexr00 and others