Use a proper parser for link detection #583

Tyriar · 2017-03-03T20:30:48Z

There will always be issues with URL matching using regex as some cases simply can't be caught with regex. An example where regex will fail is including brackets in URLs but only when the brackets are opened within the url:

Include brackets: http://<domain>.com/foo(bar)
Don't include wrapping brackets: (http://<domain>.com/foobar)
Include commas: http://<domain>.com/foo,bar
Don't include trailing commas: http://<domain>.com/foo, other text
Detect spaces (paths are ambiguous?) c:\Users\X\My Documents

The text was updated successfully, but these errors were encountered:

egmontkob · 2017-11-28T11:55:25Z

FYI: These particular cases are handled correctly by gnome-terminal using regexes (pretty complex ones though).

See big rewrite bug, balanced parentheses bug, current implementation, unittests.

Tyriar · 2018-03-22T17:29:01Z

When someone tackles this issue there are some more cases on the VS Code issue tracking this microsoft/vscode#21125

Tyriar · 2018-06-02T18:59:33Z

Merged some issues into this one to support chrome-devtools scheme #599, no TLD #653 and long TLDs #601.

Tyriar · 2018-06-21T04:57:25Z

Merging UNC path support into this, eg. \\localhost\c$\GitDevelopment\standup\server\db.js

JaneX8 · 2019-09-30T14:10:11Z

Will this be fixed anytime soon? It's very common for Windows users to have a username Firstname and Lastname. Meaning it will include spaces. Breaking Xterm.js which in term breaks the terminal that VSCode implements. A project folder in the user folder C:\Users\John Doe\Documents\Github\com.example.app would break the "clickable" path's in the terminal.

See: microsoft/vscode#21125

Tyriar · 2019-09-30T18:45:38Z

No ETA but it's open to PRs. I imagine the eventually implementation will look similar to this: https://github.com/microsoft/vscode/blob/master/src/vs/editor/common/modes/linkComputer.ts

Tyriar · 2019-09-30T18:46:07Z

Also needs a little thinking around how this affects the existing experimental "link matcher" api.

Tyriar · 2019-10-23T23:31:45Z

Current state

The way links work right now is that after the viewport has stopped scrolling for 200ms, the links are computed for the current viewport. Embedders can provide a validation callback which allows the embedder to validate the link sometime after the links are computed, note that links are only shown once they are validated. The end result is pretty nice, we can present underlines for all links even when you need a modifier to execute the link, it does however have some fundamental problems:

Even though we're just processing when the viewport stops, that's still a lot of work that would go unused.
Since the entire viewport is processed, it's impractical/slow to create links for many words, for example files without extensions (Linkify terminal paths without any path separators microsoft/vscode#22772).
Doing this much validation on any text that looks like kind of like a file is way too expensive on some environments, as any sequence of characters could be a file. This is particular bad on remote file systems (Severe performance degradation opening large folders in latest release microsoft/vscode#79336).
The current system is complex; I think there are still race conditions in the linkifier and links disappear if you hide and show the terminal (Terminal link detection only works after scrolling microsoft/vscode#36072, closed but it's still an issue).
It's all regex-based and difficult to maintain and extend (at least for myself).

class Terminal {
	registerLinkMatcher(regex: RegExp, handler: (event: MouseEvent, uri: string) => void, options?: ILinkMatcherOptions): number;
	deregisterLinkMatcher(matcherId: number): void;
}
interface ILinkMatcherOptions {
	matchIndex?: number;
	validationCallback?: (uri: string, callback: (isValid: boolean) => void) => void;
	tooltipCallback?: (event: MouseEvent, uri: string, location: IViewportRange) => boolean | void;
	leaveCallback?: () => void;
	priority?: number;
	willLinkActivate?: (event: MouseEvent, uri: string) => boolean;
}

Proposal

The long standing hope was to move to a "parser-based" link system (#583) but it was never really clear how an addon would provide a parser exactly. Here's my very VS Code API-inspired proposal:

class Terminal {
	registerLinkProvider(linkProvider: ILinkProvider): IDisposable;
}

interface ILinkProvider {
	provideLink(position: IBufferCellPosition, callback: (link: ILink | undefined) => void): void;
}

interface ILink {
	range: IBufferRange;
	showTooltip(event: MouseEvent, link: string): void;
	hideTooltip(event: MouseEvent, link: string): void;
	handle(event: MouseEvent, link: string): void;
}

interface IBufferRange {
	start: IBufferCellPosition;
	end: IBufferCellPosition;
}

interface IBufferCellPosition {
	x: number;
	y: number;
}

The basic idea is that instead of evaluating links whenever scrolling has stopped, only evaluate links for the current cursor position when a hover occurs. While file access is slow en masse, single requests just for mouse movement should be reasonable. Some other things of interest:

The addon can still use the regex method and translate the results an IBufferRange or use a parser to check the whole line, or just expand outwards from the cursor.
It uses the buffer API to access the line, that's pretty cool 😎.
The link could be cached until a scroll happens for that entire range, meaning no need to recompute just for moving the mouse over the same cells.
It allows embedders to use their own link detection mechanism, VS Code has a shared implementation but we've been unable to use it in the terminal (Share link detection logic with the terminal and editor microsoft/vscode#83191).

Going this route would seemingly fix many of the problems VS Code has with links, namely:

No more perf issues validating on slow file systems.
We can linkify paths without separators since validation will only be triggered under the cursor.
By expanding left then right from the cursor we can fix many of the issues with the current link detection (Terminal link detection issues microsoft/vscode#21125), including spaces in Windows paths (this is still a tricky problem but seems more achievable).

The only downside for this is a slight delay in the link appearing as the computing and validation occurs at hover time.

Open questions

Do we need an ILinkMatcherOptions.priority equivalent?
This would be an ideal time to introduce Promise into the API, working with promises is lovely but we will always have callbacks in the parser for performance reasons. We could just stick to the callback everywhere for consistency?
We could use markers for the range instead of numbers, that would allow the link to be cached after a scroll occurs. It probably isn't worth caching like this imo but it does bring up another interesting question in the Replace ISelectionPosition API with IBufferRange #2480 discussion.

jmbockhorst · 2019-10-24T14:04:47Z

Would this be an addon? Or part of the existing Linkifier?

jerch · 2019-10-24T14:56:05Z

@Tyriar Good idea to move the expensive link matching to the hover state, will def. remove some "lagging" from the viewport pause state. Few ideas/remarks from my side:

Imho its important to note that with a change like this its not possible anymore to prestyle links (underline or some special link color right from the start). Imho not a big issue, a terminal is not meant to work like a web browser in the first place.
priority - Not sure what you want it to help with? Nested links?
I love promises for their syntactic sweetness, still not sure if we should introduce them here while everything else is callback based (or sync API). Maybe stick to the least common and more performant denominator here - callbacks. Any JS programmer should be able to transform those into lovely promises. Beside the worse runtime promises also screw up stack traces, thus ppl might have reasons to avoid them.
Caching is indeed a problem here - w'o using markers its only possible for a "stable viewport" (any line buffer shift would invalidate the cache). Markers alone can not avoid reeval here - a cell might have changed thus a parser should not mark it as a link anymore. Markers would still be helpful in conjunction with a viewport update range event - this way a parser could limit the link eval to that range on hover and just return cache entries otherwise. Also note, that later added cell content that forms a correct link string with older content by accident should not be marked as a valid link, here the viewport statefulness makes it hard to correctly deal with it, which makes me think that we might need not only a line range update event, but a more fine-grained position-range update event.
I'd love to see this more as a general API approach to mark things in the viewport/buffer based on some condition (which could even be used by the search addon, partly answers @jmbockhorst question). Not sure if thats feasible, might be hard to harmonize between the different needs.

Tyriar · 2019-10-24T15:55:42Z

@jmbockhorst it would be a new Linkifier class (Linkifier2?) that would deprecate the existing registerLinkMatcher API and change the implementation of web-links. For VS Code we would not use web-links any more and instead leave all link detection up to the existing mechanisms.

I'd love to see this more as a general API approach to mark things in the viewport/buffer based on some condition (which could even be used by the search addon

@jerch that's basically where we're headed with decorations #1852, #1653

priority - Not sure what you want it to help with? Nested links?

Can't remember exactly why it was added, maybe to have web links take precedence over local or vice versa?

I love promises for their syntactic sweetness, still not sure if we should introduce them here while everything else is callback based (or sync API). Maybe stick to the least common and more performant denominator here - callbacks.

Yeah guess we should just commit to this, we can use Promises internally but callbacks in the API for consistency.

Imho its important to note that with a change like this its not possible anymore to prestyle links

Pre-styling isn't desirable imo as underline is a perfectly valid state.

Tyriar · 2019-10-25T10:31:47Z

Not sure if I'll have time to get to this in the next couple of months so this is open to PRs if someone want to have a go at implementing a proof of concept. I think we'll want to support both links types until the next major version at which point registerLinkMatcher will get removed.

mofux · 2019-11-01T10:29:55Z

IMO this would be an ideal use-case to refine our Marker API, so detected links can be tracked by a sticky range marker and can get auto-disposed (invalidated) in the following events:

A marked buffer range get's modified (e.g. you have a link in your prompt line and add / delete some characters)
A marked buffer range gets cleared out (e.g. clear command)
A marked buffer range falls out of the scrollback

Note that this marker should track the whole buffer, not just the viewport.

Once we have that marker that marks a buffer range (sticky), we can add another layer of abstraction that annotates that marked range with additional meaning. For example a link annotation, that references the marker and holds the parsed link.

As a last step, a renderer can iterate annotations in the viewport and draw them accordingly.

Here is a rough implementation of the linkifier that would use such an API:

// listen for buffer changes, we will get an IBufferRange in that indicates
// which part of the buffer was changed
terminal.buffer.onChange((range, buffer) => {
  
  // do the magic, detect links in that range
  const links = detectLinks(range, buffer);

  // create a marker for those links
  for (let link of links) {
    const linkMarker = buffer.addMarker(link.range);
    const linkAnnotation = buffer.addAnnotation({ type: 'link', data: link.url, marker: marker  });
    // dispose marker and annotation if you don't want them anymore
    // linkMarker.dispose();
    // linkAnnotation.dispose();
  }

});

I think this approach is ideal for performance. The link detection can be throttled, and only modified ranges in the buffer have to be rescanned.

It also cleanly decouples that semantic annotation of buffer ranges from their representation by the renderer.

jerch · 2019-11-01T12:05:06Z

@mofux Yeah very good idea. I want to extend this:

A marked buffer range get's modified (e.g. you have a link in your prompt line and add / delete some characters)

to all cells in active viewport* - if a marker partly covers cells of that, its content can still be modified by cursor jumps + writes, thus the marker should break. A bonus would be to re-eval those positions (with extends to left and right), as the new content might form a valid link again.

[*] Viewport is not the correct name for that - I mean those bufferlines, that are still reachable by the cursor. Guess we dont even have a name for that. Maybe "editable buffer"?

Edit: Note that a buffer.onChange will be hard to implement for cell level (both code and perf-wise) - we would have to track any cell change during chunk processing and summing them up into such an event. We kinda have that for lines already (dirtyRowService), but not on individual cell level. Maybe we can stick with line level here for the sake of performance? (Ofc this would raise the burden on any consumer of such an event).

mofux · 2019-11-01T12:42:52Z

@jerch Yeah I guess line level changes would be sufficient. We could still fire an IBufferRange for the affected lines though - so we can potentially support cell level changes more easily in the future without breaking API.

to all cells in active viewport* - if a marker partly covers cells of that, its content can still be modified by > cursor jumps + writes, thus the marker should break. A bonus would be to re-eval those positions (with > extends to left and right), as the new content might form a valid link again.

I'd think that a marker only breaks if a write happens within a line that is inside the marked range. The linkify implementation can over or underscan additional lines for link detection.

Maybe "editable buffer"?

Maybe "mutable buffer", or "non-scrollback buffer"? 😅

jerch · 2019-11-01T13:22:17Z

I'd think that a marker only breaks if a write happens within a line that is inside the marked range.

Yes, if onChange returns a range of line2 - line20 a marker ending on line1 should not be bothered, same for a marker covering line21+. Also this automatically turns any marker living in the scrollbuffer into stable ones (beside the disposing when finally dropping off).

Maybe "mutable buffer", or "non-scrollback buffer"?

Mutable suggests that the scrollback buffer is static/immutable which is not the case for resize. "Cursor-Addressible-Buffer" === CAB? Lol just kidding, all are kinda word monsters...

Tyriar · 2019-11-05T15:24:07Z

I think markers are a little heavy for this when you consider how links will be typically be used; the user types a bunch of stuff then wants to click on a link, that link will likely never be clicked again when it leaves the viewport. Caching the viewport would be good but anything outside of that I think is overkill. I would also invalidate when all lines of the link's range leave the viewport (not the CAB/NSB/MB 😉).

jerch · 2019-11-06T10:53:14Z

I think markers are a little heavy ...

Yeah your prolly right, markers still have the update burden for ANY buffer scroll, thus we should use them with caution. (Its a big difference for term.write if there are only 10 markers vs. 1000+.)

Regarding link detection and decoupling from render state - I think its perfectly fine to announce onHover the current mouse position, a link parser would only need to parse one wrapped line to find links (a link cannot span several real lines). This seems rather cheap to me, if a parser really wants to get more involved it could cache the findings based on line content (to make sure to correctly spot content changes). The cache here could help to debounce parsing from every single onHover report, still I'd give the cache entries rather short lifecycles as re-parsing seems so cheap. (Have not yet looked at #2530 😊)

Tyriar · 2020-04-30T18:54:27Z

Let's call this done with the link provider work which is about to ship in VS Code

Tyriar added the type/enhancement Features or improvements to existing features label Mar 3, 2017

Tyriar self-assigned this Mar 3, 2017

Tyriar mentioned this issue Mar 3, 2017

Terminal link detection issues microsoft/vscode#21125

Closed

Tyriar mentioned this issue Mar 17, 2017

Replace Linkifier RegExp to RFC3986 based one #611

Closed

Tyriar mentioned this issue Mar 27, 2017

URL recognition in terminal microsoft/vscode#23269

Closed

Tyriar mentioned this issue May 9, 2017

Could not linkify hostnames without TLDs #653

Closed

Tyriar added the help wanted label Jun 22, 2017

This was referenced Jun 22, 2017

Could not linkify the URL with super-long TLDs. #601

Closed

Support links that point outside of the workspace (output pane) microsoft/vscode#23917

Closed

This was referenced Jul 6, 2017

Improve terminal link detection July plan item microsoft/vscode#30211

Closed

Support chrome-devtools scheme in web link matcher #599

Closed

Tyriar mentioned this issue Jul 28, 2017

Visual Studio Code's Terminal allows click in URL but truncates its string if comma is found microsoft/vscode#31460

Closed

Tyriar mentioned this issue Oct 24, 2017

extra characters are added to the links in the terminal microsoft/vscode#36806

Closed

chabou mentioned this issue Nov 28, 2017

Support hyperlink ansi escapes #1134

Closed

egmontkob mentioned this issue Nov 29, 2017

Linkifying includes trailing '.' #1135

Closed

Tyriar mentioned this issue Dec 20, 2017

Terminal doesn't create link to file if it's path contains non letter-or-digit symbols. microsoft/vscode#40560

Closed

Tyriar mentioned this issue Feb 1, 2018

Support triggering tooltip callback using customizable delay. #1251

Closed

Tyriar mentioned this issue Mar 16, 2018

Get multi-line links working #1303

Merged

Tyriar removed their assignment Jun 2, 2018

Tyriar mentioned this issue Jun 21, 2018

Terminal links - chrome devtools link isn't clickable microsoft/vscode#21668

Closed

This was referenced Jun 21, 2018

Terminal: link detection fails for UNC paths microsoft/vscode#25425

Closed

Regex not recognizing .services URL in terminal microsoft/vscode#29266

Closed

Tyriar mentioned this issue Jul 25, 2018

Clickable Links in Terminal ignore the port when clicked microsoft/vscode#54999

Closed

Tyriar mentioned this issue Mar 23, 2019

Parse ":line " in Quick Open microsoft/vscode#70726

Closed

Tyriar mentioned this issue Mar 30, 2019

Fixes issue #1954 #1957

Closed

Tyriar mentioned this issue Sep 10, 2019

Linkify variable values in repl; #79198 microsoft/vscode#80502

Merged

Tyriar added the area/links label Oct 7, 2019

Tyriar mentioned this issue Oct 9, 2019

Support Custom Link Matchers in Terminal via API microsoft/vscode#74849

Closed

Tyriar mentioned this issue Oct 23, 2019

Share link detection logic with the terminal and editor microsoft/vscode#83191

Closed

Goodwine mentioned this issue Oct 24, 2019

Linkify any URL even if it's an alias like http://foo/bar microsoft/vscode#83256

Closed

Tyriar mentioned this issue Oct 25, 2019

Linkify terminal paths without any path separators microsoft/vscode#22772

Closed

Tyriar mentioned this issue Oct 26, 2019

Support cell-by-cell navigation to allow links to be keyboard accessible and more fine grained keyboard selection microsoft/vscode#69795

Closed

jmbockhorst mentioned this issue Nov 1, 2019

Basic link parser for better link detection #2530

Merged

Tyriar mentioned this issue Jan 6, 2020

Relative Links are not clickable in Terminal microsoft/vscode#87892

Closed

Tyriar mentioned this issue Apr 2, 2020

Hyperlinks at the end of a sentence erroneously include the period #2811

Closed

Tyriar added this to the 4.6.0 milestone Apr 30, 2020

Tyriar closed this as completed Apr 30, 2020

jaraco mentioned this issue Dec 6, 2023

Hyperlinks at the end of a sentence erroneously include the period vercel/hyper#4444

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a proper parser for link detection #583

Use a proper parser for link detection #583

Tyriar commented Mar 3, 2017 •

edited

Loading

egmontkob commented Nov 28, 2017

Tyriar commented Mar 22, 2018

Tyriar commented Jun 2, 2018 •

edited

Loading

Tyriar commented Jun 21, 2018

JaneX8 commented Sep 30, 2019

Tyriar commented Sep 30, 2019

Tyriar commented Sep 30, 2019

Tyriar commented Oct 23, 2019

jmbockhorst commented Oct 24, 2019

jerch commented Oct 24, 2019 •

edited

Loading

Tyriar commented Oct 24, 2019

Tyriar commented Oct 25, 2019

mofux commented Nov 1, 2019

jerch commented Nov 1, 2019 •

edited

Loading

mofux commented Nov 1, 2019

jerch commented Nov 1, 2019 •

edited

Loading

Tyriar commented Nov 5, 2019 •

edited

Loading

jerch commented Nov 6, 2019 •

edited

Loading

Tyriar commented Apr 30, 2020

Use a proper parser for link detection #583

Use a proper parser for link detection #583

Comments

Tyriar commented Mar 3, 2017 • edited Loading

egmontkob commented Nov 28, 2017

Tyriar commented Mar 22, 2018

Tyriar commented Jun 2, 2018 • edited Loading

Tyriar commented Jun 21, 2018

JaneX8 commented Sep 30, 2019

Tyriar commented Sep 30, 2019

Tyriar commented Sep 30, 2019

Tyriar commented Oct 23, 2019

Current state

Proposal

Open questions

jmbockhorst commented Oct 24, 2019

jerch commented Oct 24, 2019 • edited Loading

Tyriar commented Oct 24, 2019

Tyriar commented Oct 25, 2019

mofux commented Nov 1, 2019

jerch commented Nov 1, 2019 • edited Loading

mofux commented Nov 1, 2019

jerch commented Nov 1, 2019 • edited Loading

Tyriar commented Nov 5, 2019 • edited Loading

jerch commented Nov 6, 2019 • edited Loading

Tyriar commented Apr 30, 2020

Tyriar commented Mar 3, 2017 •

edited

Loading

Tyriar commented Jun 2, 2018 •

edited

Loading

jerch commented Oct 24, 2019 •

edited

Loading

jerch commented Nov 1, 2019 •

edited

Loading

jerch commented Nov 1, 2019 •

edited

Loading

Tyriar commented Nov 5, 2019 •

edited

Loading

jerch commented Nov 6, 2019 •

edited

Loading