Limit amount of tokens used to calculate LCS distance #68151

tmat · 2023-05-10T00:47:52Z

Reduces the number of buffers allocated on the LOH during calculation of the EnC/Hot Reload delta.

tmat · 2023-05-10T00:48:01Z

CyrusNajmabadi · 2023-05-10T00:51:16Z

Not sure how these are stored today. In the future, would using SegmentedList be appropriate, if the goal is to avoid the LOH?

tmat · 2023-05-10T01:18:57Z

We only need to avoid the LOH for segments that are not pooled, as those will become garbage once the calculation is finished. The pooled ones will be reused.

davidwengier

I have 0 confidence that I understand what is going on here, or how it works. E.g. I'm guessing this doesn't mean that we simply throw away tokens/characters that are beyond the maximum, but it sure looks that way!

tmat · 2023-05-10T15:45:27Z

> I'm guessing this doesn't mean that we simply throw away tokens/characters that are beyond the maximum, but it sure looks that way!

Yes, we do throw them away, but only when we are comparing the similarity of sequences of tokens ("weights" or "distances"), not when calculating the actual edits.
E.g. when we have a method body that has multiple if blocks, each with hundreds of statements (which is real code customers have), the tree diffing algorithm tries to figure out which if blocks in the old version match which if blocks in the new version. For that we calculate a "distance" between the sequences of tokens comprising the if blocks using the LCS algorithm. The algorithm needs O(n^2) memory, which can easily run into tens of MBs.

This change limits the number of tokens we look at when calculating the distance. If the first ~900 tokens are the same, we consider the sequences to be exactly the same.
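To make the idea concrete, here is a minimal Python sketch of a capped LCS distance. It is not the Roslyn implementation: the names `lcs_length`, `capped_distance` and `MAX_TOKENS` are hypothetical, the cap of 900 is taken from the "~900 tokens" figure above, and the normalization `1 - lcs / max(len)` is an assumption made for illustration.

```python
from typing import Sequence

MAX_TOKENS = 900  # assumed cap, based on the "~900 tokens" figure in this thread


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming LCS using a full (len(a)+1) x (len(b)+1) table."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def capped_distance(a: Sequence[str], b: Sequence[str],
                    max_tokens: int = MAX_TOKENS) -> float:
    """Distance in [0, 1] computed only over the first max_tokens tokens of each side."""
    a, b = a[:max_tokens], b[:max_tokens]  # tokens past the cap are ignored
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


# Two sequences that differ only in their last token.
old = [f"t{i}" for i in range(12)]
new = old[:-1] + ["changed"]
same_under_cap = capped_distance(old, new, max_tokens=5)    # difference lies past the cap
differ_without = capped_distance(old, new, max_tokens=100)  # cap larger than the input
```

Because the table grows with the product of the two lengths, the cap also bounds memory. Assuming 4-byte cells, a full table for two 3000-token sequences is about 36 MB, while two 900-token sequences need about 3.2 MB.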

CyrusNajmabadi · 2023-05-10T18:01:41Z

> This change limits the number of tokens we look at when calculating the distance. If the first ~900 tokens are the same, we consider the sequences to be exactly the same.

What's the fallout of this, though? Would that mean we miss an actual change if it's not within that token count? Say someone edits the last statement?

(Just trying to figure out the implication of assuming a match). Thanks!

tmat · 2023-05-10T18:31:37Z

@CyrusNajmabadi The fallout is decreased precision of the matching heuristic. We would still detect any change made anywhere in the tree.

Say you have blocks A, B, C and the user makes a change that reorders B and C, so the new version is A, C, B. If B and C are large and only differ in the last token, we would produce the matches A1 ~ A2, B1 ~ C2, C1 ~ B2 instead of the more correct:
A1 ~ A2, B1 ~ B2, C1 ~ C2.

This matters for example, if B contains a lambda, a local variable definition, etc. Then we would try to match the lambda/variable/etc. in B1 to the corresponding lambda in C2 instead of B2.

It seems unlikely, though, that this would happen often in practice. Reordering large blocks of code is not very common during EnC. Besides, the blocks need to have the same 900 leading tokens for the matching algorithm to be affected.
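The reordering case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual matcher: `lcs_length`, `distance`, `pairwise` and the tiny cap of 8 (standing in for ~900) are hypothetical, and blocks are modeled as plain lists of token strings.

```python
def lcs_length(a, b):
    """LCS length using two rolling rows instead of a full table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def distance(a, b, cap):
    """Distance over the first `cap` tokens only (assumed normalization)."""
    a, b = a[:cap], b[:cap]
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


# B and C share a long prefix and differ only in their last token.
shared = [f"stmt{i}" for i in range(10)]
block_b = shared + ["b_tail"]
block_c = shared + ["c_tail"]

old_blocks = {"B1": block_b, "C1": block_c}
new_blocks = {"C2": block_c, "B2": block_b}  # the user swapped the two blocks


def pairwise(cap):
    """Distance of every old block to every new block under the given cap."""
    return {(o, n): distance(old_blocks[o], new_blocks[n], cap)
            for o in old_blocks for n in new_blocks}


capped = pairwise(cap=8)      # difference lies beyond the cap
uncapped = pairwise(cap=100)  # cap larger than the blocks
```

With the small cap every pair has distance 0, so a matcher that relies on distance alone cannot tell B1 ~ B2 from B1 ~ C2. With the larger cap only the true pairs have distance 0.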

CyrusNajmabadi · 2023-05-10T18:44:37Z

> @CyrusNajmabadi The fallout is decreased precision of the matching heuristic. We would still detect any change made anywhere in the tree.

Cool, that's what I wanted to know. If this is just about a heuristic that says "more likely than not this is a move, given that 900 tokens match", that seems totally reasonable to me.

davidwengier · 2023-05-10T21:29:55Z

Thank you for that explanation @tmat, makes sense.

Limit amount of tokens used to calculate LCS distance

fc244d5

tmat requested a review from a team as a code owner May 10, 2023 00:47

dotnet-issue-labeler bot added the Area-Interactive and untriaged labels May 10, 2023

davidwengier approved these changes May 10, 2023

tmat added 2 commits May 10, 2023 17:17

Fix

60cc0c5

Fix

07fb972

tmat merged commit b404fbe into dotnet:main May 11, 2023

tmat deleted the LcsDistance branch May 11, 2023 15:51

ghost added this to the Next milestone May 11, 2023

allisonchou mentioned this pull request May 12, 2023

[Automated] PRs inserted in VS build main-33712.304 #68184

Closed

Cosifne modified the milestones: Next, 17.7 P2 May 31, 2023
