[release/9.0] JIT: Null out SSA def nodes upon removal in RBO #108548
Backport of #108530 to release/9.0
/cc @amanasifkhalid
Customer Impact
Various JIT optimizations can remove stores to local variables. If the JIT removes a store to a variable that's tracked by SSA, the removed store node might be tracked by an SSA definition, which needs to be invalidated; otherwise, we run the risk of referencing invalid IR later on. When optimizing redundant branches, the JIT may remove a statement containing a store node after SSA data structures have been initialized, so we must take care to maintain them here.
Regression
As far as I can tell, the lack of SSA definition maintenance has been around for several releases, though the recent expansion of SSA-based optimizations may have exposed it.
Testing
The issue of referencing invalid SSA definition nodes was exposed by a test case generated by one of our fuzzing tools. Upon further inspection, our existing test suites have also been referencing invalid IR nodes during SSA-based optimizations, but the overwhelming majority of the time the JIT's heuristics bail on these nodes, which is why we didn't hit this failure earlier. I've added debug logic to store node removal in redundant branch opts that overwrites the removed node with garbage values, so future attempts to read removed nodes will trigger asserts.
Risk
Low, based on the fact that it took us this long to find a code shape that exposes this failure. The fix is simple: We null the pointer to the removed store node in its corresponding SSA definition node. Existing call sites that use SSA definitions are already expected to check for null IR nodes to handle cases where the store is removed, so this fix is unlikely to regress JIT behavior elsewhere.
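For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the idea described above. It is not the JIT's actual code: `StoreNode`, `SsaDef`, `LocalVar`, `removeStore`, `tryGetDefValue`, and `kPoison` are all hypothetical names invented for illustration. It shows the two pieces together: nulling the SSA definition's pointer when the store is removed, and poisoning the removed node so stale reads are easy to spot.

```cpp
#include <vector>

// Hypothetical stand-ins for JIT IR and SSA bookkeeping; these names are
// illustrative assumptions and are not taken from the runtime sources.
struct StoreNode {
    unsigned lclNum; // local variable being stored to
    unsigned ssaNum; // SSA number defined by this store
    int      value;  // stand-in for the stored expression
};

struct SsaDef {
    StoreNode* defNode = nullptr; // defining store, or nullptr once it is removed
};

struct LocalVar {
    std::vector<SsaDef> defs; // indexed by SSA number
};

constexpr unsigned kPoison = 0xDDDDDDDDu; // garbage pattern for removed nodes

// Remove a store: null the pointer held by its SSA definition, then poison
// the node. In this sketch the poisoning is unconditional; the PR describes
// it as debug-only logic.
void removeStore(std::vector<LocalVar>& locals, StoreNode* store) {
    SsaDef& def = locals[store->lclNum].defs[store->ssaNum];
    if (def.defNode == store) {
        def.defNode = nullptr;
    }
    store->lclNum = kPoison;
    store->ssaNum = kPoison;
}

// Consumer side: check for a null defining node before using the definition.
bool tryGetDefValue(const std::vector<LocalVar>& locals,
                    unsigned lclNum, unsigned ssaNum, int* out) {
    const StoreNode* store = locals[lclNum].defs[ssaNum].defNode;
    if (store == nullptr) {
        return false; // the store was removed; bail out of the optimization
    }
    *out = store->value;
    return true;
}
```

In this sketch, a consumer that calls `tryGetDefValue` after `removeStore` sees a null defining node and bails instead of reading a removed node, which mirrors the null check that existing call sites are expected to perform.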