Skip FP registers processing if method don't have floating points use #85744

kunalspathak · 2023-05-04T01:19:43Z

Fixes: #83109
Contributes to #83946

ghost · 2023-05-04T01:21:14Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author:	kunalspathak
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

kunalspathak · 2023-05-04T01:28:49Z

src/coreclr/jit/lsra.h

@@ -733,6 +733,7 @@ class LinearScan : public LinearScanInterface
    unsigned int currentSpill[TYP_COUNT];
    bool         needFloatTmpForFPCall;
    bool         needDoubleTmpForFPCall;
+    bool         needFloatingRegisters;


A safe change would be to have needFloatingRegisters |= (theRegisterType != IntRegisterType); in newInterval() method but want to see if my current changes achieve the purpose.

kunalspathak · 2023-05-04T02:56:18Z

Nice TP Diffs

kunalspathak · 2023-05-04T02:57:15Z

/azp run runtime-coreclr jitstress-jitstressregs, runtime-coreclr jitstress

azure-pipelines · 2023-05-04T02:57:31Z

Azure Pipelines successfully started running 1 pipeline(s).

src/coreclr/jit/lsra.cpp

tannergooding · 2023-05-04T05:36:06Z

Nice TP Diffs

Do we know what causes the potential hit to Arm64 minopts? Seems odd that we see such huge improvements elsewhere, but not always in minopts but only on Arm64

I see tests in particular are the "worst offender", is that maybe due to the number of HWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

kunalspathak · 2023-05-04T05:52:13Z

Jitstress is green.

kunalspathak · 2023-05-04T05:53:15Z

I see tests in particular are the "worst offender", is that maybe due to the number of HWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

I will try to find it out.

kunalspathak · 2023-05-04T05:53:33Z

@dotnet/jit-contrib

kunalspathak · 2023-05-05T04:10:09Z

I see tests in particular are the "worst offender", is that maybe due to the number of HWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

I will try to find it out.

PS E:\spmi\pintools\1.0\windows> .\analyze-pin-trace-diff.ps1 -b E:\features\lsra\floatregs\base_arm64_coreclr.txt E:\features\lsra\floatregs\diff_arm64_coreclr.txt
Base: 431066232654, Diff: 431355053761, +0.0670%

?buildIntervals@LinearScan@@QEAAXXZ                                                                                           : 571725277  : +46.55%  : 27.20% : +0.1326%
?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                               : 349854912  : +27.75%  : 16.64% : +0.0812%
?resolveRegisters@LinearScan@@QEAAXXZ                                                                                         : 270767431  : +3.38%   : 12.88% : +0.0628%
?impImportAndPushBox@Compiler@@IEAAXPEAUCORINFO_RESOLVED_TOKEN@@@Z                                                            : 2230836    : +6.41%   : 0.11%  : +0.0005%
?impPushOnStack@Compiler@@IEAAXPEAUGenTree@@VtypeInfo@@@Z                                                                     : -2256082   : -0.75%   : 0.11%  : -0.0005%
?impImportCall@Compiler@@IEAA?AW4var_types@@W4opcode_t@@PEAUCORINFO_RESOLVED_TOKEN@@1PEAUGenTree@@HPEAUCORINFO_CALL_INFO@@I@Z : -2438804   : -0.20%   : 0.12%  : -0.0006%
??$allocateRegisters@$0A@@LinearScan@@QEAAXXZ                                                                                 : -436643544 : -1.57%   : 20.77% : -0.1013%
?buildPhysRegRecords@LinearScan@@AEAAXXZ                                                                                      : -464797849 : -100.00% : 22.11% : -0.1078%

I investigated the top methods and one thing that stands out is that earlier the for-loop of registers was unrolled and now it no longer does, but iterating it fewer times is still better.

The call to buildPhysRegsRecord() is now getting inlined inside buildIntervals().

Other then that, the only extra thing I am doing is setting needNonIntegerRegisters = true inside internalFloatRegCandidates().

kunalspathak · 2023-05-05T05:43:45Z

It also occurred to me that for xarch, we already updated the definition of AVAILABLE_REG_COUNT to a getter, but for arm64, it was a constant. This PR makes arm64 definition to getter as well and that could reflect the regression on arm64 and not on x64.

tannergooding · 2023-05-05T14:43:34Z

src/coreclr/jit/lsrabuild.cpp

+    // For that `compFloatingPointUsed` should be set accurately
+    // before invoking allocator.


I think this bit in particular is a bit difficult as we need to pre-emptively set compFloatingPointUsed for any SIMD code. But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

Would it instead be possible to have regMaskTP LinearScan::internalFloatRegCandidates() lazily do the below, or is there other factors that require it?

That is, have it do something like:

regMaskTP LinearScan::internalFloatRegCandidates() { if (!needNonIntegerRegisters) { // lazily initialize fpreg state here for (unsigned int i = 0; i < lsraRegOrderFltSize; i++) { regNumber reg = lsraRegOrderFlt[i]; RegRecord* curr = &physRegs[reg]; curr->regOrder = (unsigned char)i; } needNonIntegerRegisters = true; } if (compiler->compFloatingPointUsed) { return availableFloatRegs; } else { return RBM_FLT_CALLEE_TRASH; } }

Unfortunately we need the physRegRecord early right after identifying candidates.

But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

True, but this is the best we can do I suppose, unless you are proposing to just start from scratch in LSRA and set it to true during BuildNode() if tree->GetType() != TYP_INT.

Edit: Doing the check in BuildNode() might prove costly though because we will be doing that for every single GenTree* node and not sure how many cases we will optimize for the scenarios where we didn't account for dead code elimination.

skip processing floating point registers if method doesn't have one

fe09c7f

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 4, 2023

ghost assigned kunalspathak May 4, 2023

kunalspathak commented May 4, 2023

View reviewed changes

kunalspathak changed the title ~~Skip processing floating point registers if method doesn't have one~~ Skip FP registers processing if method don't have floating points use May 4, 2023

tannergooding reviewed May 4, 2023

View reviewed changes

src/coreclr/jit/lsra.cpp Outdated Show resolved Hide resolved

kunalspathak mentioned this pull request May 4, 2023

AVX-512 throughput improvement opportunties #83946

Open

Rename to needNonIntegerRegisters

8a7d379

kunalspathak marked this pull request as ready for review May 4, 2023 05:53

build-analysis bot mentioned this pull request May 4, 2023

Unable to load Analyzer assembly .../Microsoft.CodeAnalysis.Analyzers.dll : Not a valid assembly #85082

Closed

Merge remote-tracking branch 'origin/main' into lsra-throughput

e295312

tannergooding reviewed May 5, 2023

View reviewed changes

tannergooding approved these changes May 5, 2023

View reviewed changes

jit format

de985d6

BruceForstall approved these changes May 6, 2023

View reviewed changes

kunalspathak merged commit 8854509 into dotnet:main May 6, 2023

kunalspathak deleted the lsra-throughput branch May 6, 2023 03:37

ghost locked as resolved and limited conversation to collaborators Jun 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip FP registers processing if method don't have floating points use #85744

Skip FP registers processing if method don't have floating points use #85744

kunalspathak commented May 4, 2023 •

edited

Loading

ghost commented May 4, 2023

kunalspathak May 4, 2023

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

azure-pipelines bot commented May 4, 2023

tannergooding commented May 4, 2023 •

edited

Loading

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

kunalspathak commented May 5, 2023

kunalspathak commented May 5, 2023

tannergooding May 5, 2023

tannergooding May 5, 2023 •

edited

Loading

kunalspathak May 5, 2023

kunalspathak May 5, 2023 •

edited

Loading

		// For that `compFloatingPointUsed` should be set accurately
		// before invoking allocator.

Skip FP registers processing if method don't have floating points use #85744

Skip FP registers processing if method don't have floating points use #85744

Conversation

kunalspathak commented May 4, 2023 • edited Loading

ghost commented May 4, 2023

kunalspathak May 4, 2023

Choose a reason for hiding this comment

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

azure-pipelines bot commented May 4, 2023

tannergooding commented May 4, 2023 • edited Loading

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

kunalspathak commented May 4, 2023

kunalspathak commented May 5, 2023

kunalspathak commented May 5, 2023

tannergooding May 5, 2023

Choose a reason for hiding this comment

tannergooding May 5, 2023 • edited Loading

Choose a reason for hiding this comment

kunalspathak May 5, 2023

Choose a reason for hiding this comment

kunalspathak May 5, 2023 • edited Loading

Choose a reason for hiding this comment

kunalspathak commented May 4, 2023 •

edited

Loading

tannergooding commented May 4, 2023 •

edited

Loading

tannergooding May 5, 2023 •

edited

Loading

kunalspathak May 5, 2023 •

edited

Loading