Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize simple arithmetic with GT_NEG #13837

Closed
EgorBo opened this issue Dec 7, 2019 · 5 comments · Fixed by #43921
Closed

JIT: Optimize simple arithmetic with GT_NEG #13837

EgorBo opened this issue Dec 7, 2019 · 5 comments · Fixed by #43921
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Dec 7, 2019

I was surprised it's not handled in morph, e.g. -a + b => b - a.

int M1(int a, int b) => -a +  b; // optimize to  "b - a"
int M2(int a, int b) =>  a + -b; // optimize to  "a - b"
int M3(int a, int b) =>  a - -b; // optimize to  "a + b"
int M4(int a, int b) => -a - -b; // optimize to  "b - a"

Current codegen:

; Method CC:M1(int,int):int:this
G_M48868_IG02:
       mov      eax, edx
       neg      eax
       add      eax, r8d

; Method CC:M2(int,int):int:this
G_M15271_IG02:
       mov      eax, r8d
       neg      eax
       add      eax, edx

; Method CC:M3(int,int):int:this
G_M13857_IG02:
       mov      eax, r8d
       neg      eax
       sub      edx, eax
       mov      eax, edx

; Method CC:M4(int,int):int:this
G_M28384_IG02:
       mov      eax, edx
       neg      eax
       mov      edx, r8d
       neg      edx
       sub      eax, edx

Expected codegen: https://godbolt.org/z/nmZxv8

Also:

int M5(int a) =>  -a / 10;  // optimize to  "a / -10"
int M6(int a) => -(a / 10); // optimize to  "a / -10"
int M7(int a) =>  -a * 10;  // optimize to  "a * -10"
int M8(int a) => -(a * 10); // optimize to  "a * -10"

category:cq
theme:basic-cq
skill-level:beginner
cost:small

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@JulieLeeMSFT JulieLeeMSFT self-assigned this Aug 3, 2020
@JulieLeeMSFT
Copy link
Member

With the help with Eugene, I was able to get optimized code for M1 to M4 as shown in https://godbolt.org/z/nmZxv8.
Found some issues during jit-diff that does not handle cases where exception is thrown from both operands, and investigation WIP.

@erozenfeld
Copy link
Member

Note that currently morph does the opposite transformation when one of the operands is a const:

case GT_SUB:
if (tree->gtOverflow())
{
goto CM_OVF_OP;
}
// TODO #4104: there are a lot of other places where
// this condition is not checked before transformations.
if (fgGlobalMorph)
{
/* Check for "op1 - cns2" , we change it to "op1 + (-cns2)" */
noway_assert(op2);
if (op2->IsCnsIntOrI())
{
/* Negate the constant and change the node to be "+" */
op2->AsIntConCommon()->SetIconValue(-op2->AsIntConCommon()->IconValue());
op2->AsIntConRef().gtFieldSeq = FieldSeqStore::NotAField();
oper = GT_ADD;
tree->ChangeOper(oper);
goto CM_ADD_OP;
}
/* Check for "cns1 - op2" , we change it to "(cns1 + (-op2))" */
noway_assert(op1);
if (op1->IsCnsIntOrI())
{
noway_assert(varTypeIsIntOrI(tree));
// The type of the new GT_NEG node cannot just be op2->TypeGet().
// Otherwise we may sign-extend incorrectly in cases where the GT_NEG
// node ends up feeding directly into a cast, for example in
// GT_CAST<ubyte>(GT_SUB(0, s_1.ubyte))
tree->AsOp()->gtOp2 = op2 = gtNewOperNode(GT_NEG, genActualType(op2->TypeGet()), op2);
fgMorphTreeDone(op2);
oper = GT_ADD;
tree->ChangeOper(oper);
goto CM_ADD_OP;
}
/* No match - exit */
}
break;

op1 - cns2 => op1 + (-cns2)
cns1 - op2 => cns1 + (-op2)

The first transformation is fine, since we are negating a const.

The second transformation is then continued

cns1 - op2 => cns1 + (-op2) => (-op2) + cns1

to canonicalize the location of the const.
Since constants are canonicalized as second operands, transformations like

((x+icon1)+(y+icon2)) => ((x+y)+(icon1+icon2))

and

((x+icon1)+icon2) => (x+(icon1+icon2))

are simplified.

@JulieLeeMSFT I think you should just disable your optimization when the operand that's not negated is a const.

@JulieLeeMSFT
Copy link
Member

That seems the best solution. We will skip transforming NEG when non negated operand is constant. Will check if bitwise shift pattern needs to be revisited where NEG was replaced with SUB pattern.

@JulieLeeMSFT
Copy link
Member

Simple ADD and SUB arithmetic optimization has been completed with no regression.

  • We skipped optimizing patterns when non-NEG operand is constant as discussed above.
  • Many patterns in the jit-diff --frameworks test have constant operands, so code size reduction is not big.
  • NEG operator is more expensive (e.g., 5-6 CPU cycles) than SUB operator (e.g., 1 cycle) in terms of latency, so it shows better perf scores when optimized.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -50 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
         -13 : System.Private.CoreLib.dasm (-0.00% of base)
         -12 : Microsoft.VisualBasic.Core.dasm (-0.00% of base)
         -12 : System.Runtime.Numerics.dasm (-0.02% of base)
          -8 : System.Private.Xml.dasm (-0.00% of base)
          -4 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -1 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
6 total files with Code Size differences (6 improved, 0 regressed), 259 unchanged.
Top method improvements (bytes):
         -12 (-2.11% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double
         -12 (-7.50% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(Complex,Complex):Complex
          -6 (-1.46% of base) : System.Private.Xml.dasm - XmlSqlBinaryReader:FillAllowEOF():bool:this
          -5 (-0.70% of base) : System.Private.CoreLib.dasm - Number:NumberToFloatingPointBitsSlow(byref,byref,int,int,int):long
          -4 (-1.02% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceConstructorSymbol:CalculateLocalSyntaxOffset(int,SyntaxTree):int:this
          -4 (-1.94% of base) : System.Private.CoreLib.dasm - ScopeTree:AddScopeInfo(byte,int):this
          -2 (-1.55% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunCounted(byref,int,Span`1,byref,byref):bool
          -2 (-1.05% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunShortest(byref,byref,byref,Span`1,byref,byref):bool
          -2 (-0.19% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this
          -1 (-0.20% of base) : Microsoft.CodeAnalysis.dasm - CustomDebugInfoWriter:SerializeReferenceToIteratorClass(String,ArrayBuilder`1)
Top method improvements (percentages):
         -12 (-7.50% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(Complex,Complex):Complex
         -12 (-2.11% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double
          -4 (-1.94% of base) : System.Private.CoreLib.dasm - ScopeTree:AddScopeInfo(byte,int):this
          -2 (-1.55% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunCounted(byref,int,Span`1,byref,byref):bool
          -6 (-1.46% of base) : System.Private.Xml.dasm - XmlSqlBinaryReader:FillAllowEOF():bool:this
          -2 (-1.05% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunShortest(byref,byref,byref,Span`1,byref,byref):bool
          -4 (-1.02% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceConstructorSymbol:CalculateLocalSyntaxOffset(int,SyntaxTree):int:this
          -5 (-0.70% of base) : System.Private.CoreLib.dasm - Number:NumberToFloatingPointBitsSlow(byref,byref,int,int,int):long
          -1 (-0.20% of base) : Microsoft.CodeAnalysis.dasm - CustomDebugInfoWriter:SerializeReferenceToIteratorClass(String,ArrayBuilder`1)
          -2 (-0.19% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this
10 total methods with Code Size differences (10 improved, 0 regressed), 244443 unchanged.

Next, will look into MUL and DIV operator optimization.

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@JulieLeeMSFT JulieLeeMSFT removed the JitUntriaged CLR JIT issues needing additional triage label Nov 3, 2020
@JulieLeeMSFT JulieLeeMSFT modified the milestones: Future, 6.0.0 Nov 3, 2020
JulieLeeMSFT added a commit that referenced this issue Nov 6, 2020
* Simple arithmetic optimization with GT_NEG

* Skip GT_NEG optimization when an operand is constant. Revert bitwise rotation pattern

* Fixed Value Numbering assert

* Cleaned up code and comments for simple GT_NEG optimization

* Formatting

Co-authored-by: Julie Lee <[email protected]>
@JulieLeeMSFT
Copy link
Member

Completed optimizing + and - operators. Will open a new issue to track * and /.

tqiu8 pushed a commit to tqiu8/runtime that referenced this issue Nov 9, 2020
author Stephen Toub <[email protected]> 1604601164 -0500
committer Tammy Qiu <[email protected]> 1604960878 -0500

Add stream conformance tests for TranscodingStream (dotnet#44248)

* Add stream conformance tests for TranscodingStream

* Special-case 0-length input buffers to TranscodingStream.Write{Async}

The base implementation of Encoder.Convert doesn't like empty inputs.  Regardless, if the input is empty, we can avoid a whole bunch of unnecessary work.

JIT: minor inliner refactoring (dotnet#44215)

Extract out the budget check logic so it can vary by inlining policy.
Use this to exempt the FullPolicy from budget checking.

Fix inline xml to dump the proper (full name) hash for inlinees.

Update range dumper to dump ranges in hex.

Remove unused QCall for WinRTSupported (dotnet#44278)

ConcurrentQueueSegment allows spinning threads to sleep. (dotnet#44265)

* Allow threads to sleep when ConcurrentQueue has many enqueuers/dequeuers.

* Update src/libraries/System.Private.CoreLib/src/System/Collections/Concurrent/ConcurrentQueueSegment.cs

Co-authored-by: Stephen Toub <[email protected]>

* Apply suggestions from code review

Co-authored-by: Stephen Toub <[email protected]>

Co-authored-by: AMD DAYTONA EPYC <[email protected]>
Co-authored-by: Stephen Toub <[email protected]>

File.Exists() is not null when true (dotnet#44310)

* File.Exists() is not null when true

* Fix compile

* Fix compile 2

[master][watchOS] Add simwatch64 support (dotnet#44303)

Xcode 12.2 removed 32 bits support for watchOS simulators, this PR helps to fix xamarin/xamarin-macios#9949, we have tested the new binaries and they are working as expected

![unknown](https://user-images.githubusercontent.com/204671/98253709-64413200-1f49-11eb-9774-8c5aa416fc57.png)

Co-authored-by: dalexsoto <[email protected]>

Implementing support to Debugger::Break. (dotnet#44305)

Set fgOptimizedFinally flag correctly (dotnet#44268)

- Initialize to 0 at compiler startup
- Set flag when finally cloning optimization kicks in

Fixes non-deterministic generation of nop opcodes into ARM32 code

Forbid `- byref cnst` -> `+ (byref -cnst)` transformation. (dotnet#44266)

* Add a repro test.

* Forbid the transformation for byrefs.

* Update src/coreclr/src/jit/morph.cpp

Co-authored-by: Andy Ayers <[email protected]>

* Update src/coreclr/src/jit/morph.cpp

* Fix the test return value.

WriteLine is just to make sure we don't delete the value.

* improve the test.

avoid a possible overflow and don't waste time on printing.

Co-authored-by: Andy Ayers <[email protected]>

Pick libmonosgen-2.0.so from cmake install directory instead of .libs (dotnet#44291)

This aligns Linux with what we already do for all the other platforms.

Update SharedPerformanceCounter assert (dotnet#44333)

Remove silly ToString in GetCLRInstanceString (dotnet#44335)

Use targetPlatformMoniker for net5.0 and newer tfms (dotnet#43965)

* Use targetPlatformMoniker for net5.0 and newer tfms

* disabling analyzer, update version to 0.0, and use new format.

* update the targetFramework.sdk

* removing supportedOS assembly level attribute

* fix linker errors and addressing feedback

* making _TargetFrameworkWithoutPlatform as private

[sgen] Add Ward annotations to sgen_get_total_allocated_bytes (dotnet#43833)

Attempt to fix https://jenkins.mono-project.com/job/test-mono-mainline-staticanalysis/

Co-authored-by: lambdageek <[email protected]>

[tests] Re-enable tests fixed by dotnet#44081 (dotnet#44212)

Fixes
mono/mono#15030 and
fixes mono/mono#15031 and
fixes mono/mono#15032

Add an implicit argument coercion check. (dotnet#43386)

* Add `impCheckImplicitArgumentCoercion`.

* Fix tests with type mismatch.

* Try to fix VM signature.

* Allow to pass byref as native int.

* another fix.

* Fix another IL test.

[mono] Change CMakelists.txt "python" -> Python3_EXECUTABLE (dotnet#44340)

Debian doesn't install a "python" binary for python3.

Tweak StreamConformanceTests for cancellation (dotnet#44342)

- Avoid unnecessary timers
- Separate tests for precancellation, ReadAsync(byte[], ...) cancellation, and ReadAsync(Memory, ...) cancellation

Use Dictionary for underlying cache of ResourceSet (dotnet#44104)

Simplify catch-rethrow logic in NetworkStream (dotnet#44246)

A follow-up on dotnet#40772 (comment), simplifies and harmonizes the way we wrap exceptions into IOException. Having one catch block working with System.Exception seems to be enough here, no need for specific handling of SocketException.

Simple GT_NEG optimization for dotnet#13837 (dotnet#43921)

* Simple arithmetic optimization with GT_NEG

* Skip GT_NEG optimization when an operand is constant. Revert bitwise rotation pattern

* Fixed Value Numbering assert

* Cleaned up code and comments for simple GT_NEG optimization

* Formatting

Co-authored-by: Julie Lee <[email protected]>

[master] Update dependencies from mono/linker (dotnet#44322)

* Update dependencies from https://github.com/mono/linker build 20201105.1

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20527.2 -> To Version 6.0.0-alpha.1.20555.1

* Update dependencies from https://github.com/mono/linker build 20201105.2

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20527.2 -> To Version 6.0.0-alpha.1.20555.2

* Disable new optimization for libraries mode (it cannot work in this mode)

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Marek Safar <[email protected]>

Tighten argument validation in StreamConformanceTests (dotnet#44326)

Add threshold on number of files / partition in SPMI collection (dotnet#44180)

* Add check for files count

* Fix the OS check

* decrese file limit to 1500:

* misc fix

* Do not upload to azure if mch files are zero size

Fix ELT profiler tests (dotnet#44285)

[master] Update dependencies from dotnet/arcade dotnet/llvm-project dotnet/icu (dotnet#44336)

[master] Update dependencies from dotnet/arcade dotnet/llvm-project dotnet/icu

 - Merge branch 'master' into darc-master-2211df94-2a02-4c3c-abe1-e3534e896267

Fix Send_TimeoutResponseContent_Throws (dotnet#44356)

If the client times out too quickly, the server may never have a connection to accept and will hang forever.

Match CoreCLR behaviour on thread start failure (dotnet#44124)

Co-authored-by: Aleksey Kliger (λgeek) <[email protected]>

Add slash in Windows SoD tool build (dotnet#44359)

* Add slash in Windows SoD tool build

* Update SoD search path to match output dir

* Fixup dotnet version

* Remove merge commit headers

* Disable PRs

Co-authored-by: Drew Scoggins <andrew.g.scoggins@gmail>

Reflect test path changes in .gitattributes; remove nonexistent files (dotnet#44371)

Bootstrapping a test for R2RDump (dotnet#42150)

Improve performance of Enum's generic IsDefined / GetName / GetNames (dotnet#44355)

Eliminates the boxing in IsDefined/GetName/GetValues, and in GetNames avoids having to go through RuntimeType's GetEnumNames override.

clarify http version test (dotnet#44379)

Co-authored-by: Geoffrey Kizer <[email protected]>

Update dependencies from https://github.com/mono/linker build 20201106.1 (dotnet#44367)

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20555.2 -> To Version 6.0.0-alpha.1.20556.1

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

Disable RunThreadLocalTest8_Values on Mono (dotnet#44357)

* Disable RunThreadLocalTest8_Values on Mono

It's failing on SLES

* fix typo

LongProcessNamesAreSupported: make test work on distros where sleep is a symlink/script (dotnet#44299)

* LongProcessNamesAreSupported: make test work on distros where sleep is a symlink/script

* PR feedback

Co-authored-by: Stephen Toub <[email protected]>

* fix compilation

Co-authored-by: Stephen Toub <[email protected]>

add missing constructor overloads (dotnet#44380)

Co-authored-by: Geoffrey Kizer <[email protected]>

change using in ConnectCallback_UseUnixDomainSocket_Success (dotnet#44366)

Clean up the samples (dotnet#44293)

Update dotnet/roslyn issue link

Delete stale comment about dotnet/roslyn#30797

Fix/remove TODO-NULLABLEs (dotnet#44300)

* Fix/remove TODO-NULLABLEs

* remove redundant !

* apply Jozkee's feedback

* address feedback

Update glossary (dotnet#44274)

Co-authored-by: Juan Hoyos <[email protected]>
Co-authored-by: Stephen Toub <[email protected]>
Co-authored-by: Günther Foidl <[email protected]>

Add files need for wasm executable relinking/aot to the wasm runtime pack. (dotnet#43785)

Co-authored-by: Alexander Köplinger <[email protected]>

Move some more UnmanagedCallersOnly tests to IL now that they're invalid C# (dotnet#43366)

Fix C++ build for mono/metadata/threads.c (dotnet#44413)

`throw` is a reserved keyword in C++.

Disable a failing test. (dotnet#44404)

Change async void System.Text.Json test to be async Task (dotnet#44418)

Improve crossgen2 comparison jobs (dotnet#44119)

- Fix compilation on unix platforms
  - Wrap use of wildcard in quotes
- Print better display name into log
- Fix X86 constant comparison handling
- Add ability to compile specific overload via single method switches

Remove some unnecessary GetTypeInfo usage (dotnet#44414)

Fix MarshalTypedArrayByte and re-enable it. Re-enable TestFunctionApply
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
5 participants