Enable AVX512 Additional 16 SIMD Registers (#79544)
* Change regMask_enum and regMaskTP to `unsigned __int64` on AMD64.

This allows for more registers to be encoded in the register allocator.
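
A minimal sketch of why the wider mask is needed, assuming one bit per register and a hypothetical numbering in which the 16 integer registers come first and the 32 SIMD registers follow. The names `RegMask`, `regBit`, `simdRegNum`, and `fitsIn32Bits` are illustrative and are not taken from the JIT sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register numbering, for illustration only:
// 16 integer registers (numbers 0..15) followed by 32 SIMD registers (16..47).
constexpr unsigned kIntRegCount   = 16;
constexpr unsigned kSimdRegCount  = 32;
constexpr unsigned kTotalRegCount = kIntRegCount + kSimdRegCount;

// One bit per register, so the mask type needs at least kTotalRegCount bits.
using RegMask = std::uint64_t;

static_assert(kTotalRegCount > 32, "48 registers do not fit in a 32-bit mask");
static_assert(kTotalRegCount <= 64, "48 registers fit in a 64-bit mask");

// Returns the single-bit mask for a register number.
constexpr RegMask regBit(unsigned regNum)
{
    return RegMask{1} << regNum;
}

// Returns the register number of the n-th SIMD register (n == 0 is the first).
constexpr unsigned simdRegNum(unsigned n)
{
    return kIntRegCount + n;
}

// True if the mask survives truncation to 32 bits unchanged.
constexpr bool fitsIn32Bits(RegMask mask)
{
    return static_cast<std::uint32_t>(mask) == mask;
}
```

Under this numbering, the sixteenth SIMD register still occupies bit 31, while the first of the upper 16 lands on bit 32 and would be lost in a 32-bit mask.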

* Add upper 16 SIMD registers to allocator.

Commit includes refactoring code to use `const instrDesc *` instead of `instruction`
so information about when EVEX is needed (due to high SIMD registers) is
available to the emitter.

* Limit high SIMD reg to compatible intrinsics lsra build.

* Limit high SIMD reg to compatible intrinsics lsra build.

* Limit high SIMD reg to compatible intrinsics and gentree nodes.

Commit constrains certain hw intrinsics and gentree nodes to use
lower SIMD registers even if upper SIMD registers are available due
to limitations of EVEX encoding for certain instructions.

For example, SSE `Reciprocal` lowers to `rcpps`, which does not have an
EVEX encoding form; hence, we cannot allow that hw intrinsic node to use
a high SIMD register.

These intrinsics are marked with `HW_Flag_NoEvexSemantics`. Other such
instructions related to masking (typically marked with
`HW_Flag_ReturnsPerElementMask`) have similar issues (though they
can be replaced with the EVEX k registers and associated masking when
implemented).

In addition, the callee/caller save registers have also been adjusted
to properly handle the presence and absence of AVX512 upper SIMD
registers at runtime.
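
The constraint described above can be sketched as a candidate-mask computation. This is only an illustration under stated assumptions: the bit layout, the `IntrinsicInfo` struct, and `simdCandidates` are hypothetical, and the two boolean fields merely stand in for `HW_Flag_NoEvexSemantics` and `HW_Flag_ReturnsPerElementMask`. The sample intrinsics other than the `rcpps` case are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;

// Hypothetical layout: bits 16..31 are the low SIMD registers,
// bits 32..47 are the upper 16 SIMD registers.
constexpr RegMask kLowSimd  = RegMask{0xFFFF} << 16;
constexpr RegMask kHighSimd = RegMask{0xFFFF} << 32;

// Stand-ins for the per-intrinsic flags named in the commit message.
struct IntrinsicInfo
{
    bool noEvexSemantics;       // instruction has no EVEX encoding form
    bool returnsPerElementMask; // result is a per-element mask
};

// Illustrative samples; only the first mirrors the rcpps example in the text.
constexpr IntrinsicInfo kReciprocalLike{true, false};
constexpr IntrinsicInfo kCompareLike{false, true};
constexpr IntrinsicInfo kAddLike{false, false};

// Returns the SIMD registers the allocator may pick for this node.
// High registers are offered only when they exist at runtime and the
// node can be emitted with an EVEX prefix.
inline RegMask simdCandidates(const IntrinsicInfo& info, bool highRegsAvailable)
{
    RegMask candidates = kLowSimd;
    const bool evexCompatible = !info.noEvexSemantics && !info.returnsPerElementMask;
    if (highRegsAvailable && evexCompatible)
    {
        candidates |= kHighSimd;
    }
    return candidates;
}
```

In this sketch a node without an EVEX form keeps the low-register mask even when the upper registers are present, which is the behavior the commit message describes for `Reciprocal`.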

* Fix for X86 throughput.

* Add upper simd stress test to the AVX512 testing pipeline.

* Formatting.

* Fix wrong-sized attr for simd mov instruction.

* Fix non-AMD64 LSRA stress mask.

* Update src/coreclr/jit/compiler.h

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/compiler.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/gentree.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/hwintrinsic.h

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/target.h

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/emitxarch.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Remove unneeded vars

* Address PR comments.

* Allow `emitinl.h` access to the `rbm` variables.

* Replace RBM_LOWSIMD with `BuildEvexIncompatibleMask`.

* Move AVX512 dependent `targetamd.h` vars into compiler object.

* Fixing some edge cases for `targetamd.h` variables.

* Fix a merge/rebase bug.

* Update src/coreclr/jit/compiler.h

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/lsra.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/compiler.h

Co-authored-by: Bruce Forstall <[email protected]>

* Fix nits.

* Trying VM changes.

* VM hack.

* VM hack.

* Revert "VM hack."

This reverts commit 91cf3db.

* Adjust ACTUAL_REG_COUNT based on availability of AVX512.

* Use inline accessor functions instead of macros

Convert from macros to accessor functions for
RBM_ALLFLOAT, RBM_FLT_CALLEE_TRASH, CNT_CALLEE_TRASH_FLOAT.
Convert LSRA use of ACTUAL_REG_COUNT to AVAILABLE_REG_COUNT,
and create an accessor for that value for AMD64 as well.
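
One way such a conversion can keep existing call sites unchanged is to let the former constant name expand to an accessor call that each using class resolves in its own scope. The sketch below assumes that scheme. `ALL_FLOAT_REGS`, `CompilerSketch`, `CodeGenSketch`, and the helper functions are hypothetical names, not the JIT's.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;

constexpr RegMask kLowFloat  = RegMask{0xFFFF} << 16; // hypothetical low SIMD bits
constexpr RegMask kHighFloat = RegMask{0xFFFF} << 32; // hypothetical upper SIMD bits

// Assumption: the old constant name now expands to an accessor call, so any
// class that uses it must provide a matching member function.
#define ALL_FLOAT_REGS get_ALL_FLOAT_REGS()

// Counts the set bits in a mask.
inline unsigned countBits(RegMask mask)
{
    unsigned count = 0;
    while (mask != 0)
    {
        mask &= mask - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}

// Owns the value, which is decided once at initialization time.
class CompilerSketch
{
public:
    explicit CompilerSketch(bool hasHighSimd)
        : m_allFloat(hasHighSimd ? (kLowFloat | kHighFloat) : kLowFloat)
    {
    }
    RegMask get_ALL_FLOAT_REGS() const { return m_allFloat; }
    bool isAllFloat(RegMask regs) const { return regs == ALL_FLOAT_REGS; }

private:
    RegMask m_allFloat;
};

// A second user forwards to the owner through its own accessor.
class CodeGenSketch
{
public:
    explicit CodeGenSketch(const CompilerSketch* compiler) : m_compiler(compiler) {}
    RegMask get_ALL_FLOAT_REGS() const { return m_compiler->get_ALL_FLOAT_REGS(); }
    unsigned floatRegCount() const { return countBits(ALL_FLOAT_REGS); }

private:
    const CompilerSketch* m_compiler;
};

// Convenience wrapper used to exercise both classes together.
inline unsigned floatRegCountFor(bool hasHighSimd)
{
    CompilerSketch compiler(hasHighSimd);
    CodeGenSketch  codeGen(&compiler);
    return codeGen.floatRegCount();
}
```

The trade-off in this design is that every class using the name must supply an accessor, but macro definitions for other targets do not need a compiler parameter.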

* Clarifying comments.

---------

Co-authored-by: Bruce Forstall <[email protected]>
Co-authored-by: Kunal Pathak <[email protected]>
3 people authored Feb 8, 2023
1 parent 1166bba commit ad52afd
Showing 26 changed files with 1,098 additions and 525 deletions.
1 change: 1 addition & 0 deletions eng/pipelines/common/templates/runtimes/run-test-job.yml
@@ -532,6 +532,7 @@ jobs:
${{ if in(parameters.testGroup, 'jitstress-isas-avx512') }}:
scenarios:
- jitstress_isas_avx512_forceevex
- jitstress_isas_avx512_forceevex_stresshighregs
${{ if in(parameters.testGroup, 'jitstressregs-x86') }}:
scenarios:
- jitstressregs1_x86_noavx
11 changes: 11 additions & 0 deletions src/coreclr/jit/codegen.h
@@ -35,6 +35,17 @@ class CodeGen final : public CodeGenInterface
GenTree* addr, bool fold, bool* revPtr, GenTree** rv1Ptr, GenTree** rv2Ptr, unsigned* mulPtr, ssize_t* cnsPtr);

private:
#if defined(TARGET_AMD64)
regMaskTP get_RBM_ALLFLOAT() const
{
return compiler->rbmAllFloat;
}
regMaskTP get_RBM_FLT_CALLEE_TRASH() const
{
return compiler->rbmFltCalleeTrash;
}
#endif // TARGET_AMD64

#if defined(TARGET_XARCH)
// Bit masks used in negating a float or double number.
// This is to avoid creating more than one data constant for these bitmasks when a
4 changes: 2 additions & 2 deletions src/coreclr/jit/codegenxarch.cpp
@@ -3535,7 +3535,7 @@ void CodeGen::genStructPutArgUnroll(GenTreePutArgStk* putArgNode)
// this probably needs to be changed.

// Load
- genCodeForLoadOffset(INS_movdqu, EA_8BYTE, xmmTmpReg, src, offset);
+ genCodeForLoadOffset(INS_movdqu, EA_16BYTE, xmmTmpReg, src, offset);
// Store
genStoreRegToStackArg(TYP_STRUCT, xmmTmpReg, offset);

@@ -8358,7 +8358,7 @@ void CodeGen::genStoreRegToStackArg(var_types type, regNumber srcReg, int offset
{
ins = INS_movdqu;
// This should be changed!
- attr = EA_8BYTE;
+ attr = EA_16BYTE;
size = 16;
}
else
49 changes: 49 additions & 0 deletions src/coreclr/jit/compiler.cpp
@@ -3329,6 +3329,24 @@ void Compiler::compInitOptions(JitFlags* jitFlags)
opts.compJitSaveFpLrWithCalleeSavedRegisters = JitConfig.JitSaveFpLrWithCalleeSavedRegisters();
}
#endif // defined(DEBUG) && defined(TARGET_ARM64)

#if defined(TARGET_AMD64)
rbmAllFloat = RBM_ALLFLOAT_INIT;
rbmFltCalleeTrash = RBM_FLT_CALLEE_TRASH_INIT;
cntCalleeTrashFloat = CNT_CALLEE_TRASH_FLOAT_INIT;
availableRegCount = ACTUAL_REG_COUNT;

if (DoJitStressEvexEncoding())
{
rbmAllFloat |= RBM_HIGHFLOAT;
rbmFltCalleeTrash |= RBM_HIGHFLOAT;
cntCalleeTrashFloat += CNT_CALLEE_TRASH_HIGHFLOAT;
}
else
{
availableRegCount -= CNT_HIGHFLOAT;
}
#endif // TARGET_AMD64
}

#ifdef DEBUG
@@ -3532,6 +3550,37 @@ bool Compiler::compPromoteFewerStructs(unsigned lclNum)
return rejectThisPromo;
}

//------------------------------------------------------------------------
// dumpRegMask: display a register mask. For well-known sets of registers, display a well-known token instead of
// a potentially large number of registers.
//
// Arguments:
// regs - The set of registers to display
//
void Compiler::dumpRegMask(regMaskTP regs) const
{
if (regs == RBM_ALLINT)
{
printf("[allInt]");
}
else if (regs == (RBM_ALLINT & ~RBM_FPBASE))
{
printf("[allIntButFP]");
}
else if (regs == RBM_ALLFLOAT)
{
printf("[allFloat]");
}
else if (regs == RBM_ALLDOUBLE)
{
printf("[allDouble]");
}
else
{
dspRegMask(regs);
}
}

#endif // DEBUG

void Compiler::compInitDebuggingInfo()
44 changes: 44 additions & 0 deletions src/coreclr/jit/compiler.h
@@ -10453,6 +10453,8 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

bool compJitHaltMethod();

void dumpRegMask(regMaskTP regs) const;

#endif

/*
Expand Down Expand Up @@ -10727,6 +10729,48 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
GenTree* fgMorphMultiregStructArg(CallArg* arg);

bool killGCRefs(GenTree* tree);

#if defined(TARGET_AMD64)
private:
// The following are for initializing register allocator "constants" defined in targetamd64.h
// that now depend upon runtime ISA information, e.g., the presence of AVX512F/VL, which increases
// the number of SIMD (xmm, ymm, and zmm) registers from 16 to 32.
// As only 64-bit xarch has the capability to have the additional registers, we limit the changes
// to TARGET_AMD64 only.
//
// Users of these values need to define four accessor functions:
//
// regMaskTP get_RBM_ALLFLOAT();
// regMaskTP get_RBM_FLT_CALLEE_TRASH();
// unsigned get_CNT_CALLEE_TRASH_FLOAT();
// unsigned get_AVAILABLE_REG_COUNT();
//
// which return the values of these variables.
//
// This was done to avoid polluting all `targetXXX.h` macro definitions with a compiler parameter, where only
// TARGET_AMD64 requires one.
//
regMaskTP rbmAllFloat;
regMaskTP rbmFltCalleeTrash;
unsigned cntCalleeTrashFloat;
unsigned availableRegCount;

public:
regMaskTP get_RBM_ALLFLOAT() const
{
return rbmAllFloat;
}
regMaskTP get_RBM_FLT_CALLEE_TRASH() const
{
return rbmFltCalleeTrash;
}
unsigned get_CNT_CALLEE_TRASH_FLOAT() const
{
return cntCalleeTrashFloat;
}

#endif // TARGET_AMD64

}; // end of class Compiler

//---------------------------------------------------------------------------------------------------------------------
23 changes: 22 additions & 1 deletion src/coreclr/jit/emit.cpp
@@ -120,6 +120,17 @@ void emitLocation::Print(LONG compMethodID) const
}
#endif // DEBUG

#if defined(TARGET_AMD64)
inline regMaskTP emitter::get_RBM_FLT_CALLEE_TRASH() const
{
return emitComp->rbmFltCalleeTrash;
}
inline unsigned emitter::get_AVAILABLE_REG_COUNT() const
{
return emitComp->availableRegCount;
}
#endif // TARGET_AMD64

/*****************************************************************************
*
* Return the name of an instruction format.
@@ -3226,11 +3237,19 @@ void emitter::emitDispRegSet(regMaskTP regs)

for (reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
{
- if ((regs & genRegMask(reg)) == 0)
+ if (regs == RBM_NONE)
{
break;
}

regMaskTP curReg = genRegMask(reg);
if ((regs & curReg) == 0)
{
continue;
}

regs -= curReg;

if (sp)
{
printf(" ");
@@ -3400,6 +3419,7 @@ emitter::instrDesc* emitter::emitNewInstrCallInd(int argCnt,
#endif // TARGET_XARCH

/* Save the live GC registers in the unused register fields */
assert((gcrefRegs & RBM_CALLEE_TRASH) == 0);
emitEncodeCallGCregs(gcrefRegs, id);

return id;
@@ -3472,6 +3492,7 @@ emitter::instrDesc* emitter::emitNewInstrCallDir(int argCnt,
assert(!id->idIsLargeCns());

/* Save the live GC registers in the unused register fields */
assert((gcrefRegs & RBM_CALLEE_TRASH) == 0);
emitEncodeCallGCregs(gcrefRegs, id);

return id;
27 changes: 27 additions & 0 deletions src/coreclr/jit/emit.h
@@ -1138,6 +1138,28 @@ class emitter
idAddr()->_idReg4 = reg;
assert(reg == idAddr()->_idReg4);
}
bool idHasReg3() const
{
switch (idInsFmt())
{
case IF_RWR_RRD_RRD:
case IF_RWR_RRD_RRD_CNS:
case IF_RWR_RRD_RRD_RRD:
return true;
default:
return false;
}
}
bool idHasReg4() const
{
switch (idInsFmt())
{
case IF_RWR_RRD_RRD_RRD:
return true;
default:
return false;
}
}
#endif // defined(TARGET_XARCH)
#ifdef TARGET_ARMARCH
insOpts idInsOpt() const
@@ -1968,6 +1990,11 @@ class emitter
CORINFO_FIELD_HANDLE emitBlkConst(const void* cnsAddr, unsigned cnsSize, unsigned cnsAlign, var_types elemType);

private:
#if defined(TARGET_AMD64)
regMaskTP get_RBM_FLT_CALLEE_TRASH() const;
unsigned get_AVAILABLE_REG_COUNT() const;
#endif // TARGET_AMD64

CORINFO_FIELD_HANDLE emitFltOrDblConst(double constValue, emitAttr attr);
CORINFO_FIELD_HANDLE emitSimd8Const(simd8_t constValue);
CORINFO_FIELD_HANDLE emitSimd16Const(simd16_t constValue);
3 changes: 0 additions & 3 deletions src/coreclr/jit/emitinl.h
@@ -211,11 +211,8 @@ inline ssize_t emitter::emitGetInsAmdAny(instrDesc* id)
*
* Convert between a register mask and a smaller version for storage.
*/

/*static*/ inline void emitter::emitEncodeCallGCregs(regMaskTP regmask, instrDesc* id)
{
- assert((regmask & RBM_CALLEE_TRASH) == 0);

unsigned encodeMask;

#ifdef TARGET_X86