Fix FMA4 detection (#262)
FMA4 support is reported in bit 16 of ECX, not EDX, of the "extended
processor info" CPUID leaf (0x80000001).

The mapping of registers to reg is:

  reg[0] = eax
  reg[1] = ebx
  reg[2] = ecx <---
  reg[3] = edx

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely
supported, so the old check reported FMA4 on nearly every AMD CPU. Intel
CPUs do not set this bit, so they were unaffected. The result was "Illegal
instruction" errors on AMD CPUs that do not support FMA4.
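
The effect of reading the wrong register can be sketched without executing CPUID at all. The snippet below is only an illustration: the helper names (`cpuid_bit_set`, `fma4_check_old`, `fma4_check_new`) and the sample register values are hypothetical and not part of SLEEF. It assumes the same reg[] layout as above and a made-up leaf 0x80000001 result in which PAT (EDX bit 16) is set and FMA4 (ECX bit 16) is clear.

```c
#include <stdint.h>

/* Indices into reg[], following the mapping in the commit message. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* Hypothetical helper: returns 1 if bit `bit` of reg[idx] is set, else 0. */
static int cpuid_bit_set(const int32_t reg[4], int idx, int bit) {
  return (int)(((uint32_t)reg[idx] >> bit) & 1u);
}

/* The check before the fix: tests EDX bit 16, which is PAT on AMD. */
static int fma4_check_old(const int32_t reg[4]) {
  return cpuid_bit_set(reg, REG_EDX, 16);
}

/* The check after the fix: tests ECX bit 16, which is FMA4. */
static int fma4_check_new(const int32_t reg[4]) {
  return cpuid_bit_set(reg, REG_ECX, 16);
}

/* Made-up leaf 0x80000001 output for an AMD CPU with PAT but no FMA4:
   only EDX bit 16 is set. Real hardware sets many more bits. */
static const int32_t sample_amd_no_fma4[4] = { 0, 0, 0, 1 << 16 };
```

With these sample values, `fma4_check_old(sample_amd_no_fma4)` yields 1 (a false positive) while `fma4_check_new(sample_amd_no_fma4)` yields 0, which matches the failure described above.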

See pytorch/pytorch#12112
See #261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)
colesbury authored and shibatch committed May 16, 2019
1 parent 1be3654 commit 939f753
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/arch/helperavx.h
@@ -67,7 +67,7 @@ static int cpuSupportsAVX() {
 static int cpuSupportsFMA4() {
   int32_t reg[4];
   Sleef_x86CpuID(reg, 0x80000001, 0);
-  return (reg[3] & (1 << 16)) != 0;
+  return (reg[2] & (1 << 16)) != 0;
 }

 #if CONFIG == 4 && defined(__AVX__) && defined(__FMA4__)
2 changes: 1 addition & 1 deletion src/libm/dispavx.c.org
@@ -48,7 +48,7 @@ static int cpuSupportsFMA4() {
   if (ret == -1) {
     int32_t reg[4];
     Sleef_x86CpuID(reg, 0x80000001, 0);
-    ret = (reg[3] & (1 << 16)) != 0;
+    ret = (reg[2] & (1 << 16)) != 0;
   }
   return ret;
 }