Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[[email protected]] Add patch to fix CGEMM/CTRMM/DNRM2 on Apple M1/M2 #8022

Merged
merged 2 commits into from
Jan 26, 2024

Conversation

pablosanjose
Copy link
Contributor

@pablosanjose pablosanjose commented Jan 26, 2024

Ref JuliaLang/LinearAlgebra.jl#1051

This should bring the fix from OpenMathLib/OpenBLAS#4003 to Julia 1.10

@giordano giordano changed the title Add patch from #4003 to OpenBLAS v0.3.23 [[email protected]] Add patch to fix CGEMM/CTRMM/DNRM2 on Apple M1/M2 Jan 26, 2024
@ViralBShah
Copy link
Member

@pablosanjose
Copy link
Contributor Author

yes, let me try to reconstruct the patch

@giordano
Copy link
Member

giordano commented Jan 26, 2024

The diff may be simpler than than the patch (but it loses the history information), maybe it's easier to start from there.

@pablosanjose
Copy link
Contributor Author

pablosanjose commented Jan 26, 2024

I'm just learning the ropes here, but I think the issue might be that we also need the M2 patch from commit caa29451, since otherwise the patch does not apply properly to the 0.3.23 release

@giordano
Copy link
Member

Yes, but at this point it's easier to have a single cumulative patch. Something like (untested!) should work

diff --git a/cpuid_arm64.c b/cpuid_arm64.c
index 809f48e95a..e586f9a3c2 100644
--- a/cpuid_arm64.c
+++ b/cpuid_arm64.c
@@ -267,8 +267,9 @@ int detect(void)
 	}
 #else
 #ifdef __APPLE__
-	sysctlbyname("hw.cpufamily",&value,&length,NULL,0);
-	if (value ==131287967|| value == 458787763 ) return CPU_VORTEX; //A12/M1
-	if (value == 3660830781) return CPU_VORTEX; //A15/M2
+	sysctlbyname("hw.cpufamily",&value64,&length64,NULL,0);
+	if (value ==131287967|| value == 458787763 ) return CPU_VORTEX;
 #endif
 	return CPU_ARMV8;	
 #endif
diff --git a/kernel/arm64/cgemm_kernel_8x4.S b/kernel/arm64/cgemm_kernel_8x4.S
index 24e08a646a..f100adc7af 100644
--- a/kernel/arm64/cgemm_kernel_8x4.S
+++ b/kernel/arm64/cgemm_kernel_8x4.S
@@ -49,7 +49,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #define pCRow3		x15
 #define pA		x16
 #define alphaR		w17
-#define alphaI		w18
+#define alphaI		w19
 
 #define alpha0_R	s10
 #define alphaV0_R	v10.s[0]
diff --git a/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S b/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
index 29a68ff227..2c63925be2 100644
--- a/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
+++ b/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
@@ -49,7 +49,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #define pCRow3		x15
 #define pA		x16
 #define alphaR		w17
-#define alphaI		w18
+#define alphaI		w19
 
 #define alpha0_R	s10
 #define alphaV0_R	v10.s[0]
diff --git a/kernel/arm64/ctrmm_kernel_8x4.S b/kernel/arm64/ctrmm_kernel_8x4.S
index 5c08273975..e8f1d8cf30 100644
--- a/kernel/arm64/ctrmm_kernel_8x4.S
+++ b/kernel/arm64/ctrmm_kernel_8x4.S
@@ -49,10 +49,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #define pCRow3		x15
 #define pA		x16
 #define alphaR		w17
-#define alphaI		w18
-#define temp		x19
-#define tempOffset	x20
-#define tempK		x21
+#define alphaI		w19
+#define temp		x20
+#define tempOffset	x21
+#define tempK		x22
 
 #define alpha0_R	s10
 #define alphaV0_R	v10.s[0]
diff --git a/kernel/arm64/dznrm2_thunderx2t99.c b/kernel/arm64/dznrm2_thunderx2t99.c
index e342b0b63f..6077c85dd1 100644
--- a/kernel/arm64/dznrm2_thunderx2t99.c
+++ b/kernel/arm64/dznrm2_thunderx2t99.c
@@ -27,7 +27,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 
 #include "common.h"
-
+#include <float.h>
 #include <arm_neon.h>
 
 #if defined(SMP)
@@ -404,7 +404,8 @@ FLOAT CNAME(BLASLONG n, FLOAT *x, BLASLONG inc_x)
 #else
 	nrm2_compute(n, x, inc_x, &ssq, &scale);
 #endif
-	if (fabs(scale) <1.e-300) return 0.;
+	volatile FLOAT sca = fabs(scale);
+	if (sca < DBL_MIN) return 0.;
 	ssq = sqrt(ssq) * scale;
 
 	return ssq;

@ViralBShah
Copy link
Member

Seems reasonable to pull that in as well. It is not surprising that we need more than one patch.

@pablosanjose
Copy link
Contributor Author

I've added the M2 patch together with the actual fix, including the history information, and it seems to be passing. Thanks to both for your guidance.
So if this eventually goes green and is merged, what else is needed to get the fix bundled into 1.10? Will it be automatic upon releasing 1.10.1?

@giordano
Copy link
Member

So if this eventually goes green and is merged, what else is needed to get the fix bundled into 1.10? Will it be automatic upon releasing 1.10.1?

Not quite. You'll need to make a PR for Julia against branch https://github.com/JuliaLang/julia/tree/backports-release-1.10 to update the build of OpenBLAS, see for example instructions at JuliaLang/julia#52762 (comment). In your case you can skip second point because you aren't changing the upstream version.

@ViralBShah
Copy link
Member

ViralBShah commented Jan 26, 2024

Should we be using value64 and length64 as in the OpenBLAS 4003 PR and the 0.3.24 release? That seems to be the most recent commit.

sysctlbyname("hw.cpufamily",&value64,&length64,NULL,0);
	if (value64 ==131287967|| value64 == 458787763 ) return CPU_VORTEX; //A12/M1
	if (value64 == 3660830781) return CPU_VORTEX; //A15/M2

@pablosanjose
Copy link
Contributor Author

We are, see PATCH 6/8 (line 187)

@ViralBShah
Copy link
Member

Ah thanks. I didn't realize that the patches are stacked on top of each other.

@giordano giordano enabled auto-merge (squash) January 26, 2024 17:58
@giordano giordano merged commit 4142373 into JuliaPackaging:master Jan 26, 2024
25 checks passed
@pablosanjose pablosanjose deleted the m1-4003-patch branch January 26, 2024 18:36
@pablosanjose
Copy link
Contributor Author

Not quite. You'll need to make a PR for Julia against branch https://github.com/JuliaLang/julia/tree/backports-release-1.10 to update the build of OpenBLAS, see for example instructions at JuliaLang/julia#52762 (comment). In your case you can skip second point because you aren't changing the upstream version.

JuliaLang/julia#53074

Alright, I hope I did that correctly. (Perhaps the PR title should be changed).

grasph pushed a commit to Moelf/Yggdrasil that referenced this pull request Jul 1, 2024
…uliaPackaging#8022)

* add patch openblas#4003 to v0.3.23

* include e0f8b4fe in patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants