Add IBM System/390 support #291
src/arch/helpers390x_128.h
Outdated
typedef VECTOR double vdouble;
typedef VECTOR int vint;

typedef VECTOR float vfloat;
Z13 doesn't support single-precision, but it can be emulated via two double-precision registers
I don't understand this.
It seems that both gcc and qemu support single-precision vector operations with the z13 option.
Are both of those buggy?
Ah, gcc unrolls each vector operation into scalar operations.
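To make the point about unrolling concrete, here is a minimal sketch that uses the generic GCC/Clang vector extension rather than ZVECTOR intrinsics. The names v4sf, add_v4sf and hsum_v4sf are hypothetical and do not come from the patch. The source compiles for any target; which instructions come out is left to the compiler, so on a target without a packed single-precision add it may emit four scalar additions.

```c
/* Generic GCC/Clang vector type: four packed floats in 16 bytes.
   This is the compiler's vector extension, not a ZVECTOR intrinsic type. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise addition written with the generic
   vector extension. The compiler chooses the instructions; if the target
   has no packed single-precision add, it is free to fall back to
   scalar additions. */
static v4sf add_v4sf(v4sf a, v4sf b)
{
    return a + b;
}

/* Hypothetical helper: sum the four lanes so the result is easy to check. */
static float hsum_v4sf(v4sf v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

Comparing the assembly that gcc -O2 -S produces for add_v4sf with -march=z13 and with -march=z14 should show whether the operation stays vectorized; this has not been checked here.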
The vector extensions in both compilers (clang and gcc) always emulate non-existent instructions, which may lead to bad performance. That is why I prefer to use the ZVECTOR and VSX prototypes instead.
An example of how we should handle single-precision on Z13:
// type
#if CONFIG <= 131
typedef struct {
    vdouble val[2];
} vfloat;
#else
typedef __vector float vfloat;
#endif

// load
static INLINE vfloat vloadu_vf_p(const float *p)
{
#if CONFIG <= 131
    vfloat r;
    r.val[0] = vec_ld2f(p); // load and convert
    r.val[1] = vec_ld2f(p + 2);
    return r;
#else
    return vec_xl(0, p);
#endif
}

// store
static INLINE void vstoreu_v_p_vf(float *p, vfloat v)
{
#if CONFIG <= 131
    vec_st2f(v.val[0], p); // convert and store
    vec_st2f(v.val[1], p + 2);
#else
    vec_xst(v, 0, p); // no return value: the function is void
#endif
}

// Now emulate all operations via two double-precision vectors
static INLINE vfloat vsqrt_vf_vf(vfloat vf)
{
#if CONFIG <= 131
    vf.val[0] = vec_sqrt(vf.val[0]);
    vf.val[1] = vec_sqrt(vf.val[1]);
    return vf;
#else
    return vec_sqrt(vf);
#endif
}
This is emulation using double-precision operations.
SLEEF has functions that return bit-identical results across all platforms, and those functions cannot be implemented using this method. We need genuine single-precision operations.
I don't know how widely Z13 computers are currently deployed, but is single-precision support for Z13 so important?
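One way the bit-identical requirement can break under double-precision emulation is sketched below in portable scalar C. The function names are hypothetical, and the sketch assumes IEEE 754 arithmetic with round-to-nearest-even and float expressions evaluated in float. The first function rounds the intermediate product to single precision; the second keeps it in double and narrows only at the end, which is what happens when values stay in double-precision registers between operations.

```c
/* Genuine single precision: the product is rounded to float before the
   subtraction. volatile forces that rounding and keeps the compiler from
   fusing the two steps into one multiply-subtract. */
static float mulsub_single(float a, float b, float c)
{
    volatile float p = a * b;
    return p - c;
}

/* Emulation: widen to double, perform both steps in double,
   and narrow to float only once at the end. */
static float mulsub_via_double(float a, float b, float c)
{
    volatile double p = (double)a * (double)b;
    return (float)(p - (double)c);
}
```

With a = b = 1 + 2^-12 and c = 1 + 2^-11, the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the last term, so the single-precision path returns 0, while the double path keeps the term and returns 2^-24.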
The Z13 mainframe launched in 2015. I am not sure whether it is still widely used, but most of the LinuxONE instances provided through IBM Cloud are Z14, so I guess we can drop Z13 support for now and focus only on Z14/Z15.
z13 has special instructions to load and store a single precision floating point pair as a double precision floating point pair, and then use normal double precision operations. z14 adds full single precision support.
The Linux Community Cloud systems now are z15.
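The z13 load and store behaviour described above can be modelled in plain scalar C. This is only a sketch of the data flow: pair_f64, load2f_model and store2f_model are hypothetical names, and no real z13 instruction or intrinsic is used.

```c
/* Hypothetical scalar stand-in for a vector register holding two doubles. */
typedef struct {
    double lane[2];
} pair_f64;

/* Model of "load a single-precision pair as a double-precision pair":
   read two floats and widen each one to double (widening is exact). */
static pair_f64 load2f_model(const float *p)
{
    pair_f64 r;
    r.lane[0] = (double)p[0];
    r.lane[1] = (double)p[1];
    return r;
}

/* Model of the matching store: narrow each double back to float
   (this step rounds) and write the pair to memory. */
static void store2f_model(pair_f64 v, float *p)
{
    p[0] = (float)v.lane[0];
    p[1] = (float)v.lane[1];
}
```

A load followed directly by a store returns the original floats, because every float is exactly representable as a double. Rounding only matters once double-precision operations run in between.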
I now wonder whether it is worth implementing single-precision functions for ZVECTOR1 by emulating them with double-precision vector computation.
The importance of such functions is not certain. I don't know how widely Z13 computers are being used, and it is hard to imagine users running such functions only on Z13 computers. I implemented ZVECTOR1 support because QEMU 4.2.0 supports processors up to Z13, so ZVECTOR1 can be tested without real hardware. I also had not noticed that single-precision vector operations on Z13 are emulated by the compiler. There is another option, which is to drop ZVECTOR1 support. So, there are three options.
I would like to know how important ZVECTOR1 support is. The double-precision functions in the current implementation work normally. We can just say that the single-precision functions with ZVECTOR1 are supplementary. So I think the current implementation is satisfactory. What do you think?
The Linux Community Cloud systems all should be z15.
travis/before_install.s390x-gcc.sh
Outdated
@@ -0,0 +1,2 @@
#!/bin/bash
set -ev
Could you please dump the auxiliary vector via LD_SHOW_AUXV=1 /bin/true to determine the ZARCH version?
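The same information can also be read from inside a program. The sketch below uses glibc's getauxval(AT_HWCAP) and maps the vector-facility bits to a machine generation. The SKETCH_HWCAP_* values are assumptions written out for illustration; on s390x, the constants from the system headers should be preferred. The function names are hypothetical, and the result is only meaningful on s390x Linux.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (glibc on Linux) */

/* Assumed s390 HWCAP bit positions, spelled out for illustration only.
   Prefer the definitions from the system headers on a real s390x build. */
#define SKETCH_HWCAP_VX   (1UL << 11)  /* vector facility (z13) */
#define SKETCH_HWCAP_VXE  (1UL << 13)  /* vector enhancements 1 (z14) */
#define SKETCH_HWCAP_VXE2 (1UL << 15)  /* vector enhancements 2 (z15) */

/* Map a hwcap mask to 13, 14 or 15, or 0 if no vector facility is set. */
static int zarch_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_VXE2) return 15;
    if (hwcap & SKETCH_HWCAP_VXE)  return 14;
    if (hwcap & SKETCH_HWCAP_VX)   return 13;
    return 0;
}

/* Query the running kernel. On other architectures the hwcap bits have
   different meanings, so the return value should be ignored there. */
static int zarch_level_of_this_machine(void)
{
    return zarch_level_from_hwcap(getauxval(AT_HWCAP));
}
```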
Done.
@seiko2plus @edelsohn
It looks good to me. It still needs several improvements similar to #288, but I will work on that later.
This patch adds IBM System/390 support.
Clang is not supported at this time because it does not seem to support VX intrinsics properly.
Note that this is the first big-endian architecture that SLEEF supports.
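Byte order matters for any code that inspects the bit pattern of a floating-point value through narrower integers. The sketch below is not taken from SLEEF and all names in it are hypothetical. It shows a runtime endianness probe, an endian-independent way to extract the high 32 bits of a double, and the index that the high half occupies when the double is viewed as two 32-bit words in memory.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian target, 0 on a little-endian one. */
static int is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* lowest-addressed byte of the probe */
    return first == 0;
}

/* Endian-independent: copy the double into a 64-bit integer and shift.
   The result holds the sign, the exponent and the top mantissa bits. */
static uint32_t high_word(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Endian-dependent: position of the high half when a double is viewed
   as an array of two 32-bit words in memory. */
static int high_word_index(void)
{
    return is_big_endian() ? 0 : 1;
}
```

Code that hard-codes index 1 for the high half works on little-endian targets and reads the wrong half on s390x, which is the kind of assumption a first big-endian port tends to expose.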