Add IBM System/390 support #291

Merged
shibatch merged 38 commits into master from Add_s390x_support_rebased
Aug 26, 2020

Conversation

shibatch
Owner

This patch adds IBM System/390 support.
Clang is not supported at this time because it does not seem to support the VX intrinsics properly.

Note that this is the first big-endian architecture that SLEEF supports.
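As a rough illustration of why big-endian support needs care, here is a minimal sketch of reading the upper 32 bits of a double in a byte-order-independent way, next to an indexed read whose index depends on the host. The function names are hypothetical and are not taken from SLEEF; the sketch assumes a 64-bit IEEE 754 `double`.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host (such as s390x), 0 on a little-endian one. */
int is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* lowest-addressed byte of the probe */
    return first == 0x01;
}

/* Upper 32 bits of a double, independent of host byte order. */
uint32_t high_word(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* well-defined type punning */
    return (uint32_t)(bits >> 32);
}

/* Same value read through a two-word view; the index depends on byte order. */
uint32_t high_word_indexed(double d)
{
    uint32_t w[2];
    memcpy(w, &d, sizeof w);
    return w[is_big_endian() ? 0 : 1];
}
```

Code that hard-codes index 1 for the high word works on little-endian targets only, which is the kind of assumption a first big-endian port tends to expose.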

typedef VECTOR double vdouble;
typedef VECTOR int vint;

typedef VECTOR float vfloat;
Collaborator

Z13 doesn't support single-precision vector operations, but they can be emulated via two double-precision registers.

Owner Author

I don't understand this.
It seems that both gcc and qemu support single-precision vector operations with the z13 option.
Are both of those buggy?

Owner Author

Ah, gcc unrolls each vector operation into scalar operations.
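For readers unfamiliar with this behavior, here is a minimal sketch using the generic GCC/Clang vector extension (not the z/Architecture intrinsics). The names `v4sf`, `add4` and `add4_lane` are hypothetical. The source compiles for any target; when the target has no matching vector instruction, the compiler is free to lower the addition to scalar operations.

```c
#include <string.h>

/* GCC/Clang generic vector extension: four floats in a 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise add. The compiler picks the lowering per target:
   one vector instruction if available, otherwise scalar adds. */
v4sf add4(v4sf a, v4sf b)
{
    return a + b;
}

/* Helper: add two float[4] arrays and return one lane of the result. */
float add4_lane(const float *x, const float *y, int lane)
{
    v4sf a, b, r;
    float out[4];
    memcpy(&a, x, sizeof a);
    memcpy(&b, y, sizeof b);
    r = add4(a, b);
    memcpy(out, &r, sizeof out);
    return out[lane];
}
```

Inspecting the generated assembly (for example with `gcc -S`) for `add4` under different `-march` settings is one way to see which lowering was chosen.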

Collaborator

The vector extensions in both compilers (clang and gcc) always emulate instructions that do not exist on the target, which may lead to poor performance. That is why I prefer to use the ZVECTOR and VSX intrinsic prototypes instead.
An example of how we should handle single-precision in Z13:

// type
#if CONFIG <= 131
typedef struct {
    vdouble val[2];
} vfloat;
#else
typedef __vector float vfloat;
#endif
// load
static INLINE vfloat vloadu_vf_p(const float *p)
{
#if CONFIG <= 131
    vfloat r;
    r.val[0] = vec_ld2f(p); // load and convert
    r.val[1] = vec_ld2f(p + 2);
    return r;
#else
    return vec_xl(0, p);
#endif
}
// store
static INLINE void vstoreu_v_p_vf(float *p, vfloat v)
{
#if CONFIG <= 131
    vec_st2f(v.val[0], p); // convert and store
    vec_st2f(v.val[1], p + 2);
#else
    vec_xst(v, 0, p);
#endif
}
// Now emulate all operations via two double-precision vectors
static INLINE vfloat vsqrt_vf_vf(vfloat vf)
{
#if CONFIG <= 131
    vf.val[0] = vec_sqrt(vf.val[0]);
    vf.val[1] = vec_sqrt(vf.val[1]);
    return vf;
#else
    return vec_sqrt(vf); 
#endif
}

Owner Author

This is emulation using double-precision operations.
SLEEF has functions that return bit-identical results across all platforms, and those functions cannot be implemented using this method. We need genuine single-precision operations.
I don't know how widely Z13 computers are currently deployed, but is single-precision support for Z13 so important?
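One way the two approaches can diverge is sketched below in scalar C. This is an illustration under the assumption that intermediates stay in double precision between operations, as they would with a `vfloat` held in two double-precision vectors; the function names are hypothetical.

```c
/* Sum of three floats with genuine single-precision arithmetic:
   the intermediate is rounded to float after the first addition. */
float sum3_float(float a, float b, float c)
{
    volatile float t = a + b;   /* volatile forces rounding to float */
    volatile float r = t + c;
    return r;
}

/* Same sum emulated in double precision: intermediates keep the extra
   bits and the result is rounded to float only once, at the end. */
float sum3_via_double(float a, float b, float c)
{
    volatile double t = (double)a + (double)b;
    volatile double r = t + (double)c;
    return (float)r;
}
```

With `a = 1.0f` and `b = c = 0x1p-24f`, the float version returns `1.0f` because each tiny addend is rounded away, while the emulated version returns `0x1.000002p0f`. A single operation followed immediately by rounding to float would match, so the difference comes from the retained intermediate precision.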

Collaborator

The Z13 mainframe launched in 2015. I am not sure whether it is still widely used, but most of the LinuxONE instances provided through IBM Cloud are Z14, so I guess we can drop Z13 support for now
and focus only on Z14/Z15.

@edelsohn
Collaborator

z13 has special instructions to load and store a single-precision floating-point pair as a double-precision floating-point pair, so that normal double-precision operations can then be used. z14 adds full single-precision support.

@edelsohn
Collaborator

The Linux Community Cloud systems now are z15.

@shibatch shibatch requested a review from seiko2plus March 26, 2020 06:15
@seiko2plus seiko2plus self-assigned this Mar 26, 2020
@shibatch
Owner Author

@seiko2plus @edelsohn

I now wonder whether it is worth implementing the single-precision functions for ZVECTOR1 by emulating them with double-precision vector computation.
There are two obstacles to implementing the functions that way.

  • Data types for vfloat will be different between ZVECTOR1 and ZVECTOR2. This will make the code messy.
  • We cannot implement the deterministic version of the functions. We need true single-precision computation for this.

The importance of such functions is uncertain. I don't know how widely Z13 computers are being used. It is hard to imagine users running such functions only on Z13 computers.

The reason I implemented ZVECTOR1 support is that QEMU 4.2.0 supports processors up to Z13, so ZVECTOR1 can be tested without real hardware. I also had not noticed that the single-precision vector operations on Z13 are emulated within the compiler.

There is another option, which is to drop ZVECTOR1 support. So, there are three options.

  1. Continue implementing the single-precision functions by emulating them with double-precision vector computation.
  2. Go with the current implementation.
  3. Drop ZVECTOR1 support.

I would like to know how important ZVECTOR1 support is. The double-precision functions in the current implementation work normally. We can just say that the single-precision functions with ZVECTOR1 are supplementary. So I think the current implementation is satisfactory. What do you think?

@edelsohn
Collaborator

The Linux Community Cloud systems all should be z15.
I think that you can safely ignore z13. z13 will not be important for PyTorch users.

@@ -0,0 +1,2 @@
#!/bin/bash
set -ev
Collaborator

@seiko2plus seiko2plus Mar 27, 2020

Could you please dump the auxiliary vector via LD_SHOW_AUXV=1 /bin/true to determine the ZARCH version?
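The same information can also be read programmatically. The sketch below is a hedged illustration for Linux with glibc: `getauxval` and `AT_HWCAP` are real, but the three bit constants are local definitions that are assumed to mirror the s390 HWCAP bits in the Linux kernel headers, and the function names are hypothetical.

```c
#include <sys/auxv.h>

/* Assumption: these mirror the s390 HWCAP bit positions in the Linux
   kernel headers. They are defined locally for illustration only. */
#define S390_HWCAP_VX   (1UL << 11)  /* vector facility (z13) */
#define S390_HWCAP_VXE  (1UL << 13)  /* vector enhancements 1 (z14) */
#define S390_HWCAP_VXE2 (1UL << 15)  /* vector enhancements 2 (z15) */

/* Maps an AT_HWCAP value to a vector level:
   0 = no vector facility, 1 = z13, 2 = z14, 3 = z15 or later. */
int s390_vector_level(unsigned long hwcap)
{
    if (!(hwcap & S390_HWCAP_VX))
        return 0;
    if (!(hwcap & S390_HWCAP_VXE))
        return 1;
    if (!(hwcap & S390_HWCAP_VXE2))
        return 2;
    return 3;
}

/* Queries the running process; the result is only meaningful on s390x. */
int s390_vector_level_of_host(void)
{
    return s390_vector_level(getauxval(AT_HWCAP));
}
```

On an s390x machine, a level of 2 or higher would indicate that genuine single-precision vector operations are available.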

Owner Author

Done.

@shibatch
Owner Author

shibatch commented Apr 7, 2020

@seiko2plus @edelsohn
I have now removed Z13 support.
Please review the patch again.

@shibatch shibatch requested a review from seiko2plus April 7, 2020 00:03
Collaborator

@seiko2plus seiko2plus left a comment

It looks good to me. It still needs several improvements similar to #288, but I will work on them later.

@shibatch shibatch merged commit aea57ce into master Aug 26, 2020
@shibatch shibatch deleted the Add_s390x_support_rebased branch August 28, 2020 11:54
3 participants