-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate inline headers #283
Conversation
Thanks, I will test this out today |
Thanks, this addresses the request in #230 In my limited testing, the functions are now successfully inlined at call sites. In our sigmoid function, I measured a ~14% improvement for 1 million floats by inlining the call to Sleef_expf8_u10. I haven't done measurements of other functions. Currently, we include Sleef as a submodule and build it as part of the PyTorch build process. The changes to the build make this more difficult. I think we may switch to building Sleef separately and just committing the artifacts (e.g. sleefinline_avx2.h) to the PyTorch repo. I think that should be fine. |
Good. |
Jenkinsfile
Outdated
@@ -15,10 +15,10 @@ pipeline { | |||
mkdir build | |||
cd build | |||
cmake -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE .. | |||
make -j 6 all | |||
make -j 1 all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are changes like this needed in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because add_custom_command does not support parallel builds. In this patch, many COMMANDS are executed within one add_custom_command, and parallel builds fail even on linux computers.
|
||
#define ENABLE_DP | ||
//@#define ENABLE_DP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this syntax for? I have seen it in other places, so I suspect there is a reason for using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please read my first comment. Generation of the header files requires multiple processing with cpp, and some of the macros are required in later use.
Addition to helper files : This patch adds lines beginning with //@ which is specially treated during generation of the header files.
src/arch/helperavx.h
Outdated
@@ -649,6 +664,7 @@ static INLINE vargquad vcast_aq_vm2(vmask2 vm2) { | |||
return a; | |||
#endif | |||
} | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif | |
#endif // #if !defined(SLEEF_GENHEADER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// | ||
|
||
#ifdef ENABLE_DP | ||
int check_featureDP() { | ||
if (vavailability_i(1) == 0) return 0; | ||
int check_featureDP(double d) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed for the header functionality? Same question for the float version below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the headers, functions like this have greater tendency of being optimized away.
This change is needed to prevent optimizer from removing this function.
src/libm/CMakeLists.txt
Outdated
OUTPUT ${INLINE_HEADER_FILE} | ||
|
||
COMMAND echo Generating sleefinline_${SIMDLC}.h | ||
COMMAND "${CMAKE_C_COMPILER}" ${FLAG_PREPROCESS} ${FLAG_PRESERVE_COMMENTS} ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/common ${FLAG_INCLUDE}${PROJECT_SOURCE_DIR}/src/arch ${FLAG_INCLUDE}${CMAKE_CURRENT_BINARY_DIR}/include/ ${FLAG_DEFINE}SLEEF_GENHEADER ${FLAG_DEFINE}ENABLE_${SIMD} ${FLAG_DEFINE}DORENAME ${CMAKE_CURRENT_SOURCE_DIR}/sleefsimddp.c > ${CMAKE_CURRENT_BINARY_DIR}/sleef${SIMD}.h.tmp1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please break long lines. It is hard to understand what is going on here. Please try not to pass the 80 chars limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments in this section.
#if !defined(SLEEF_GENHEADER) | ||
#define FUNCATR NOEXPORT ALIGNED(64) | ||
#else | ||
#define FUNCATR EXPORT ALIGNED(64) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table is referenced from the source code including the generated header.
So, this table has to be compiled into a library.
#if !defined(SLEEF_GENHEADER) | ||
#include "helpersse2.h" | ||
#else | ||
#include "macroonlySSE2.h" | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to header files? Couldn't we conditionally compile the same header file, exposing only the macros when SLEEF_GENHEADER is defined? This could avoid the need of generating the macroonly*
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of only exposing some macros, I chose the approach to define some of the macros beginning with //@.
Below is comparison between pros and cons.
Approach 1: Exposing a part of the macros according to whether SLEEF_GENHEADER is defined.
Pros: The mechanism is straightforward.
Cons: We need to protect most of the part in the macro with large #ifdef. This is hard to read.
Generation of macros is not very understandable anyway.
Approach 2: Define some of the macros beginning with //@
Pros: Easier to read and manage the source code.
Cons: This is a dedicated way for SLEEF. and requires understanding how the macros are generated.
travis/before_script.osx-clang.sh
Outdated
@@ -2,4 +2,4 @@ | |||
set -ev | |||
mkdir sleef.build | |||
cd sleef.build | |||
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE .. | |||
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DSLEEF_SHOW_CONFIG=1 -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DENFORCE_TESTER3=TRUE -DBUILD_QUAD=TRUE -DBUILD_INLINE_HEADERS=TRUE .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that you are always generating the header files because you want to test them on each architecture with the iuti*
executables? Has this increased the running time in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, running time in CI is increased.
If that is a problem, we can just turn off generation.
How comes that CI hasn't completed yet? |
Because of a policy change in network security at my institute, I now have to move the CI servers to a different network segment. |
This patch implements the functionality described in issue #282.
The header files for inlining sleef functions are generated if -DBUILD_INLINE_HEADERS=TRUE is specified as a cmake option.
In order to use one of these headers, the following 3 macros have to be defined.
Each header file corresponds to one vector extension. Only one of the header files can be included from the same source file.
It also builds libsleefinline.a, that is referred from the inlined functions.
Aarch32 is not supported at this time.
--
The main change to the source codes can be summarized into the following two points.
It seems that parallel build does not work even on linux computers. According to the following page, parallel build is always unsafe if multiple COMMANDs are used in add_custom_command. Accordingly, I changed the CI settings to remove parallel builds.
https://stackoverflow.com/questions/60105230/how-to-run-cmake-commands-in-add-custom-command-in-order
Currently, Jenkins servers are not available. I hope they will be available again before this patch is approved. I manually ran the builds and tests on every platform.
@colesbury Github does not allow me to add you as a reviewer, but please tell us your thoughts.