Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
Details:
- Updated Makefile and common.mk so that the targeted configuration's
  kernel CFLAGS are applied to source files that are found in a
  'kernels' subdirectory within an enabled addon. For now, this
  behavior only applies when the 'kernels' directory is at the top
  level of the addon directory structure. For example, if there is an
  addon named 'foobar', the source code must be located in
  addon/foobar/kernels/ in order for it to be compiled with the target
  configuration's kernel CFLAGS. Any other source code within
  addon/foobar/ will be compiled with general-purpose CFLAGS (the same
  ones that were used on all addon code prior to this commit). Thanks
  to AMD (esp. Mithun Mohan) for suggesting this change and catching an
  intermediate bug in the PR.
- Comment/whitespace updates.
- (cherry picked from commit fd885cf)
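
As a rough illustration of the path rule described above (not the actual Makefile logic), the following Python sketch partitions a list of addon source paths into kernel and non-kernel groups. The function name `split_addon_sources` and its default arguments are hypothetical; only the `addon/foobar/kernels/` layout comes from the commit message.

```python
def split_addon_sources(sources, addon_path="addon", addons=("foobar",),
                        kernels_dir="kernels"):
    """Partition addon source paths into (kernel, other) lists.

    A file counts as kernel code only when it lives under
    <addon_path>/<addon>/<kernels_dir>/, that is, when the 'kernels'
    directory sits at the top level of the addon. All names here are
    illustrative assumptions, not part of the BLIS build system.
    """
    kernel_prefixes = tuple(
        f"{addon_path}/{addon}/{kernels_dir}/" for addon in addons
    )
    kers = [s for s in sources if s.startswith(kernel_prefixes)]
    other = [s for s in sources if not s.startswith(kernel_prefixes)]
    return kers, other


if __name__ == "__main__":
    srcs = [
        "addon/foobar/kernels/ukr.c",        # built with kernel CFLAGS
        "addon/foobar/foo.c",                # built with general-purpose CFLAGS
        "addon/foobar/util/kernels/x.c",     # nested 'kernels' dir: general-purpose
    ]
    print(split_addon_sources(srcs))
```

Note that a `kernels` directory nested deeper in the addon falls into the general-purpose group, which matches the top-level-only restriction stated in the commit message.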

Fix line number issue in flattened blis.h. (#660)

Details:
- Updated the top-level Makefile so that it invokes flatten-headers.py
  without the -c option, which was requesting that comments be stripped
  (since comment stripping is disabled by default).
- Updated flatten-headers.py to accept a new option (-l) to enable
  insertion of #line directives into the output file. This new option
  is enabled by default.
- Also added logic to flatten-headers.py that outputs a warning if both
  comment stripping and line numbers are requested since the comment
  stripping will cause the line numbers to become inaccurate.
- (cherry picked from commit 6e5431e)
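
To make the role of the #line directives concrete, here is a minimal sketch of header flattening over an in-memory dictionary of headers. It is a simplified stand-in for flatten-headers.py, not a copy of it: the `flatten` function, the `INCLUDE_RE` pattern, and the dictionary-based input are assumptions made for illustration.

```python
import re

# Matches lines of the form: #include "name.h"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')


def flatten(name, headers, line_numbers=True):
    """Recursively inline '#include "x.h"' lines found in headers[name].

    'headers' maps a header name to its text. When line_numbers is True,
    a '#line' directive is emitted on entering an inlined header and
    again on returning to the including file, so that compiler
    diagnostics can refer to the original file and line.
    """
    out = []
    for lineno, line in enumerate(headers[name].splitlines(), start=1):
        m = INCLUDE_RE.match(line)
        if m and m.group(1) in headers:
            child = m.group(1)
            if line_numbers:
                out.append(f'#line 1 "{child}"')
            out.extend(flatten(child, headers, line_numbers))
            if line_numbers:
                # Resume numbering at the line after the #include.
                out.append(f'#line {lineno + 1} "{name}"')
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    hdrs = {"blis.h": '#include "a.h"\nint x;', "a.h": "int a;"}
    print("\n".join(flatten("blis.h", hdrs)))
```

Because each directive records a line number computed from the unmodified input, removing comment lines from the inlined text afterwards would shift the real content relative to those numbers, which is the inaccuracy the warning refers to.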

Defined invscalv, invscalm, invscald operations. (#661)

Details:
- Defined invert-scale (invscal) operation on vectors (level-1v),
  matrices (level-1m), and diagonals (level-1d).
- Added test modules for invscalv and invscalm to the testsuite.
- Updated BLISObjectAPI.md and BLISTypedAPI.md API documentation to
  reflect the new operations. Also updated KernelsHowTo.md accordingly.
- Renamed 'beta' to 'alpha' in scalv and scalm testsuite modules (and
  input.operations files) so that the parameter name matches the
  parameter used in the documentation.
- (cherry picked from commit 4afe0cf)
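
The arithmetic behind the three operations can be sketched in a few lines of Python. This only illustrates the math (each element divided by alpha) on plain lists; the real BLIS routines operate in place through the object and typed APIs, and the function signatures below are hypothetical. The diagonal variant here handles only the main diagonal.

```python
def invscalv(alpha, x, conj_alpha=False):
    """Return a new list with every element of x divided by alpha.

    If conj_alpha is True, the complex conjugate of alpha is used.
    """
    a = complex(alpha).conjugate() if conj_alpha else alpha
    if a == 0:
        raise ZeroDivisionError("invscalv: alpha must be nonzero")
    return [xi / a for xi in x]


def invscalm(alpha, mat):
    """Apply inverse scaling to every element of a row-major matrix."""
    return [invscalv(alpha, row) for row in mat]


def invscald(alpha, mat):
    """Apply inverse scaling to the main diagonal only."""
    if alpha == 0:
        raise ZeroDivisionError("invscald: alpha must be nonzero")
    return [
        [v / alpha if i == j else v for j, v in enumerate(row)]
        for i, row in enumerate(mat)
    ]


if __name__ == "__main__":
    print(invscalv(2.0, [2.0, 4.0, -6.0]))
    print(invscald(2.0, [[2.0, 1.0], [1.0, 4.0]]))
```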

Added '-q' quiet mode option to testsuite. (#657)

Details:
- Added support for a '-q' command line option to the testsuite. This
  option suppresses most informational output that would normally
  clutter up the screen. By default, verbose mode (the previous
  status quo) will be operative, and so quiet mode must be requested.
- (cherry picked from commit a87eae2)

Arm64 dgemmsup with extended MR&NR (#655)

Details:
- Since the number of registers in NEON is large but their lengths are
  short, I'm here extending both MR and NR.
- The approach is to represent the C microtile in registers optionally
  in columns, so for sizes like 6x7m, the 'crr' kernel is the default
  with 'rrr' supported through an in-register transpose.
- A few asm kernels are crafted for 'rv' to complete this extended size
  support.
- For 'rd' I'm still relying heavily on C99 intrinsic kernels with
  branching so the performance might not be optimal. (Sorry for that.)
- So far, these changes only affect the 'firestorm' subconfig.
- This commit also contains row-preferential s12x8 and d6x8 gemm
  ukernels. These microkernels are templatized versions of the existing
  s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.
- (cherry picked from commit dfa5413)

Temporarily disabled #line directives from 6826c1c.

Details:
- Commented out the inclusion of #line preprocessor directives in the
  flattened header output provided by build/flatten-headers.py. This
  output was added recently in 6826c1c, but was later found to have
  thrown off the line numbering referenced by compiler warnings and
  errors (possibly due to license comment blocks, which are stripped
  from source headers as they are inlined into the monolithic header).
- (cherry picked from commit 9e5594a)
fgvanzee committed Oct 23, 2023
1 parent 43f93d4 commit 57a3df4
Showing 21 changed files with 3,156 additions and 761 deletions.
51 changes: 45 additions & 6 deletions Makefile
@@ -213,8 +213,19 @@ MK_REFKERN_OBJS := $(foreach arch, $(CONFIG_LIST), \
MK_FRAME_OBJS := $(call gen-obj-paths-from-src,$(FRAME_SRC_SUFS),$(MK_FRAME_SRC),$(FRAME_PATH),$(BASE_OBJ_FRAME_PATH))

# Generate object file paths for the addon source code. If one or more addons
# were not enabled a configure-time, this variable will we empty.
MK_ADDON_OBJS := $(call gen-obj-paths-from-src,$(ADDON_SRC_SUFS),$(MK_ADDON_SRC),$(ADDON_PATH),$(BASE_OBJ_ADDON_PATH))
# were not enabled a configure-time, these variable will we empty.
# NOTE: We separate the source and objects into kernel and non-kernel lists.
MK_ADDON_KERS_SRC := $(foreach addon, $(ADDON_LIST), \
$(filter $(ADDON_PATH)/$(addon)/$(KERNELS_DIR)/%, \
$(MK_ADDON_SRC)) \
)
MK_ADDON_OTHER_SRC := $(foreach addon, $(ADDON_LIST), \
$(filter-out $(ADDON_PATH)/$(addon)/$(KERNELS_DIR)/%, \
$(MK_ADDON_SRC)) \
)
MK_ADDON_KERS_OBJS := $(call gen-obj-paths-from-src,$(ADDON_SRC_SUFS),$(MK_ADDON_KERS_SRC),$(ADDON_PATH),$(BASE_OBJ_ADDON_PATH))
MK_ADDON_OTHER_OBJS := $(call gen-obj-paths-from-src,$(ADDON_SRC_SUFS),$(MK_ADDON_OTHER_SRC),$(ADDON_PATH),$(BASE_OBJ_ADDON_PATH))
MK_ADDON_OBJS := $(MK_ADDON_KERS_OBJS) $(MK_ADDON_OTHER_OBJS)

# Generate object file paths for the sandbox source code. If a sandbox was not
# enabled a configure-time, this variable will we empty.
@@ -492,10 +503,10 @@ flat-header: check-env $(BLIS_H_FLAT)

$(BLIS_H_FLAT): $(ALL_H99_FILES)
ifeq ($(ENABLE_VERBOSE),yes)
$(FLATTEN_H) -c -v1 $(BLIS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
$(FLATTEN_H) -l -v1 $(BLIS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
else
@echo -n "Generating monolithic blis.h"
@$(FLATTEN_H) -c -v1 $(BLIS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
@$(FLATTEN_H) -l -v1 $(BLIS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
@echo "Generated $@"
endif

@@ -505,10 +516,10 @@ flat-cblas-header: check-env $(CBLAS_H_FLAT)

$(CBLAS_H_FLAT): $(FRAME_H99_FILES)
ifeq ($(ENABLE_VERBOSE),yes)
$(FLATTEN_H) -c -v1 $(CBLAS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
$(FLATTEN_H) -l -v1 $(CBLAS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
else
@echo -n "Generating monolithic cblas.h"
@$(FLATTEN_H) -c -v1 $(CBLAS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
@$(FLATTEN_H) -l -v1 $(CBLAS_H_SRC_PATH) $@ "./$(INCLUDE_DIR)" "$(ALL_H99_DIRPATHS)"
@echo "Generated $@"
endif

@@ -580,6 +591,7 @@ endef

# first argument: a configuration name from the union of config_list and
# config_name, used to look up the CFLAGS to use during compilation.
# second argument: the C99 addon file suffix being considered.
define make-c99-addon-rule
$(BASE_OBJ_ADDON_PATH)/%.o: $(ADDON_PATH)/%.$(2) $(BLIS_H_FLAT) $(ADDON_H99_FILES) $(MAKE_DEFS_MK_PATHS)
ifeq ($(ENABLE_VERBOSE),yes)
@@ -590,6 +602,23 @@ else
endif
endef

# first argument: a configuration name from the union of config_list and
# config_name, used to look up the CFLAGS to use during compilation.
# second argument: the C99 addon file suffix being considered.
# third argument: the name of the addon being considered.
define make-c99-addon-kers-rule
$(BASE_OBJ_ADDON_PATH)/$(3)/$(KERNELS_DIR)/%.o: $(ADDON_PATH)/$(3)/$(KERNELS_DIR)/%.$(2) $(BLIS_H_FLAT) $(ADDON_H99_FILES) $(MAKE_DEFS_MK_PATHS)
ifeq ($(ENABLE_VERBOSE),yes)
$(CC) $(call get-addon-kernel-c99flags-for,$(1)) -c $$< -o $$@
else
@echo "Compiling $$@" $(call get-addon-kernel-text-for,$(1))
@$(CC) $(call get-addon-kernel-c99flags-for,$(1)) -c $$< -o $$@
endif
endef

# first argument: a configuration name from the union of config_list and
# config_name, used to look up the CFLAGS to use during compilation.
# second argument: the C++ addon file suffix being considered.
define make-cxx-addon-rule
$(BASE_OBJ_ADDON_PATH)/%.o: $(ADDON_PATH)/%.$(2) $(BLIS_H_FLAT) $(ADDON_HXX_FILES) $(MAKE_DEFS_MK_PATHS)
ifeq ($(ENABLE_VERBOSE),yes)
@@ -602,6 +631,7 @@ endef

# first argument: a configuration name from the union of config_list and
# config_name, used to look up the CFLAGS to use during compilation.
# second argument: the C99 sandbox file suffix being considered.
define make-c99-sandbox-rule
$(BASE_OBJ_SANDBOX_PATH)/%.o: $(SANDBOX_PATH)/%.$(2) $(BLIS_H_FLAT) $(SANDBOX_H99_FILES) $(MAKE_DEFS_MK_PATHS)
ifeq ($(ENABLE_VERBOSE),yes)
@@ -612,6 +642,9 @@ else
endif
endef

# first argument: a configuration name from the union of config_list and
# config_name, used to look up the CFLAGS to use during compilation.
# second argument: the C++ sandbox file suffix being considered.
define make-cxx-sandbox-rule
$(BASE_OBJ_SANDBOX_PATH)/%.o: $(SANDBOX_PATH)/%.$(2) $(BLIS_H_FLAT) $(SANDBOX_HXX_FILES) $(MAKE_DEFS_MK_PATHS)
ifeq ($(ENABLE_VERBOSE),yes)
@@ -657,6 +690,12 @@ $(foreach kset, $(KERNEL_LIST), $(eval $(call make-kernels-rule,$(kset),$(call g
$(foreach suf, $(ADDON_C99_SUFS), \
$(foreach conf, $(CONFIG_NAME), $(eval $(call make-c99-addon-rule,$(conf),$(suf)))))

# Instantiate the build rule for C addon/kernels files. Use the CFLAGS for the
# configuration family.
$(foreach addon, $(ADDON_LIST), \
$(foreach suf, $(ADDON_C99_SUFS), \
$(foreach conf, $(CONFIG_NAME), $(eval $(call make-c99-addon-kers-rule,$(conf),$(suf),$(addon))))))

# Instantiate the build rule for C++ addon files. Use the CFLAGS for the
# configuration family.
$(foreach suf, $(ADDON_CXX_SUFS), \
16 changes: 13 additions & 3 deletions build/flatten-headers.py
@@ -278,14 +278,16 @@ def flatten_header( inputfile, header_dirpaths, cursp ):

# Mark the beginning of the header being inserted.
ostring += "%s%s%c" % ( beginstr, header, '\n' )
ostring += "#line %d \"%s\"%c\n" % ( 1, header_path, '\n' )
if line_numbers:
ostring += "#line %d \"%s\"%c\n" % ( 1, header_path, '\n' )

# Recurse on the header, accumulating the string.
ostring += flatten_header( header_path, header_dirpaths, cursp + " " )

# Mark the end of the header being inserted.
ostring += "%s%s%c" % ( endstr, header, '\n' )
ostring += "#line %d \"%s\"%c\n" % ( lineno+1, inputfile, '\n' )
if line_numbers:
ostring += "#line %d \"%s\"%c\n" % ( lineno+1, inputfile, '\n' )

echov2( "%sheader file '%s' fully processed." \
% ( cursp, header_path ) )
@@ -350,6 +352,7 @@ def find_header_dirs( dirpath ):
output_name = None
strip_comments = None
recursive_flag = None
line_numbers = None
verbose_flag = None
regex = None
root_inputfile = None
@@ -360,6 +363,7 @@ def main():
global output_name
global strip_comments
global recursive_flag
global line_numbers
global verbose_flag
global regex
global root_inputfile
@@ -371,13 +375,14 @@ def main():

strip_comments = False
recursive_flag = False
line_numbers = False
verbose_flag = "1"

nestsp = " "

# Process our command line options.
try:
opts, args = getopt.getopt( sys.argv[1:], "o:rchv:" )
opts, args = getopt.getopt( sys.argv[1:], "o:rclhv:" )

except getopt.GetoptError as err:
# print help information and exit:
@@ -390,6 +395,8 @@ def main():
output_name = optarg
elif opt == "-r":
recursive_flag = True
elif opt == "-l":
line_numbers = True
elif opt == "-c":
strip_comments = True
elif opt == "-v":
@@ -401,6 +408,9 @@ def main():
print_usage()
sys.exit()

if line_numbers and strip_comments:
my_print( "WARNING: stripping comments will result in inaccurate line numbers" )

# Make sure that the verboseness level is valid.
if ( verbose_flag != "0" and
verbose_flag != "1" and
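
The option handling in the hunks above can be exercised in isolation with a small sketch built on Python's standard `getopt` module. The option string `"o:rclhv:"` and the warning text are taken from the diff; the `parse_flags` wrapper, its settings dictionary, and the returned warning list are assumptions made so the example is self-contained.

```python
import getopt


def parse_flags(argv):
    """Parse flatten-headers-style options from argv (without the program name).

    Returns (settings, positional_args, warnings). This wrapper is
    hypothetical; the real script stores these values in module globals.
    """
    opts, args = getopt.getopt(argv, "o:rclhv:")

    settings = {
        "output_name": None,
        "recursive_flag": False,
        "strip_comments": False,
        "line_numbers": False,   # off unless -l is given
        "verbose_flag": "1",
        "help": False,
    }
    for opt, optarg in opts:
        if opt == "-o":
            settings["output_name"] = optarg
        elif opt == "-r":
            settings["recursive_flag"] = True
        elif opt == "-l":
            settings["line_numbers"] = True
        elif opt == "-c":
            settings["strip_comments"] = True
        elif opt == "-v":
            settings["verbose_flag"] = optarg
        elif opt == "-h":
            settings["help"] = True

    warnings = []
    if settings["line_numbers"] and settings["strip_comments"]:
        warnings.append(
            "WARNING: stripping comments will result in inaccurate line numbers"
        )
    return settings, args, warnings


if __name__ == "__main__":
    print(parse_flags(["-l", "-v1", "blis.h", "out.h"]))
```

Passing both `-c` and `-l` produces the warning, while `-l` alone (as the updated Makefile does) produces none.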
32 changes: 21 additions & 11 deletions common.mk
@@ -154,7 +154,7 @@ get-kernel-cflags-for = $(strip $(call load-var-for,CKOPTFLAGS,$(1)) \
$(BUILD_SYMFLAGS) \
)

# When compiling sandboxes, we use flags similar to those of general framework
# When compiling addons, we use flags similar to those of general framework
# source. This ensures that the same code can be linked and run across various
# sub-configurations.
get-addon-c99flags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \
@@ -169,6 +169,15 @@ get-addon-cxxflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \
$(BUILD_CPPFLAGS) \
$(BUILD_SYMFLAGS) \
)
# When compiling addon kernels, we use flags similar to those of kernels
# flags, except we also include the addon header paths.
get-addon-kernel-c99flags-for = $(strip $(call load-var-for,CKOPTFLAGS,$(1)) \
$(call load-var-for,CKVECFLAGS,$(1)) \
$(call get-noopt-cflags-for,$(1)) \
$(CADDONINCFLAGS) \
$(BUILD_CPPFLAGS) \
$(BUILD_SYMFLAGS) \
)

# When compiling sandboxes, we use flags similar to those of general framework
# source. This ensures that the same code can be linked and run across various
@@ -203,16 +212,17 @@ get-user-cflags-for = $(strip $(call load-var-for,COPTFLAGS,$(1)) \

# Define functions that return messages appropriate for each non-verbose line
# of compilation output.
get-noopt-text = "(CFLAGS for no optimization)"
get-refinit-text-for = "('$(1)' CFLAGS for ref. kernel init)"
get-refkern-text-for = "('$(1)' CFLAGS for ref. kernels)"
get-config-text-for = "('$(1)' CFLAGS for config code)"
get-frame-text-for = "('$(1)' CFLAGS for framework code)"
get-kernel-text-for = "('$(1)' CFLAGS for kernels)"
get-addon-c99text-for = "('$(1)' CFLAGS for addons)"
get-addon-cxxtext-for = "('$(1)' CXXFLAGS for addons)"
get-sandbox-c99text-for = "('$(1)' CFLAGS for sandboxes)"
get-sandbox-cxxtext-for = "('$(1)' CXXFLAGS for sandboxes)"
get-noopt-text = "(CFLAGS for no optimization)"
get-refinit-text-for = "('$(1)' CFLAGS for ref. kernel init)"
get-refkern-text-for = "('$(1)' CFLAGS for ref. kernels)"
get-config-text-for = "('$(1)' CFLAGS for config code)"
get-frame-text-for = "('$(1)' CFLAGS for framework code)"
get-kernel-text-for = "('$(1)' CFLAGS for kernels)"
get-addon-c99text-for = "('$(1)' CFLAGS for addons)"
get-addon-cxxtext-for = "('$(1)' CXXFLAGS for addons)"
get-addon-kernel-text-for = "('$(1)' CFLAGS for addon kernels)"
get-sandbox-c99text-for = "('$(1)' CFLAGS for sandboxes)"
get-sandbox-cxxtext-for = "('$(1)' CXXFLAGS for sandboxes)"



32 changes: 17 additions & 15 deletions config/firestorm/bli_cntx_init_firestorm.c
@@ -49,14 +49,14 @@ void bli_cntx_init_firestorm( cntx_t* cntx )
cntx,

// level-3
BLIS_GEMM_UKR, BLIS_FLOAT, bli_sgemm_armv8a_asm_8x12,
BLIS_GEMM_UKR, BLIS_DOUBLE, bli_dgemm_armv8a_asm_6x8,
BLIS_GEMM_UKR, BLIS_FLOAT, bli_sgemm_armv8a_asm_12x8r,
BLIS_GEMM_UKR, BLIS_DOUBLE, bli_dgemm_armv8a_asm_8x6r,

// packm
BLIS_PACKM_MRXK_KER, BLIS_FLOAT, bli_spackm_armv8a_int_8xk,
BLIS_PACKM_NRXK_KER, BLIS_FLOAT, bli_spackm_armv8a_int_12xk,
BLIS_PACKM_MRXK_KER, BLIS_DOUBLE, bli_dpackm_armv8a_int_6xk,
BLIS_PACKM_NRXK_KER, BLIS_DOUBLE, bli_dpackm_armv8a_int_8xk,
BLIS_PACKM_MRXK_KER, BLIS_FLOAT, bli_spackm_armv8a_int_12xk,
BLIS_PACKM_NRXK_KER, BLIS_FLOAT, bli_spackm_armv8a_int_8xk,
BLIS_PACKM_MRXK_KER, BLIS_DOUBLE, bli_dpackm_armv8a_int_8xk,
BLIS_PACKM_NRXK_KER, BLIS_DOUBLE, bli_dpackm_armv8a_int_6xk,

// gemmsup
BLIS_GEMMSUP_RRR_UKR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m,
@@ -77,8 +77,8 @@ void bli_cntx_init_firestorm( cntx_t* cntx )
cntx,

// level-3
BLIS_GEMM_UKR_ROW_PREF, BLIS_FLOAT, FALSE,
BLIS_GEMM_UKR_ROW_PREF, BLIS_DOUBLE, FALSE,
BLIS_GEMM_UKR_ROW_PREF, BLIS_FLOAT, TRUE,
BLIS_GEMM_UKR_ROW_PREF, BLIS_DOUBLE, TRUE,

// gemmsup
BLIS_GEMMSUP_RRR_UKR_ROW_PREF, BLIS_DOUBLE, TRUE,
@@ -95,11 +95,11 @@ void bli_cntx_init_firestorm( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
bli_blksz_init_easy( &blkszs[ BLIS_MR ], 8, 6, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NR ], 12, 8, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_MC ], 120, 252, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_KC ], 640, 3072, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NC ], 3072, 8192, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_MR ], 12, 8, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NR ], 8, 6, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_MC ], 480, 256, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_KC ], 4096, 3072, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NC ], 9600, 8184, -1, -1 );

// Initialize sup thresholds with architecture-appropriate values.
// s d c z
@@ -110,8 +110,10 @@ void bli_cntx_init_firestorm( cntx_t* cntx )
// Initialize level-3 sup blocksize objects with architecture-specific
// values.
// s d c z
bli_blksz_init_easy( &blkszs[ BLIS_MR_SUP ], -1, 6, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NR_SUP ], -1, 8, -1, -1 );
bli_blksz_init ( &blkszs[ BLIS_MR_SUP ], -1, 6, -1, -1,
-1, 9, -1, -1 );
bli_blksz_init ( &blkszs[ BLIS_NR_SUP ], -1, 8, -1, -1,
-1, 13, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_MC_SUP ], -1, 240, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_KC_SUP ], -1, 1024, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NC_SUP ], -1, 3072, -1, -1 );
40 changes: 40 additions & 0 deletions kernels/armv8a/3/armv8a_asm_utils.h
@@ -61,6 +61,18 @@
CLEAR4V(V4,V5,V6,V7)

// Scale vectors.
#define SSCALE1V(V,A,IDX) \
" fmul v"#V".4s, v"#V".4s, v"#A".s["#IDX"] \n\t"
#define SSCALE2V(V0,V1,A,IDX) \
SSCALE1V(V0,A,IDX) \
SSCALE1V(V1,A,IDX)
#define SSCALE4V(V0,V1,V2,V3,A,IDX) \
SSCALE2V(V0,V1,A,IDX) \
SSCALE2V(V2,V3,A,IDX)
#define SSCALE8V(V0,V1,V2,V3,V4,V5,V6,V7,A,IDX) \
SSCALE4V(V0,V1,V2,V3,A,IDX) \
SSCALE4V(V4,V5,V6,V7,A,IDX)

#define DSCALE1V(V,A,IDX) \
" fmul v"#V".2d, v"#V".2d, v"#A".d["#IDX"] \n\t"
#define DSCALE2V(V0,V1,A,IDX) \
@@ -74,6 +86,18 @@
DSCALE4V(V4,V5,V6,V7,A,IDX)

// Scale-accumulate.
#define SSCALEA1V(D,S,A,IDX) \
" fmla v"#D".4s, v"#S".4s, v"#A".s["#IDX"] \n\t"
#define SSCALEA2V(D0,D1,S0,S1,A,IDX) \
SSCALEA1V(D0,S0,A,IDX) \
SSCALEA1V(D1,S1,A,IDX)
#define SSCALEA4V(D0,D1,D2,D3,S0,S1,S2,S3,A,IDX) \
SSCALEA2V(D0,D1,S0,S1,A,IDX) \
SSCALEA2V(D2,D3,S2,S3,A,IDX)
#define SSCALEA8V(D0,D1,D2,D3,D4,D5,D6,D7,S0,S1,S2,S3,S4,S5,S6,S7,A,IDX) \
SSCALEA4V(D0,D1,D2,D3,S0,S1,S2,S3,A,IDX) \
SSCALEA4V(D4,D5,D6,D7,S4,S5,S6,S7,A,IDX)

#define DSCALEA1V(D,S,A,IDX) \
" fmla v"#D".2d, v"#S".2d, v"#A".d["#IDX"] \n\t"
#define DSCALEA2V(D0,D1,S0,S1,A,IDX) \
@@ -95,8 +119,16 @@
#define DLOAD4V(V0,V1,V2,V3,ADDR,SHIFT) \
DLOAD2V(V0,V1,ADDR,SHIFT) \
DLOAD2V(V2,V3,ADDR,SHIFT+32)
#define SLOAD1V DLOAD1V
#define SLOAD2V DLOAD2V
#define SLOAD4V DLOAD4V

// Generic: load one line.
#define SLOAD1V_GATHER_ELMFWD(V,ADDR,INC) \
" ld1 {v"#V".s}[0], ["#ADDR"], "#INC" \n\t" \
" ld1 {v"#V".s}[1], ["#ADDR"], "#INC" \n\t" \
" ld1 {v"#V".s}[2], ["#ADDR"], "#INC" \n\t" \
" ld1 {v"#V".s}[3], ["#ADDR"], "#INC" \n\t"
#define DLOAD1V_GATHER_ELMFWD(V,ADDR,INC) \
" ld1 {v"#V".d}[0], ["#ADDR"], "#INC" \n\t" \
" ld1 {v"#V".d}[1], ["#ADDR"], "#INC" \n\t"
@@ -110,8 +142,16 @@
#define DSTORE4V(V0,V1,V2,V3,ADDR,SHIFT) \
DSTORE2V(V0,V1,ADDR,SHIFT) \
DSTORE2V(V2,V3,ADDR,SHIFT+32)
#define SSTORE1V DSTORE1V
#define SSTORE2V DSTORE2V
#define SSTORE4V DSTORE4V

// Generic: store one line.
#define SSTORE1V_SCATTER_ELMFWD(V,ADDR,INC) \
" st1 {v"#V".s}[0], ["#ADDR"], "#INC" \n\t" \
" st1 {v"#V".s}[1], ["#ADDR"], "#INC" \n\t" \
" st1 {v"#V".s}[2], ["#ADDR"], "#INC" \n\t" \
" st1 {v"#V".s}[3], ["#ADDR"], "#INC" \n\t"
#define DSTORE1V_SCATTER_ELMFWD(V,ADDR,INC) \
" st1 {v"#V".d}[0], ["#ADDR"], "#INC" \n\t" \
" st1 {v"#V".d}[1], ["#ADDR"], "#INC" \n\t"