PAPI instrumentation #57

Merged: 15 commits, Apr 15, 2024

calebyhan (Contributor) commented Jul 24, 2023

Adds PAPI instrumentation to the Level 1, 2, and 3, device, and device-batch BLAS++ routines.

calebhan ~/blaspp/> test/tester --repeat 3 --papi y --dim 1000:3000 symm
BLAS++ version 2023.06.00, id a95427d
input: test/tester --repeat 3 --papi y --dim '1000:3000' symm
                                                                                                                                           
type  layout    side    uplo       m       n      alpha       beta     error   time (s)       gflop/s  ref time (s)   ref gflop/s  status  
   d     col    left   lower    1000    1000     3.1416     2.7183  0.00e+00     0.0122       163.583       0.00603       331.521  pass    
   d     col    left   lower    1000    1000     3.1416     2.7183  0.00e+00    0.00582       343.853       0.00590       339.091  pass    
   d     col    left   lower    1000    1000     3.1416     2.7183  0.00e+00    0.00582       343.565       0.00586       341.339  pass    

   d     col    left   lower    2000    2000     3.1416     2.7183  0.00e+00     0.0323       494.795        0.0275       581.903  pass    
   d     col    left   lower    2000    2000     3.1416     2.7183  0.00e+00     0.0284       563.222        0.0290       550.914  pass    
   d     col    left   lower    2000    2000     3.1416     2.7183  0.00e+00     0.0286       558.630        0.0282       566.540  pass    

   d     col    left   lower    3000    3000     3.1416     2.7183  0.00e+00     0.0923       585.314        0.0889       607.732  pass    
   d     col    left   lower    3000    3000     3.1416     2.7183  0.00e+00     0.0924       584.397        0.0940       574.361  pass    
   d     col    left   lower    3000    3000     3.1416     2.7183  0.00e+00     0.0911       592.832        0.0927       582.612  pass    

symm( L, L, 1000, 1000 ) count 1, flop count 2.00e+09
symm( L, L, 3000, 3000 ) count 3, flop count 5.40e+10
symm( L, L, 1000, 1000 ) count 2, flop count 2.00e+09
symm( L, L, 2000, 2000 ) count 3, flop count 1.60e+10
total BLAS flop count 7.40e+10

All tests passed for symm.
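
The gflop/s column above can be cross-checked by hand. The following is a minimal sketch, assuming the usual flop model for symm with side = left (m*m*n multiplies plus m*m*n adds); that model is an assumption chosen to match the printed count of 2.00e+09 for m = n = 1000, and `symm_left_flops` and `gflops` are hypothetical helper names, not BLAS++ functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed flop model for symm with side = left: m*m*n multiplies plus
// m*m*n adds. Chosen to match the printed counts; not taken from BLAS++.
double symm_left_flops(int64_t m, int64_t n)
{
    return 2.0 * double(m) * double(m) * double(n);
}

// Converts a flop count and a wall-clock time in seconds to gflop/s.
double gflops(double flops, double seconds)
{
    return flops / seconds / 1e9;
}
```

For the first row (m = n = 1000, 0.0122 s) this gives about 163.9 gflop/s against the printed 163.583; the gap is consistent with the time column being rounded to three significant digits.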

mgates3 (Collaborator) commented Mar 9, 2024

We finally resolved the issue with duplicated entries. The padding bytes in the struct were not initialized, so the keys differed even though the entries looked identical when printed. Using memset on the elements solves the issue. Output with the fix:

sh leconte test> ./tester --repeat 3 --papi y --dim 100:300 symm
BLAS++ version 2023.06.00, id a60d22c
input: ./tester --repeat 3 --papi y --dim '100:300' symm
                                                                                                                                           
type  layout    side    uplo       m       n      alpha       beta     error   time (s)       gflop/s  ref time (s)   ref gflop/s  status  
   d     col    left   lower     100     100     3.1416     2.7183  0.00e+00     0.0166         0.120      7.40e-05        27.031  pass    
   d     col    left   lower     100     100     3.1416     2.7183  0.00e+00   5.12e-05        39.075      3.77e-05        53.014  pass    
   d     col    left   lower     100     100     3.1416     2.7183  0.00e+00   4.26e-05        46.906      3.42e-05        58.475  pass    

   d     col    left   lower     200     200     3.1416     2.7183  0.00e+00    0.00182         8.794      8.52e-05       187.684  pass    
   d     col    left   lower     200     200     3.1416     2.7183  0.00e+00   0.000113       141.459      9.75e-05       164.066  pass    
   d     col    left   lower     200     200     3.1416     2.7183  0.00e+00   0.000111       144.392      8.22e-05       194.761  pass    

   d     col    left   lower     300     300     3.1416     2.7183  0.00e+00    0.00171        31.654      0.000214       252.268  pass    
   d     col    left   lower     300     300     3.1416     2.7183  0.00e+00   0.000235       229.326      0.000205       263.155  pass    
   d     col    left   lower     300     300     3.1416     2.7183  0.00e+00   0.000230       234.888      0.000199       271.576  pass    

symm( L, L, 300, 300 ) count 3, flop count 1.62e+08
symm( L, L, 200, 200 ) count 3, flop count 4.80e+07
symm( L, L, 100, 100 ) count 3, flop count 6.00e+06
total BLAS flop count 2.16e+08

All tests passed for symm.
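
To illustrate the padding issue described above, here is a minimal sketch, not the actual BLAS++ code. `SymmKey`, `set_fields`, and `keys_match` are hypothetical names, and the field layout is an assumption. The junk fill values stand in for whatever happened to be in memory; fields are written through `memcpy` at their offsets so that the padding bytes are left untouched in a well-defined way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical key type; the real struct is not shown in this thread.
struct SymmKey {
    char    side;
    char    uplo;   // padding follows here so that m is aligned
    int64_t m;
    int64_t n;
};

// The demonstration only makes sense if the struct really has padding.
static_assert(sizeof(SymmKey) > 2 + 2 * sizeof(int64_t),
              "expected padding bytes in SymmKey");

// Writes only the named fields into raw storage; padding bytes keep
// whatever value they already had.
void set_fields(unsigned char* storage,
                char side, char uplo, int64_t m, int64_t n)
{
    std::memcpy(storage + offsetof(SymmKey, side), &side, sizeof(side));
    std::memcpy(storage + offsetof(SymmKey, uplo), &uplo, sizeof(uplo));
    std::memcpy(storage + offsetof(SymmKey, m),    &m,    sizeof(m));
    std::memcpy(storage + offsetof(SymmKey, n),    &n,    sizeof(n));
}

// Builds two keys with identical fields on top of storage holding
// different junk, then compares them byte by byte, as a key lookup
// based on the raw bytes would. With zero_first, memset is applied first.
bool keys_match(bool zero_first)
{
    unsigned char a[sizeof(SymmKey)];
    unsigned char b[sizeof(SymmKey)];
    std::memset(a, 0xAA, sizeof(a));  // simulated leftover memory contents
    std::memset(b, 0x55, sizeof(b));
    if (zero_first) {
        std::memset(a, 0, sizeof(a));  // the fix: zero padding as well
        std::memset(b, 0, sizeof(b));
    }
    set_fields(a, 'L', 'L', 100, 100);
    set_fields(b, 'L', 'L', 100, 100);
    return std::memcmp(a, b, sizeof(SymmKey)) == 0;
}
```

In this sketch `keys_match(false)` reports a mismatch although every field is equal, while `keys_match(true)` reports a match, which is the behavior change the memset fix is meant to produce.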

mgates3 (Collaborator) commented Mar 9, 2024

TODO:

  • Measure the overhead. Put the element into an #ifdef if the overhead is non-negligible.
  • Have configure.py and CMake detect PAPI, if requested.
    • This isn't a huge deal since PAPI will be disabled by default.
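
One way the #ifdef item could look is sketched below, assuming the `BLAS_HAVE_PAPI` macro is the guard. `record_call`, `g_calls_recorded`, and the toy `dot` routine are hypothetical stand-ins for illustration only; no real PAPI calls are made, and the actual instrumentation in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Lets callers and tests see whether the guard macro was defined.
#ifdef BLAS_HAVE_PAPI
constexpr bool instrumentation_enabled = true;
#else
constexpr bool instrumentation_enabled = false;
#endif

// Hypothetical stand-in for the real bookkeeping.
static int64_t g_calls_recorded = 0;

// Compiles to an empty function when BLAS_HAVE_PAPI is not defined,
// so a default build carries no counting work.
static void record_call()
{
#ifdef BLAS_HAVE_PAPI
    ++g_calls_recorded;
#endif
}

// Toy routine standing in for an instrumented BLAS++ wrapper.
double dot(int64_t n, const double* x, const double* y)
{
    record_call();
    double sum = 0;
    for (int64_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Building once with and once without `-DBLAS_HAVE_PAPI` and timing both would be one way to measure the overhead mentioned in the first item.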

mgates3 (Collaborator) commented Apr 1, 2024

Currently, PAPI has to be added to the build manually, e.g.,

blaspp> make config  # creates make.inc

# edit make.inc to add:
#----------
CXXFLAGS += -I${PAPI_DIR}/include -DBLAS_HAVE_PAPI
LDFLAGS  += -L${PAPI_DIR}/lib -Wl,-rpath,${PAPI_DIR}/lib -lsde -lpapi

blaspp> make

@mgates3 mgates3 merged commit 6515819 into icl-utk-edu:master Apr 15, 2024
8 checks passed