Fortran Stencil Microbenchmarks
assesses the performance of different memory-bound finite difference kernels within Fortran. Compares the results using different hardware and software variables for use as machine-specific best practices.
For a more detailed explanation of methodology, see 0 - Introduction to Fortran Microbenchmarks
Initial work is a result of a HPC Fortran microbenchmarks internship at LJK, Grenoble
See documentation markdown file for further details.
For further developments to come, see TODOlist For a full summary made for a presentation, see summary
bench/
subdirectory :
- contains benchmark files
preprocess/
contains a codegen Python script and a Bash script to generate the benchmark variation tree and its output data
doc/
subdirectory :
- contains documentation
bench/src/perf_regions/
subdirectory :
- contains necessary files from eponymous library (see requirements)
tuto/
subdirectory :
- contains hello world and tutorial files used to learn Fortran
Timing libraries
- PAPI
perf_regions
code annotation library https://github.com/schreiberx/perf_regions, installed in main folder
Makefile
compiles all subdirectories, and has options:
preprocessing
orpre
to run preprocessing scripts that execute all relevant benchmark variations in the generated benchmark variation treerun
to domake run
in all subdirectories, executing the main files and scriptsrun_bench
andrun_tuto
tomake run
specifically the bench folder or the tuto folder
clean
to clean all executable files and temporary files in the subdirectories from the current OS- set
PERF_REGIONS=<relative directory of PerfRegions>
if PerfRegions is moved Other options: - use preprocessing macro
DEBUG=1
inmain.F90
if you are debugging
The right way to compile CUDA code with OpenMP is with nvidia's HPC SDK as other solutions lack reliable support as of 2024, due to the proprietary nature of CUDA (see IDRIS at page http://www.idris.fr/media/formations/openacc/openmp_gpu_idris_c.pdf)
See installation guide at https://docs.nvidia.com/hpc-sdk//hpc-sdk-install-guide/index.html and https://developer.nvidia.com/hpc-sdk-downloads
If you have trouble installing the right cuda and cuda-drivers, go see https://forums.developer.nvidia.com/t/ubuntu-install-specific-old-cuda-drivers-combo/214601/5
An example compilation is as follows :
nvfortran -mp=gpu -gpu=sm_75 -o foo foo.F90
Replace sm_75
in -gpu=sm_75
by your gpu's name (see https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
- In
make
, use the following lines :
NVARCH=Linux_x86_64
NVRELEASE=24.5
export NVARCH
NVCOMPILERS=/opt/nvidia/hpc_sdk
export NVCOMPILERS
export MANPATH:=$(MANPATH):$(NVCOMPILERS)/$(NVARCH)/$(NVRELEASE)/compilers/man
export PATH:=$(NVCOMPILERS)/$(NVARCH)/$(NVRELEASE)/compilers/bin:$(PATH)
export PATH:=/usr/local/cuda:/usr/local/cuda/bin:$(PATH)
export LD_LIBRARY_PATH:=/usr/local/cuda:/usr/local/cuda/lib:$(LD_LIBRARY_PATH)
Change the NVRELEASE line as well as the NVARCH as needed.
Thank you to Hugo Brunie (hbrunie) and Martin Schreiber (schreiberx) for all the irreplaceable guidance. perf_regions is a work of Martin Schreiber included here as a source folder for ease of use and copying source in code generation to ensure correct type of Fortran Modules are compiled each time.