Installation cheat sheet for Kokkos

Kokkos install cheat sheet

  1. Requirements
    1. Compiler
    2. Build system
  2. How to build Kokkos
    1. As part of your application
    2. As an external library
      1. Configure, build and install Kokkos
      2. Use in your code
    3. As a Spack package
  3. Kokkos compile options
    1. Host backends
    2. Device backends
    3. Specific options
    4. Architecture-specific options
      1. Host architectures
        1. AMD CPU architectures
        2. ARM CPU architectures
        3. Intel CPU architectures
        4. RISC-V CPU architectures
      2. Device architectures
        1. AMD GPU architectures (HIP)
        2. Intel GPU architectures (SYCL)
        3. NVIDIA GPU architectures (CUDA)
    5. Third-party Libraries (TPLs)
    6. Examples for the most common architectures
      1. Current CPU with OpenMP
      2. AMD MI250 GPU with HIP and OpenMP
      3. NVIDIA A100 GPU with CUDA and OpenMP
      4. NVIDIA V100 GPU with CUDA and OpenMP
      5. Intel GPU Max/Ponte Vecchio GPU with SYCL and OpenMP

Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Compiling.html

Doc https://kokkos.org/kokkos-core-wiki/building.html

Doc https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_01_Introduction.pdf

Requirements

Compiler

| Compiler | Minimum version | Notes |
|---|---|---|
| ARM Clang | 20.1 | |
| Clang | 10.0.0 | For CUDA |
| Clang | 8.0.0 | For CPU |
| GCC | 8.2.0 | |
| Intel Classic | 19.0.5 | |
| Intel LLVM | 2022.0.0 | For SYCL |
| Intel LLVM | 2021.1.1 | For CPU |
| MSVC | 19.29 | |
| NVCC | 11.0 | |
| NVHPC/PGI | 22.3 | |
| ROCM | 5.2.0 | |

Build system

| Build system | Minimum version | Notes |
|---|---|---|
| CMake | 3.25.2 | For Intel LLVM full support |
| CMake | 3.21.1 | For NVHPC support |
| CMake | 3.18 | For better Fortran linking |
| CMake | 3.16 | |

Doc https://kokkos.org/kokkos-core-wiki/requirements.html
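The baseline CMake requirement above can be checked before configuring. The following is a minimal sketch, not part of Kokkos: the helper name `version_ge` is hypothetical, and it assumes that `sort -V` (version sort, as in GNU coreutils) is available.

```shell
#!/bin/sh
# Baseline minimum CMake version from the table above
# (compiler-specific rows require newer versions).
MIN_CMAKE="3.16"

# version_ge A B: succeeds if version A >= version B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" "$MIN_CMAKE"; then
        echo "CMake $found is recent enough (>= $MIN_CMAKE)"
    else
        echo "CMake $found is too old, need >= $MIN_CMAKE"
    fi
else
    echo "cmake not found in PATH"
fi
```

Change `MIN_CMAKE` to `3.21.1` or `3.25.2` when building with NVHPC or Intel LLVM, as listed in the table.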

How to build Kokkos

As part of your application

In your CMakeLists.txt:

add_subdirectory(path/to/kokkos)
target_link_libraries(
    my-app
    Kokkos::kokkos
)

Then configure your project:

cd path/to/your/code
cmake -B build \
    -DCMAKE_CXX_COMPILER=<your C++ compiler> \
    <Kokkos compile options>

Code Code example:

As an external library

Configure, build and install Kokkos

cd path/to/kokkos
cmake -B build \
    -DCMAKE_CXX_COMPILER=<your C++ compiler> \
    -DCMAKE_INSTALL_PREFIX=path/to/kokkos/install \
    <Kokkos compile options>
cmake --build build
cmake --install build

Doc https://kokkos.org/kokkos-core-wiki/building.html
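Building Kokkos can take a while, so the build step is often run in parallel. The sketch below only prints the commands instead of running them. The helper `build_jobs` is hypothetical, and it assumes `getconf _NPROCESSORS_ONLN` reports the CPU count on your system (it falls back to 1 otherwise).

```shell
#!/bin/sh
# build_jobs: print the number of parallel build jobs to use.
# Uses the online CPU count when it can be queried, 1 otherwise.
build_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # not a plain number: fall back to 1
    esac
    echo "$n"
}

# --parallel exists since CMake 3.12, which is below the 3.16 minimum.
echo "cmake --build build --parallel $(build_jobs)"
echo "cmake --install build"
```

Copy the printed commands, or drop the `echo` to run them directly from `path/to/kokkos`.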

Use in your code

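In your CMakeLists.txt:

find_package(Kokkos REQUIRED)
target_link_libraries(
    my-app
    Kokkos::kokkos
)

Then configure your project, pointing CMake to the Kokkos install directory:

cd path/to/your/code
cmake -B build \
    -DCMAKE_CXX_COMPILER=<your C++ compiler> \
    -DKokkos_ROOT=path/to/kokkos/install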

Doc https://cmake.org/cmake/help/latest/guide/tutorial/index.html
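If `find_package(Kokkos REQUIRED)` fails, a quick check is to see whether the install directory actually contains the Kokkos package configuration file. This is a hedged sketch: the helper `find_kokkos_config` and the `KOKKOS_PREFIX` variable are hypothetical, and it assumes the install follows the usual CMake package layout with a `KokkosConfig.cmake` file somewhere under the prefix.

```shell
#!/bin/sh
# find_kokkos_config PREFIX: print the path of KokkosConfig.cmake found
# under PREFIX, or fail if there is none.
find_kokkos_config() {
    config=$(find "$1" -name KokkosConfig.cmake -type f 2>/dev/null | head -n 1)
    [ -n "$config" ] && echo "$config"
}

# KOKKOS_PREFIX is a hypothetical variable; it defaults to the placeholder
# install path used in this cheat sheet.
prefix="${KOKKOS_PREFIX:-path/to/kokkos/install}"
if find_kokkos_config "$prefix" >/dev/null; then
    echo "Kokkos package found, pass -DKokkos_ROOT=$prefix to CMake"
else
    echo "no KokkosConfig.cmake under $prefix, was 'cmake --install build' run?"
fi
```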

As a Spack package

TODO finish this part

Doc See https://kokkos.org/kokkos-core-wiki/building.html#spack

Kokkos compile options

Host backends

| Option | Backend |
|---|---|
| `-DKokkos_ENABLE_SERIAL=ON` | Serial |
| `-DKokkos_ENABLE_OPENMP=ON` | OpenMP |
| `-DKokkos_ENABLE_THREADS=ON` | Threads |

Warning The serial backend is enabled by default if no other host backend is enabled.

Device backends

| Option | Backend | Device |
|---|---|---|
| `-DKokkos_ENABLE_CUDA=ON` | CUDA | NVIDIA |
| `-DKokkos_ENABLE_HIP=ON` | HIP | AMD |
| `-DKokkos_ENABLE_SYCL=ON` | SYCL | Intel |

Warning You can enable the serial backend, at most one other host backend, and at most one device backend at a time.

See architecture-specific options.
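The backend selection rule above can be checked on a list of options before calling CMake. The following is a minimal sketch with a hypothetical helper, `check_backends`. It only looks at the backend options listed in the two tables above and is not something Kokkos provides.

```shell
#!/bin/sh
# check_backends OPTION...: succeed if the options enable at most one
# non-serial host backend and at most one device backend.
check_backends() {
    host=0
    device=0
    for opt in "$@"; do
        case "$opt" in
            -DKokkos_ENABLE_OPENMP=ON|-DKokkos_ENABLE_THREADS=ON)
                host=$((host + 1)) ;;
            -DKokkos_ENABLE_CUDA=ON|-DKokkos_ENABLE_HIP=ON|-DKokkos_ENABLE_SYCL=ON)
                device=$((device + 1)) ;;
        esac
    done
    [ "$host" -le 1 ] && [ "$device" -le 1 ]
}

# Serial + OpenMP + CUDA is an allowed combination.
if check_backends -DKokkos_ENABLE_SERIAL=ON -DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON; then
    echo "backend selection looks valid"
fi

# OpenMP + Threads enables two non-serial host backends.
if ! check_backends -DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_THREADS=ON; then
    echo "too many host backends"
fi
```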

Specific options

| Option | Description |
|---|---|
| `-DKokkos_ENABLE_DEBUG=ON` | Activate extra debug features, may increase compile times |
| `-DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON` | Use bounds checking, will increase runtime |
| `-DKokkos_ENABLE_EXAMPLES=ON` | Build examples |
| `-DKokkos_ENABLE_TUNING=ON` | Create bindings for tuning tools |

Extra options

| Option | Description |
|---|---|
| `-DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON` | Aggressively vectorize loops |
| `-DKokkos_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK=ON` | Debug check on dual views |
| `-DKokkos_ENABLE_DEPRECATED_CODE=ON` | Enable deprecated code |
| `-DKokkos_ENABLE_LARGE_MEM_TESTS=ON` | Perform extra large memory tests |

Doc For more, see https://kokkos.org/kokkos-core-wiki/keywords.html

Architecture-specific options

Host architectures

Host options are used for controlling optimization and are optional.

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_NATIVE=ON` | Local host |

AMD CPU architectures

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_ZEN3=ON` | Zen3 |
| `-DKokkos_ARCH_ZEN2=ON` | Zen2 |
| `-DKokkos_ARCH_ZEN=ON` | Zen |

ARM CPU architectures

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_ARMV9_GRACE=ON` | Grace |
| `-DKokkos_ARCH_A64FX=ON` | ARMv8.2 with SVE Support |
| `-DKokkos_ARCH_ARMV81=ON` | ARMV8.1 |
| `-DKokkos_ARCH_ARMV80=ON` | ARMV8.0 |

Intel CPU architectures

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_SPR=ON` | Sapphire Rapids |
| `-DKokkos_ARCH_SKX=ON` | Skylake |
| `-DKokkos_ARCH_BDW=ON` | Intel Broadwell |
| `-DKokkos_ARCH_HSW=ON` | Intel Haswell |
| `-DKokkos_ARCH_KNL=ON` | Intel Knights Landing |
| `-DKokkos_ARCH_SNB=ON` | Sandy Bridge |

RISC-V CPU architectures

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_RISCV_RVA22V=ON` | RVA22V |

Device architectures

Device options are mandatory. They can be deduced from the device if present at CMake configuration time.

AMD GPU architectures (HIP)

| Option | Architecture | Associated cards |
|---|---|---|
| `-DKokkos_ARCH_AMD_GFX942_APU=ON` | GFX942 APU | MI300A |
| `-DKokkos_ARCH_AMD_GFX942=ON` | GFX942 | MI300X |
| `-DKokkos_ARCH_AMD_GFX90A=ON` | GFX90A | MI210, MI250, MI250X |
| `-DKokkos_ARCH_AMD_GFX908=ON` | GFX908 | MI100 |
| `-DKokkos_ARCH_AMD_GFX906=ON` | GFX906 | MI50, MI60 |
| `-DKokkos_ARCH_AMD_GFX1103=ON` | GFX1103 | Ryzen 8000G, Radeon 740M, 760M, 780M, 880M, 980M |
| `-DKokkos_ARCH_AMD_GFX1100=ON` | GFX1100 | 7900xt |
| `-DKokkos_ARCH_AMD_GFX1030=ON` | GFX1030 | V620, W6800 |

| Option | Description |
|---|---|
| `-DKokkos_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS=ON` | Instantiate multiple kernels at compile time, improves performance but increases compile time |
| `-DKokkos_ENABLE_HIP_RELOCATABLE_DEVICE_CODE=ON` | Enable Relocatable Device Code (RDC) for HIP |

Intel GPU architectures (SYCL)

| Option | Architecture |
|---|---|
| `-DKokkos_ARCH_INTEL_GEN=ON` | Generic JIT |
| `-DKokkos_ARCH_INTEL_XEHP=ON` | Xe-HP |
| `-DKokkos_ARCH_INTEL_PVC=ON` | GPU Max/Ponte Vecchio |
| `-DKokkos_ARCH_INTEL_DG1=ON` | Iris XeMAX |
| `-DKokkos_ARCH_INTEL_GEN12=ON` | Gen12 |
| `-DKokkos_ARCH_INTEL_GEN11=ON` | Gen11 |

| Option | Description |
|---|---|
| `-DKokkos_ENABLE_SYCL_RELOCATABLE_DEVICE_CODE=ON` | Enable Relocatable Device Code (RDC) for SYCL |

NVIDIA GPU architectures (CUDA)

| Option | Architecture | CC | Associated cards |
|---|---|---|---|
| `-DKokkos_ARCH_HOPPER90=ON` | Hopper | 9.0 | H200, H100 |
| `-DKokkos_ARCH_ADA89=ON` | Ada | 8.9 | GeForce RTX 40 series, RTX 6000/5000 series, L4, L40 |
| `-DKokkos_ARCH_AMPERE86=ON` | Ampere | 8.6 | GeForce RTX 30 series, RTX A series, A40, A10, A16, A2 |
| `-DKokkos_ARCH_AMPERE80=ON` | Ampere | 8.0 | A100, A30 |
| `-DKokkos_ARCH_TURING75=ON` | Turing | 7.5 | T4 |
| `-DKokkos_ARCH_VOLTA72=ON` | Volta | 7.2 | |
| `-DKokkos_ARCH_VOLTA70=ON` | Volta | 7.0 | V100 |
| `-DKokkos_ARCH_PASCAL61=ON` | Pascal | 6.1 | P6, P40, P4 |
| `-DKokkos_ARCH_PASCAL60=ON` | Pascal | 6.0 | P100 |
| `-DKokkos_ARCH_MAXWELL53=ON` | Maxwell | 5.3 | |
| `-DKokkos_ARCH_MAXWELL52=ON` | Maxwell | 5.2 | M6, M60, M4, M40 |
| `-DKokkos_ARCH_MAXWELL50=ON` | Maxwell | 5.0 | M10 |
| `-DKokkos_ARCH_KEPLER37=ON` | Kepler | 3.7 | K80 |
| `-DKokkos_ARCH_KEPLER35=ON` | Kepler | 3.5 | K40, K20 |
| `-DKokkos_ARCH_KEPLER32=ON` | Kepler | 3.2 | |
| `-DKokkos_ARCH_KEPLER30=ON` | Kepler | 3.0 | K10 |

Doc See NVIDIA documentation on Compute Capability (CC): https://developer.nvidia.com/cuda-gpus
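When the Compute Capability of a card is known, the matching option follows directly from the NVIDIA table. The sketch below encodes that table in a hypothetical helper, `kokkos_cuda_arch_option`. The `nvidia-smi --query-gpu=compute_cap` query is an assumption: it is only available with recent NVIDIA drivers, so the call is guarded and optional.

```shell
#!/bin/sh
# kokkos_cuda_arch_option CC: print the Kokkos architecture option for a
# Compute Capability such as "8.0", following the NVIDIA table.
kokkos_cuda_arch_option() {
    case "$1" in
        9.0) arch=HOPPER90 ;;
        8.9) arch=ADA89 ;;
        8.6) arch=AMPERE86 ;;
        8.0) arch=AMPERE80 ;;
        7.5) arch=TURING75 ;;
        7.2) arch=VOLTA72 ;;
        7.0) arch=VOLTA70 ;;
        6.1) arch=PASCAL61 ;;
        6.0) arch=PASCAL60 ;;
        5.3) arch=MAXWELL53 ;;
        5.2) arch=MAXWELL52 ;;
        5.0) arch=MAXWELL50 ;;
        3.7) arch=KEPLER37 ;;
        3.5) arch=KEPLER35 ;;
        3.2) arch=KEPLER32 ;;
        3.0) arch=KEPLER30 ;;
        *)
            echo "no Kokkos option listed for compute capability '$1'" >&2
            return 1 ;;
    esac
    printf '%s\n' "-DKokkos_ARCH_${arch}=ON"
}

# Optional: query the first GPU, if nvidia-smi is present.
# Assumption: the driver supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    kokkos_cuda_arch_option "$cc" || true
fi
```

For an A100, `kokkos_cuda_arch_option 8.0` prints `-DKokkos_ARCH_AMPERE80=ON`, the option used in the A100 example in this cheat sheet.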

| Option | Description |
|---|---|
| `-DKokkos_ENABLE_CUDA_CONSTEXPR=ON` | Activate experimental relaxed constexpr functions |
| `-DKokkos_ENABLE_CUDA_LAMBDA=ON` | Activate experimental lambda features |
| `-DKokkos_ENABLE_CUDA_LDG_INTRINSIC=ON` | Use CUDA LDG intrinsics |
| `-DKokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE=ON` | Enable relocatable device code (RDC) for CUDA |

Third-party Libraries (TPLs)

Doc See https://kokkos.org/kokkos-core-wiki/keywords.html#third-party-libraries-tpls

Examples for the most common architectures

Current CPU with OpenMP

cmake \
    -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ARCH_NATIVE=ON \
    -DKokkos_ENABLE_OPENMP=ON

AMD MI250 GPU with HIP and OpenMP

cmake \
    -B build \
    -DCMAKE_CXX_COMPILER=hipcc \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ENABLE_HIP=ON \
    -DKokkos_ARCH_AMD_GFX90A=ON \
    -DKokkos_ENABLE_OPENMP=ON

NVIDIA A100 GPU with CUDA and OpenMP

cmake \
    -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ENABLE_CUDA=ON \
    -DKokkos_ARCH_AMPERE80=ON \
    -DKokkos_ENABLE_OPENMP=ON

NVIDIA V100 GPU with CUDA and OpenMP

cmake \
    -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ENABLE_CUDA=ON \
    -DKokkos_ARCH_VOLTA70=ON \
    -DKokkos_ENABLE_OPENMP=ON

Intel GPU Max/Ponte Vecchio GPU with SYCL and OpenMP

cmake \
    -B build \
    -DCMAKE_CXX_COMPILER=icpx \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ENABLE_SYCL=ON \
    -DKokkos_ARCH_INTEL_PVC=ON \
    -DKokkos_ENABLE_OPENMP=ON \
    -DCMAKE_CXX_FLAGS="-fp-model=precise"

The last option, -DCMAKE_CXX_FLAGS="-fp-model=precise", controls the precision of floating-point math operations.

Code For more code examples: