The goal of our project is to develop optimized operators from the ONNX standard for the Snitch architecture.
- Either:
  - docker/podman
- or ("local"):
  - bender
  - verilator
  - RISC-V compiler
  - dependencies of `snitch/python-requirements.txt` and `snitch/apt-requirements.txt`
  - cmake and gcc
Execute in the project root directory:

```sh
git submodule update --init
```
After installing everything you can source `env.sh` inside the project root directory:

```sh
source ./scripts/env.sh
```
This sets the following variables and aliases:

- `proot` to change the directory to the project root
- `dbuild` to build for banshee using docker
- `pbuild` to build for banshee using podman
- `build` to build for banshee locally
- `dbuild_size SIZE` to build for banshee using docker with given input size for the benchmark
- `pbuild_size SIZE` to build for banshee using podman with given input size for the benchmark
- `build_size SIZE` to build for banshee locally with given input size for the benchmark
- `build_sim` to build for the simulator locally
  - Run `clean` before switching from a banshee build and vice versa!
- `build_sim_size SIZE` to build for the simulator locally with given input size for the benchmark
  - Run `clean` before switching from a banshee build and vice versa!
- `clean` to remove all temporary build files
- `run` to run using banshee (which must be on your PATH)
- `bench` to run all benchmarks (all binaries inside `build/` that start with `benchmark_`)
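For example, a typical banshee-based session using these aliases might look like the following (the chosen size and benchmark selection are purely illustrative):

```sh
source ./scripts/env.sh   # define the aliases
proot                     # change to the project root
dbuild_size 40            # build for banshee via docker with input size 40
bench                     # run all build/benchmark_* binaries
```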
- Only needed if you are not using `banshee`.

Make sure you do not forget to source `env.sh`. The following only needs to be done once:

```sh
cd snitch/hw/system/snitch_cluster/
make bin/snitch_cluster.vlt
```
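Once the verilated model has been built, applications compiled with `build_sim` can typically be run by passing the binary to it directly; the exact paths below are assumptions and may differ in your setup:

```sh
# run an application on the verilated snitch_cluster model (paths are assumptions)
./snitch/hw/system/snitch_cluster/bin/snitch_cluster.vlt ./build/hello_world
```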
If you want to install the environment yourself and do not want to use docker, you can install the RISC-V toolchain locally. This allows you to use `build` and `build_size`.

First define `$RISCV` to point to the directory you want the toolchain installed to:

```sh
export RISCV=$HOME/.local/llvm-riscv
mkdir -p $RISCV
curl -Ls --progress-bar -o riscv-llvm.tar.gz https://sourceforge.net/projects/pulp-llvm-project/files/nightly/riscv32-pulp-llvm-ubuntu2004.tar.gz/download
tar -C $RISCV -xf riscv-llvm.tar.gz --strip-components=1
# create riscv32-* aliases for the riscv64-* tools (binaries are assumed to live in $RISCV/bin)
cd $RISCV/bin
for file in riscv64-*; do ln -s $file $(echo "$file" | sed 's/^riscv64/riscv32/g'); done
```
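Afterwards you will probably want the toolchain on your `PATH`; a quick sanity check could look like this (the `riscv32-` prefix is an assumption based on the symlinks created above):

```sh
export PATH=$RISCV/bin:$PATH
ls $RISCV/bin | grep '^riscv32-' | head   # verify the freshly linked tools exist
```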
To build banshee you have to execute the following commands:

```sh
cd ./snitch/sw/banshee
cargo install --path .
```
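`cargo install` places the resulting binary in `$HOME/.cargo/bin` by default, so make sure that directory is on your `PATH` (the `run` alias expects `banshee` to be found on the `PATH`):

```sh
export PATH="$HOME/.cargo/bin:$PATH"
which banshee   # should print the path of the freshly installed binary
```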
- First run `source ./scripts/env.sh` if you have not done so already
- Run `dbuild` for docker (or `build`)
- The build files should then be in `./build`

Now you can simulate the built applications with:

```sh
run ./build/hello_world
```
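Benchmarks are ordinary binaries whose names start with `benchmark_`, so you can either run one directly or use the `bench` alias (the `benchmark_abs` name below is an illustrative assumption):

```sh
run ./build/benchmark_abs   # run a single benchmark under banshee
bench                       # run every build/benchmark_* binary
```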
- SSR
  - abs, acos, acosh, add, argmax, asinh, batchnorm, conv, conv2d, copy, cumsum, div, dot, dropout, gemm, masked_dropout, max, maxpool, maxpool2d, relu, sigmoid, sin, sum, transpose, unique
- FREP
  - abs, add, batchnorm, conv, conv2d, copy, cumsum, div, dot, dropout, gemm, masked_dropout, max, maxpool, maxpool2d, relu, sigmoid, sin, sum, transpose
- Parallelised (w/o any helpers except barriers)
  - abs, add, argmax, conv, gemm, sin, sum
- OMP
  - abs, add, gemm, sin, sum
Note: To use the python-benchmarker, you need banshee and python3 set up.
First install the python dependencies found in `plots/requirements.txt`. We recommend doing this in a virtual environment. To set up the virtual python environment, run the following commands from the project root:

```sh
python3 -m venv plots/.venv
source ./plots/.venv/bin/activate
pip install -r plots/requirements.txt
```
To run the benchmarks for a specific operator (e.g. the abs-operator), you can execute the following:

```sh
python3 plots/scraper.py -include abs
```

You can also specify up to which size we double (we start at 10 and double each run). For example:

```sh
python3 plots/scraper.py -include abs -builder 'dbuild_size 40'
```

- This will build and run the sizes 10, 20, and 40.

This script builds the project using docker (or any other build command from above, e.g. `-builder 'pbuild_size XYZ'`), runs the benchmark using banshee, and stores the measurements in a file for later use. Note that this might take a couple of minutes depending on the operator.
To view a runtime plot of the abs-operator which you have just benchmarked, run:

```sh
python3 plots/runtime_plot.py -include abs
```

If you want to exclude a plot line from the runtime plot, use the `-exclude` flag. For example:

```sh
python3 plots/runtime_plot.py -include abs -exclude frep
```

If you want to generate speedup plots for the different implementations, you can run:

```sh
python3 plots/speedup_plot.py -include abs -exclude frep
```
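Putting the pieces together, an end-to-end benchmarking and plotting session using the commands introduced above could look like this:

```sh
source ./scripts/env.sh                # build aliases
source ./plots/.venv/bin/activate      # python environment from above
python3 plots/scraper.py -include abs -builder 'dbuild_size 40'
python3 plots/runtime_plot.py -include abs
python3 plots/speedup_plot.py -include abs
```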
- GCC inline assembly
- ONNX operators
- LLVM fork
- SSR paper
- Snitch paper
- Snitch getting started
- RISC-V registers
- RISC-V spec
- Register saves
- plot some theoretical lower bounds
- calculate theoretical runtime
- parallelize
  - conv2d, maxpool, maxpool2d, transpose, max, unique
- colors/plotstyle consistent
- comparison of theoretical runtime of problem relative to runtime on x86
- look into bug of multiple SSR configurations
  - previous SSR configuration for writing was still active
- TEST core count configuration of SIMULATOR
- Use multiple cores
  - argmax
  - conv parallel
- fix batchnorm benchmark
  - manually created as SSR is configured three times and this causes problems
- bar plot
- barrier wait indefinitely; reduce has "undefined symbol: __kmpc_reduce_nowait" compile error
- SSR+FREP
  - abs, acos (no frep), acosh (no frep), add, argmax (no frep), asinh (no frep), batchnorm, copy, cumsum, div, dot, dropout, gemm, masked_dropout, max, maxpool, relu, sigmoid, sin, sum, transpose
- Parallel
  - abs, add, copy, sin (no frep), sum, gemm
- OMP:
  - abs, add, copy, sin (no frep), sum (broken due to SSR 'leaking' or wrong impl.), gemm
- Use memory start pointer instead of l1?
  - Use start pointer; implemented in lmq.c
  - L1 has a better latency than memory
- should we do it for all datatypes (uint8, uint16, float32, ...)?
  - for `float` for now
- ask if we need to accept n-dimensional input
  - Answer: Use vectors where possible
- Build for the simulator using a compile flag?
  - use `-DCLUSTER_SIM=1` when calling `cmake ..`, or simply use the alias `build_sim` (see the sketch after these notes)
- Have a command to output assembly
  - can be done if compiled as executable. sufficient?
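Regarding the `-DCLUSTER_SIM=1` note above, a manual simulator-targeted configure step could look roughly like this (the out-of-source `build` directory and the Makefile generator are assumptions; the `build_sim` alias wraps the same idea):

```sh
mkdir -p build && cd build
cmake -DCLUSTER_SIM=1 ..   # configure for the simulator instead of banshee
make                       # build the operators and benchmarks
```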
- Set up building
- Find project and TA
- Created repo
- Created scraper scripts that run benchmarks and dump their runtimes into JSON files
- Created scripts for automatically generating runtime plots