This repo contains usage examples of NVIDIA DLProf on the Summit supercomputer.
DLProf is a tool for profiling deep learning models. It helps data scientists understand and improve model performance, either visually via TensorBoard or by analyzing text reports.
- use DLProf on Summit
module use /sw/aaims/summit/modulefiles
module load dlprof
- refer to this blog for more details.
git clone --recursive https://github.com/at-aaims/dlprof-examples
cd dlprof-examples/DeepLearningExamples
git apply ../pytorch/ConvNets.patch
cd ../pytorch
bsub prof.lsf
- install the TensorBoard plugin (for x86 only)
pip install nvidia-pyindex
pip install nvidia-tensorboard
pip install nvidia-tensorboard-plugin-dlprof
- use pre-installed env on Andes
module load python
source activate /gpfs/alpine/world-shared/stf011/junqi/dlprof-env
tensorboard --logdir /gpfs/alpine/world-shared/stf011/junqi/dlprof-env/event_files --host localhost
- forward port 6006 to your local machine, then open
http://localhost:6006
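
The port-forwarding step can be done with an SSH tunnel. The following is a minimal sketch to run on your local machine, not a tested recipe for this setup: `USER_ID` is a placeholder, `andes.olcf.ornl.gov` is an assumed login host, and port 6006 is taken from the URL above (TensorBoard's default). It only builds and prints the command so you can check it before running it.

```shell
# Placeholders and assumptions: replace these with your own values.
USER_ID="your_username"            # hypothetical user name
LOGIN_HOST="andes.olcf.ornl.gov"   # assumed Andes login host
PORT=6006                          # port used in the URL above

# -N: do not run a remote command, only forward the port
# -L: forward local PORT to localhost:PORT on the remote side
SSH_CMD="ssh -N -L ${PORT}:localhost:${PORT} ${USER_ID}@${LOGIN_HOST}"

# Print the command; copy and run it once the values look right.
echo "$SSH_CMD"
```

If the login host is load-balanced across several nodes, the tunnel has to reach the same node where `tensorboard` is running, so you may need to connect to that specific node instead.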