Implement distributed training using horovod #3533

Merged
merged 4 commits into from
Mar 15, 2021
15 changes: 15 additions & 0 deletions doc/TRAINING.rst
@@ -196,6 +196,21 @@ python3 DeepSpeech.py --train_files ./train.csv --dev_files ./dev.csv --test_fil

On a Volta generation V100 GPU, automatic mixed precision speeds up DeepSpeech training and evaluation by ~30%-40%.

Distributed training using Horovod
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a capable compute architecture, you can distribute training across several GPUs and machines using `Horovod <https://github.com/horovod/horovod>`_. A fast network is recommended.
Horovod is capable of using MPI and NVIDIA's NCCL for highly optimized inter-process communication.
It also offers Gloo as a communication backend that is easy to set up.

For more information about setting up or tuning Horovod, please visit `Horovod's GitHub <https://github.com/horovod/horovod>`_.

To train on 4 machines using 4 GPUs each:
Collaborator

Also, are there any requirements:

  • same GPUs?
  • same OS?
  • same drivers?
  • same number of GPUs on each system?

Contributor Author

@NanoNabla NanoNabla Feb 17, 2021

Although Horovod is expected to run on highly heterogeneous systems, it is not well documented what actually works.
I would not support anything other than a different number of GPUs per machine. This can be controlled by the number of processes per host, since every process has only one GPU pinned to it.
horovodrun -np SUM_OF_PROCNUM -H server1:NUMPROC_1,server2:NUMPROC_2,server3 [...]
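
A minimal sketch, not part of the PR, of how the `-np` total and the `-H` host list in the command above fit together: with one process per GPU, `-np` is the sum of the per-host counts. The function name `horovodrun_args` and the host names are hypothetical; only the `-np` and `-H host:slots` flags are taken from the command shown.

```python
from typing import Dict, List


def horovodrun_args(gpus_per_host: Dict[str, int]) -> List[str]:
    """Build the -np / -H portion of a horovodrun command line.

    `gpus_per_host` maps a hostname to the number of processes (one per GPU)
    to start on that host. Names here are illustrative, not a DeepSpeech API.
    """
    if not gpus_per_host:
        raise ValueError("at least one host is required")
    for host, count in gpus_per_host.items():
        if count < 1:
            raise ValueError(f"{host}: process count must be >= 1, got {count}")

    total = sum(gpus_per_host.values())  # value passed to -np
    hosts = ",".join(f"{h}:{n}" for h, n in gpus_per_host.items())  # value passed to -H
    return ["horovodrun", "-np", str(total), "-H", hosts]


if __name__ == "__main__":
    # Homogeneous case from the documentation: 4 machines with 4 GPUs each.
    print(" ".join(horovodrun_args({f"server{i}": 4 for i in range(1, 5)})))
```

The training command itself (`python3 DeepSpeech.py ... --horovod`) would be appended to the returned list.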

Collaborator

Maybe you can add a link to the Horovod docs covering those aspects? And state that we cannot support anything besides homogeneous configurations, and that heterogeneous cases will have to deal with their mess? :)
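
To illustrate what a check for a "homogeneous configuration" could look like, here is a hedged sketch that parses a `-H` style host list and verifies that every host requests the same number of processes. `parse_hosts` and `is_homogeneous` are hypothetical helpers, not Horovod or DeepSpeech APIs, and the check covers process counts only; it says nothing about GPU model, OS, or driver versions.

```python
from typing import Dict


def parse_hosts(spec: str) -> Dict[str, int]:
    """Parse a 'host:slots,host:slots' string into a {host: slots} mapping.

    Assumption: every entry carries an explicit slot count; entries without
    one are rejected rather than given a guessed default.
    """
    hosts: Dict[str, int] = {}
    for entry in spec.split(","):
        name, sep, slots = entry.strip().partition(":")
        if not name or not sep or not slots.isdigit():
            raise ValueError(f"expected 'host:slots', got {entry!r}")
        hosts[name] = int(slots)
    return hosts


def is_homogeneous(spec: str) -> bool:
    """Return True when all hosts in the spec use the same slot count."""
    return len(set(parse_hosts(spec).values())) == 1


if __name__ == "__main__":
    print(is_homogeneous("server1:4,server2:4,server3:4,server4:4"))
    print(is_homogeneous("server1:4,server2:2"))
```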


.. code-block:: bash

   horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python3 DeepSpeech.py --train_files [...] --horovod
Collaborator

Maybe worth linking some official, stable "how to use Horovod" crash course here?


Checkpointing
^^^^^^^^^^^^^

10 changes: 10 additions & 0 deletions setup.py
@@ -76,6 +76,10 @@ def main():
        'tensorflow == 1.15.4'
    ]

    horovod_pypi_dep = [
        'horovod'
    ]

    # Due to pip craziness environment variables are the only consistent way to
    # get options into this script when doing `pip install`.
    tc_decoder_artifacts_root = os.environ.get('DECODER_ARTIFACTS_ROOT', '')
@@ -94,6 +98,12 @@ def main():
    else:
        install_requires = install_requires + tensorflow_pypi_dep

    if os.environ.get('DS_NOHOROVOD', ''):
        install_requires = install_requires
    else:
        install_requires = install_requires + horovod_pypi_dep


    setup(
        name='deepspeech_training',
        version=version,
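
The `DS_NOHOROVOD` switch in the hunk above can be sketched in isolation as follows. This is an illustrative rewrite under stated assumptions, not the code merged in the PR: `select_requirements` is a hypothetical helper, and `'numpy'` stands in for whatever the real base requirements are. It makes one property of the original check explicit: because the environment value is tested for truthiness as a string, any non-empty value (including `0` or `false`) disables the Horovod dependency.

```python
from typing import List, Mapping


def select_requirements(base: List[str], env: Mapping[str, str]) -> List[str]:
    """Return the base requirements, adding 'horovod' unless DS_NOHOROVOD is set.

    Mirrors the truthiness test in the diff: an unset or empty variable keeps
    Horovod, any non-empty string drops it.
    """
    if env.get("DS_NOHOROVOD", ""):
        return list(base)
    return list(base) + ["horovod"]


if __name__ == "__main__":
    import os

    # 'numpy' is a placeholder requirement used only for this demonstration.
    print(select_requirements(["numpy"], os.environ))
```

Passing the environment in as a mapping keeps the function free of global state, so both branches can be exercised without touching `os.environ`.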