
Initiating a discussion on supporting more package variants #712

Open
leofang opened this issue Jun 14, 2019 · 14 comments

@leofang
Collaborator

leofang commented Jun 14, 2019

This makes #486 (comment) a standalone issue. The text below is revised from that comment.

First, some packages and libraries support (NVIDIA) GPUs. Taking the MPI libraries as an example, they can be made "CUDA-aware" by passing the --with-cuda flag (or a similar option) to the configure script, so that the MPI library is built and linked against the CUDA driver and runtime libraries. At least Open MPI and MVAPICH support this feature.

(The purpose of doing so is to support (more or less) architecture-agnostic code. For example, one can pass a GPU pointer to the MPI API without performing explicit data movement, and under the hood MPI will resolve it and recognize that the data lives on the GPU. MPI vendors also implement low-level optimizations for such operations, such as direct inter-GPU communication that bypasses the host, and even collective number crunching on GPUs.)

Another example is tomopy, which has recently gained MPI+GPU support if I'm not mistaken. However, both our internal channel and conda-forge only ship a CPU version. For some reason the recent effort to update the recipe didn't get merged (conda-forge/tomopy-feedstock#18). We should keep an eye on this.

Next, non-Python libraries (e.g. HDF5, FFTW) can be built against MPI to provide asynchronous/parallel processing. Then, the corresponding Python wrappers (e.g. h5py, PyFFTW, and mpi4py for MPI itself) need to be built against those specialized versions.

Taking all of this into account, at the Conda level the number of package variants inflates quickly: (build against MPI or not) × (# of available MPI libraries) × (CUDA-aware MPI or not) × (# of supported CUDA toolkit versions, if GPU support is required). I am not sure what the best strategy is to handle this. (Use the build string as a unique id? Use different output names?) Too many degrees of freedom come into play, and so far we only fulfill the minimum requirement.
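
To make the combinatorics concrete, here is a minimal sketch of what the variant matrix could look like in a conda_build_config.yaml; the key names and values below are hypothetical placeholders, not an agreed-upon convention:

```yaml
# conda_build_config.yaml -- hypothetical variant axes, for illustration only
mpi:
  - nompi
  - openmpi
  - mvapich2
cuda_version:
  - none     # CPU-only build
  - "10.1"   # CUDA-aware build against this toolkit version
```

If a recipe references both keys, conda-build renders one variant per combination, i.e. 3 × 2 = 6 builds for a single package, which is exactly the inflation described above.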

I feel that eventually a dedicated shell or Python script will be needed to help Conda resolve this issue, especially in the coming Jupyter-SDCC era, in which high-performance libraries may be favored. The meta.yaml recipe alone might not be enough. But I could be wrong.

@leofang leofang self-assigned this Jun 14, 2019
@leofang leofang changed the title Initiating a discussion on support more package variants Initiating a discussion on supporting more package variants Jun 14, 2019
@leofang leofang added this to the Undefined milestone Jun 14, 2019
@leofang
Collaborator Author

leofang commented Jun 15, 2019

Just did a bit of searching. For h5py + MPI, this is conda-forge's solution:
https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml
Not sure if we have room to chain more info in the build string though.
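
For reference, "chaining" more information into the build string could look roughly like this; the cuda token is my own hypothetical extension of the feedstock's mpi_prefix pattern (and mpi_prefix, cuda_prefix, and build are assumed to be set earlier in the recipe via {% set %}), not something the recipe does today:

```yaml
build:
  number: {{ build }}
  # hypothetical: encode both an MPI token and a CUDA token in the build string
  string: "{{ mpi_prefix }}_{{ cuda_prefix }}_py{{ py }}h{{ PKG_HASH }}_{{ build }}"
```

Users could then match on it with something like conda install "h5py=*=mpi_openmpi_cuda101*".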

@CJ-Wright
Collaborator

I would advise making use of the outputs key to handle the downstream variants. See https://github.com/conda-forge/airflow-feedstock/blob/master/recipe/meta.yaml
I would also advise not taking the path that airflow took by writing out everything by hand. At that point I think using the jinja2 approach would be cleaner and less prone to errors.

Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet. See this issue for the conda-forge GPU discussions: conda-forge/conda-forge.github.io#63

@leofang
Collaborator Author

leofang commented Jun 16, 2019

I would advise making use of the outputs key to handle the downstream variants.

@CJ-Wright So you mean something like - name: {{ name }}-with-openmpi-cuda_aware-cuda91?

Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet.

I know that cudatoolkit is currently not suitable for downstream packages to depend on. This is partly why I opened this issue: for the time being we need a homegrown solution for GPU support. Most likely, we should install the latest CUDA toolkit in the Docker image and let nvcc build backward-compatible CUDA binaries. @mrakitin, thoughts?

@leofang
Collaborator Author

leofang commented Jun 16, 2019

btw, @CJ-Wright, why is the outputs key better than the build string?

@mrakitin
Member

I don't have a strong opinion on that topic as it's pretty new to me. Do we need real GPUs to use nvcc?

@leofang
Collaborator Author

leofang commented Jun 16, 2019

No, nvcc can be run without GPUs. For example, in the Institutional Cluster (part of SDCC) the submit machines do not have GPUs, but we can build CUDA programs there and then submit GPU jobs. The key is to install the CUDA toolkit in the default path (/usr/local/cuda/ on Linux).

@CJ-Wright
Collaborator

Yes, but I would do that as

- name: {{ name }}-{{ mpi_flag }}-{{ cuda_flag }}-{{ cuda_version }}

kind of thing (you'd need to work on that a little bit more, but that is the basic gist).

I think this is a bit more explicit for users, since they ask for the exact thing they want in the package name, although the principle of jinja2 templating would be the same.
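
A rough sketch of that direction, spelling the variants out as outputs with a jinja2 loop (the variant lists and the shared base package below are made up for illustration, and a real recipe would still need per-output build scripts and requirements):

```yaml
outputs:
{% for mpi in ["nompi", "openmpi", "mvapich2"] %}
{% for cuda in ["nocuda", "cuda101"] %}
  - name: {{ name }}-{{ mpi }}-{{ cuda }}
    requirements:
      run:
        - {{ name }}-base  # hypothetical shared base package
{% endfor %}
{% endfor %}
```

This assumes name is defined at the top of the recipe; each rendered output then carries its variant right in the package name.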

@leofang
Collaborator Author

leofang commented Jun 17, 2019

Yes, yes, I agree with you @CJ-Wright. I was thinking about the same approach but forgot about jinja.

@leofang
Collaborator Author

leofang commented Jun 24, 2019

After thinking about this a bit, I changed my mind and I'm in favor of the build string approach, because the output name approach would be too obscure for general users who just want to install the current default:

conda install h5py-nompi-nocuda-0

which should really just be conda install h5py as it is now.

For the record, h5py supports variants through the build string; see https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml. So, if one wants MPI support, one just does

conda install h5py=*=mpi_openmpi*

otherwise, with conda install h5py the nompi version is preferred (via setting a higher build number; @CJ-Wright, why does this work?). This does not interfere with general needs and yet provides a way of customization for advanced users.
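
For context, the preference in the h5py feedstock is expressed with roughly the following pattern (reproduced from memory and simplified, so treat it as a sketch rather than the exact feedstock contents):

```yaml
{% set build = 2 %}
{% set mpi = mpi or 'nompi' %}

{% if mpi == 'nompi' %}
# give the serial build a much higher build number so the solver prefers it by default
{% set build = build + 100 %}
{% endif %}

{% if mpi != 'nompi' %}
{% set mpi_prefix = 'mpi_' + mpi %}
{% else %}
{% set mpi_prefix = 'nompi' %}
{% endif %}

build:
  number: {{ build }}
  string: "{{ mpi_prefix }}_py{{ py }}h{{ PKG_HASH }}_{{ build }}"
```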

@CJ-Wright
Collaborator

Higher build numbers are preferred, so conda will use the nompi build unless you ask otherwise.

@leofang
Collaborator Author

leofang commented Sep 18, 2019

A GPU version of tomopy has been added to conda-forge: conda-forge/tomopy-feedstock#25. I'd like to try that approach to resolve this issue.

@leofang
Collaborator Author

leofang commented Sep 18, 2019

@mrakitin
Member

Yeah, saw it yesterday, wanted to let you know, @leofang, but you were faster :).

@leofang
Collaborator Author

leofang commented Sep 20, 2019

Conda-forge now has an official policy for MPI support: https://conda-forge.org/docs/maintainer/knowledge_base.html#message-passing-interface-mpi
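
For downstream recipes, the gist of that policy (sketched from the linked docs; check them for the authoritative form) is to treat mpi as a variant key and pin MPI-enabled dependencies through their build strings, e.g. for a package that needs parallel HDF5:

```yaml
requirements:
  host:
    - {{ mpi }}                # 'mpich' or 'openmpi', provided by conda_build_config.yaml
    - hdf5 * mpi_{{ mpi }}_*   # select the matching MPI build of hdf5
  run:
    - {{ mpi }}
    - hdf5 * mpi_{{ mpi }}_*
```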
