Not all communicators subscribe to info keys #13067

Open
devreal opened this issue Jan 28, 2025 · 10 comments

Comments

@devreal
Contributor

devreal commented Jan 28, 2025

While reviewing #12189 I found what seems to be a discrepancy in the info key handling between communicators created with different functions. Some communicators subscribe to the info keys, while others only copy them. When subscribed, the communicator is notified of updates when the user calls MPI_Comm_set_info. Otherwise, no such update happens and we don't even parse the info keys. Some time ago we fixed the handling of mpi_assert_allow_overtake so that it can be set after the communicator has been created (#9843), so that is a valid use case.
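To make the distinction concrete, here is a minimal sketch in plain C of the two behaviors. It is a toy model, not Open MPI code: `toy_comm`, `toy_comm_create`, and `toy_comm_set_info` are hypothetical names, and the choice that a copy-only communicator stores the value without interpreting it is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a communicator; NOT Open MPI's internal API. */
typedef struct toy_comm {
    bool subscribed;      /* true: hints are parsed whenever they change */
    bool allow_overtake;  /* internal state derived from the hint */
    char stored_key[64];  /* last key copied into the info store */
    char stored_val[64];  /* last value copied into the info store */
} toy_comm;

/* Bounded string copy that always NUL-terminates. */
void toy_copy(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* Create a communicator that either subscribes to hints or only copies them. */
toy_comm toy_comm_create(bool subscribed)
{
    toy_comm comm;
    memset(&comm, 0, sizeof(comm));
    comm.subscribed = subscribed;
    return comm;
}

/* Models a set-info call: the text is always stored (an assumption of this
 * sketch), but it is only interpreted when the communicator is subscribed. */
void toy_comm_set_info(toy_comm *comm, const char *key, const char *value)
{
    toy_copy(comm->stored_key, sizeof(comm->stored_key), key);
    toy_copy(comm->stored_val, sizeof(comm->stored_val), value);

    if (comm->subscribed && strcmp(key, "mpi_assert_allow_overtake") == 0) {
        /* Subscriber callback: update the internal flag from the new value. */
        comm->allow_overtake = (strcmp(value, "true") == 0);
    }
}
```

Calling `toy_comm_set_info(&comm, "mpi_assert_allow_overtake", "true")` flips `allow_overtake` only on a communicator created with `toy_comm_create(true)`. On one created with `toy_comm_create(false)` the hint is kept as text but has no effect.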

Here is the list of functions that support changing info keys through MPI_Comm_set_info:

The list of functions that do not support it:

We should probably harmonize that in some way, i.e., always subscribe to the info keys. For internal communicators that will never have other info keys set we could pass a flag to suppress. However, it's strange that setting mpi_assert_allow_overtake on a communicator created with MPI_Comm_dup would work while it wouldn't work on a communicator created with MPI_Comm_create.

@edgargabriel
Member

@devreal what about the dynamic communicator creation functions? Spawn, Join and Connect/Accept?

@devreal
Contributor Author

devreal commented Jan 28, 2025

I don't see the subscriber mechanism used in https://github.com/open-mpi/ompi/blob/main/ompi/dpm/dpm.c

@edgargabriel
Member

Shouldn't it be used, however? Comm_spawn and friends take an info object as an argument as well.

@devreal
Contributor Author

devreal commented Jan 28, 2025

Yes, it is being used. The info object is queried for various info keys (starting at https://github.com/open-mpi/ompi/blob/main/ompi/dpm/dpm.c#L929) but as far as I can tell they are never added to the communicator so asking for them will not return anything. And of course things like allow-overtake seem to be dropped entirely.

I believe we have to make these keys available for query later so the application can see if we understood them.
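As a rough illustration of that idea, the sketch below keeps only the hints an implementation understood, so that a later query reports exactly those. It is a toy model in plain C, not Open MPI code: `toy_hint`, `toy_known_keys`, `toy_filter_hints`, and `toy_lookup` are hypothetical names, and the list of known keys is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair; NOT an MPI or Open MPI type. */
typedef struct toy_hint {
    const char *key;
    const char *value;
} toy_hint;

/* Keys this toy implementation claims to understand (assumed list). */
static const char *const toy_known_keys[] = { "mpi_assert_allow_overtake" };

/* Returns true if the toy implementation understands `key`. */
bool toy_key_is_known(const char *key)
{
    size_t n = sizeof(toy_known_keys) / sizeof(toy_known_keys[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_known_keys[i], key) == 0) {
            return true;
        }
    }
    return false;
}

/* Copies only the understood hints from `in` to `out`.
 * Returns the number of hints kept (at most `out_cap`). */
size_t toy_filter_hints(const toy_hint *in, size_t n_in,
                        toy_hint *out, size_t out_cap)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_in && kept < out_cap; i++) {
        if (toy_key_is_known(in[i].key)) {
            out[kept++] = in[i];
        }
    }
    return kept;
}

/* Models a later query: returns the value for `key`, or NULL if absent. */
const char *toy_lookup(const toy_hint *hints, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hints[i].key, key) == 0) {
            return hints[i].value;
        }
    }
    return NULL;
}
```

With this shape, an application can pass in its hints, query the filtered set afterwards, and treat a `NULL` result from `toy_lookup` as "this hint was ignored".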

@bosilca
Member

bosilca commented Jan 28, 2025

  1. We are inconsistent across communicator creation functions. However, all functions that take the info key as an argument are consistent.
  2. The current behavior is consistent with the MPI standard requirements. We are free to ignore info keys in which case they should not be returned to the user.
  3. The info keys for dynamic process creation are not related to the communicator. MPI 4.1 Section 11.8 is very clear about this:

> a set of key-value pairs telling the runtime system where and how to start the processes (handle, significant only at root)

@edgargabriel
Member

@bosilca regarding Comm_spawn, MPI 4.1 does make the explicit statement that the memory_alloc_kind info object can be passed to Comm_spawn as well, so we might need to add that ability:

> In the World Model, an info hint passed to an MPI startup mechanism requests support for memory allocation kinds for all objects derived from the World Model. This info hint can also be supplied to MPI_COMM_SPAWN or MPI_COMM_SPAWN_MULTIPLE in the World Model.

@bosilca
Member

bosilca commented Jan 28, 2025

I have a different understanding of that statement. Yes, the memkind info key applies to the newly created world but not to the intercomm resulting from the spawn call. If it gets attached somewhere, it shall be to the MPI_COMM_WORLD of the spawnees.

@devreal
Contributor Author

devreal commented Jan 28, 2025

> The current behavior is consistent with the MPI standard requirements. We are free to ignore info keys in which case they should not be returned to the user.

Yes. We can ignore them everywhere. Or flip a coin and pretend we don't know any of the keys. Neither is what our users expect.

@bosilca
Member

bosilca commented Jan 28, 2025

We're not disagreeing there. I just pointed out that this is not a bug or a divergence from the MPI standard requirements.

@edgargabriel
Member

> I have a different understanding of that statement. Yes, the memkind info key applies to the newly created world but not to the intercomm resulting from the spawn call. If it gets attached somewhere, it shall be to the MPI_COMM_WORLD of the spawnees.

We should probably bring this up in the hybrid working group to clarify the expected behavior. I am fine either way, to be honest; it's not like Comm_spawn is used that heavily.


3 participants