polylithic(?) initialization of Kokkos #2658
Comments
There is an option to make this a compile-time behavior, i.e. that Kokkos::initialize doesn't call the impl_initialize of each execution space. For this use case, as well as other use cases where interoperability with larger runtimes (e.g. maybe HPX) is important, this could be a valid option. On the other hand, we also had a request for the option of not initializing CUDA in a build which has CUDA enabled.
Also, I would potentially start with adding an "impl" option first. Let's see whether this does all the things we need it to do for Legion support and then go from there.
OK, we discussed this and will do it via a different initialize call in the Impl namespace.
Do we need anything else than what is already provided by
Thanks for moving this along; LANL is very interested in this.
Would the code below have the desired effect? I have restructured initialize_internal (./impl/Kokkos_Core.cpp) into initialize_backends() and initialize_profiling(). Compiling and running the example shown further down seems to work for (cond = false;).
Yes, that looks reasonable to me. It wasn't clear to me whether it'd be more thematically consistent for
Here is the change that I've been using successfully in a fork: streichler@8b87035
That may be equivalent at the moment, but my concern with this approach is that if anything else were ever to be added to Kokkos::initialize, this path would require additional changes (and awareness of the need for those changes) to avoid being broken.
Fair enough.
We see this as an experimental feature at the moment and we don't want to expose it in the
OK we got this merged in now. We don't have tests in place yet, will wait for you to check this out and tell us your usage pattern. The relevant logic looks like this now:

void initialize(int& narg, char* arg[]) {
  InitArguments arguments;
  Impl::parse_command_line_arguments(narg, arg, arguments);
  Impl::parse_environment_variables(arguments);
  Impl::initialize_internal(arguments);
}

void initialize(InitArguments arguments) {
  Impl::parse_environment_variables(arguments);
  Impl::initialize_internal(arguments);
}

void initialize_profiling(const InitArguments& args) {
#if defined(KOKKOS_ENABLE_PROFILING)
  Kokkos::Profiling::initialize();
#else
  if (getenv("KOKKOS_PROFILE_LIBRARY") != nullptr) {
    std::cerr << "Kokkos::initialize() warning: Requested Kokkos Profiling, "
                 "but Kokkos was built without Profiling support"
              << std::endl;
  }
#endif
}

void pre_initialize_internal(const InitArguments& args) {
  if (args.disable_warnings) g_show_warnings = false;
}

void post_initialize_internal(const InitArguments& args) {
  initialize_profiling(args);
  g_is_initialized = true;
}

void initialize_internal(const InitArguments& args) {
  pre_initialize_internal(args);
  initialize_backends(args);
  post_initialize_internal(args);
}

So all in all someone needs to call:
And you can do the check of environment variables independently too.
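The exact calls that followed "someone needs to call" did not survive in this thread, so here is a minimal, self-contained sketch of the sequence the merged logic implies: an optional environment check, then the pre step, then each backend's own initialization driven by the caller, then the post step. Everything in namespace sketch is a hypothetical stand-in rather than the Kokkos API, and SKETCH_DISABLE_WARNINGS is an invented variable name.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Kokkos internals quoted above.
// None of these declarations are the real Kokkos API.
namespace sketch {

struct InitArguments {
  bool disable_warnings = false;
};

struct State {
  bool show_warnings  = true;
  bool pre_done       = false;
  bool is_initialized = false;
  std::vector<std::string> backends;  // backends initialized so far, in order
};

// Optional, independent step: fold environment settings into the arguments.
// SKETCH_DISABLE_WARNINGS is an invented name used only for this sketch.
inline void parse_environment_variables(InitArguments& args) {
  if (std::getenv("SKETCH_DISABLE_WARNINGS") != nullptr)
    args.disable_warnings = true;
}

inline void pre_initialize_internal(State& s, const InitArguments& args) {
  if (args.disable_warnings) s.show_warnings = false;
  s.pre_done = true;
}

// Stand-in for one execution space's impl_initialize.
inline void backend_impl_initialize(State& s, const std::string& name) {
  assert(s.pre_done && "pre_initialize_internal must run first");
  s.backends.push_back(name);
}

inline void post_initialize_internal(State& s, const InitArguments&) {
  assert(s.pre_done && "pre_initialize_internal must run first");
  s.is_initialized = true;  // only now does the runtime report initialized
}

// The caller-driven sequence: pre, each backend, post.
inline State polylithic_initialize(const std::vector<std::string>& backends) {
  State s;
  InitArguments args;
  parse_environment_variables(args);
  pre_initialize_internal(s, args);
  for (const auto& b : backends) backend_impl_initialize(s, b);
  post_initialize_internal(s, args);
  return s;
}

}  // namespace sketch
```

For example, sketch::polylithic_initialize({"OpenMP", "Cuda"}) returns a State whose is_initialized is true only after both stand-in backends have been recorded.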
@streichler Can you folks check whether this works for you?
Yes, I'll try to look at it early this week.
I've modified Legion to use
In order to generate a unit test for Kokkos, I think it'd be sufficient to do something like:
for each execution space between the pre/post_initialize calls.
As a reminder, there's still one outstanding issue that must be addressed for CUDA+OpenMP execution in Kokkos-enabled Legion (or in the threaded initialization test proposed above): #2652
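The snippet proposed for the unit test is missing from this thread. As a rough illustration of the idea only, the following self-contained sketch initializes each execution space from its own thread between the pre and post calls. The Runtime struct and all function names are hypothetical stand-ins, not Kokkos code, and the ordering requirements between real backends (see #2652) are not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical model of the proposed test; the atomics stand in for Kokkos state.
namespace sketch_test {

struct Runtime {
  std::atomic<bool> pre_done{false};
  std::atomic<int>  backends_ready{0};
  std::atomic<bool> is_initialized{false};
};

inline void pre_initialize(Runtime& rt) { rt.pre_done = true; }

// Stand-in for an execution space's impl_initialize, run on its own thread.
inline void backend_impl_initialize(Runtime& rt) {
  assert(rt.pre_done.load());
  rt.backends_ready.fetch_add(1);
}

inline void post_initialize(Runtime& rt) { rt.is_initialized = true; }

// Initialize num_backends stand-in execution spaces, one thread each,
// between the pre and post calls. Returns true if every backend ran
// and the runtime reports itself initialized afterwards.
inline bool run_threaded_init(int num_backends) {
  Runtime rt;
  pre_initialize(rt);

  std::vector<std::thread> workers;
  for (int i = 0; i < num_backends; ++i)
    workers.emplace_back(backend_impl_initialize, std::ref(rt));
  for (auto& w : workers) w.join();  // all backends finish before the post step

  post_initialize(rt);
  return rt.is_initialized.load() && rt.backends_ready.load() == num_backends;
}

}  // namespace sketch_test
```

A real test would replace backend_impl_initialize with the impl_initialize call of each enabled execution space and would then check Kokkos::is_initialized().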
As described in #2651, Legion wants to initialize different Kokkos execution spaces from different threads. This clearly isn't possible with a single call to Kokkos::initialize, but Legion is almost able to make it work by directly calling the impl_initialize methods in each execution space (observing a few ordering requirements as it does so). The missing piece is that Kokkos::is_initialized() continues to return false, which causes some things to get unhappy (e.g. kokkos/core/src/impl/Kokkos_SharedAlloc.cpp, lines 235 to 248 in e237df7). g_is_initialized is hidden inside an anonymous namespace in Kokkos_Core.cpp and I can't get to it from Legion.

Plausible remediations (one of which I strongly prefer) are:
1. [...] g_is_initialized = true. This is no more or less kosher than the calls to impl_initialize methods now, but presumes that there's no Kokkos initialization that isn't done by at least one of the execution space initializations.
2. [...] Kokkos::is_initialized to be more precise about what it needs (e.g. the SharedAllocationRecord being valid in the case above, if I understand the code correctly). At this point, you'd wonder why you had Kokkos::is_initialized at all though.
3. [...] is_initialized correctly in any thread (which would fix #2652, "Unnecessary(?) check for host execution space initialization from Cuda initialization", as well), and then call Kokkos::initialize after all the impl_initialize calls have been made.
4. [...] skip_execution_space_init to Kokkos::InitArguments that does all init except for execution spaces. A caller that sets this would be expected to "know what they're doing" and then call the execution space impl_initialize methods after the call to Kokkos::initialize.

Although any of these would technically work (and perhaps others may suggest additional approaches), my preference would be for (4). The first is a horrible hack (not that I'm above that!), and the second seems really fragile, and my concern with the third is that it will further strengthen the assumption in Kokkos right now that each execution space itself is "monolithic" (as opposed to, e.g. the Cuda execution space knowing there's two different GPUs, each of which might or might not be initialized yet).
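To make option (4) concrete, here is a small self-contained sketch of how such a flag could behave. skip_execution_space_init is the name proposed in this issue and is not an existing member of Kokkos::InitArguments; the Runtime struct, the "Host" and "Device" space names, and the function bodies are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of option (4). The flag is the one proposed in this issue,
// not an existing Kokkos::InitArguments member; all types are stand-ins.
namespace sketch4 {

struct InitArguments {
  bool skip_execution_space_init = false;
};

struct Runtime {
  bool is_initialized = false;
  std::vector<std::string> spaces;  // execution spaces initialized so far
};

// Stand-in for one execution space's impl_initialize.
inline void impl_initialize(Runtime& rt, const std::string& space) {
  rt.spaces.push_back(space);
}

inline void initialize(Runtime& rt, const InitArguments& args) {
  if (!args.skip_execution_space_init) {
    // Default, monolithic path: initialize every (stand-in) space here.
    impl_initialize(rt, "Host");
    impl_initialize(rt, "Device");
  }
  // Everything that is not tied to a particular execution space.
  rt.is_initialized = true;
}

}  // namespace sketch4
```

A caller that sets the flag gets is_initialized == true with no execution spaces touched, and is then responsible for calling impl_initialize for each space itself, from whichever thread it chooses.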