[Enhancement] generalize tbb's thread_pool_base #1403

Merged: 13 commits into NVIDIA:main on Oct 9, 2024

Conversation

DavidElesTheFox (Contributor)

Why?

I would like to use the library with a thread pool other than TBB. This PR contains a proposed implementation for this enhancement and solves some issues along the way.

Current Issues

For this purpose, tbbexec::thread_pool_base looks like the perfect abstraction, but it currently has two issues:

  • It lives in tbb_thread_pool.hpp and therefore carries a dependency on TBB.
  • It has the friend declaration friend struct DerivedPoolType::tbb_thread_pool::scheduler;, where the tbb_thread_pool qualifier ties it to the TBB implementation and is also unnecessary: friend struct DerivedPoolType::scheduler; would work just as well.

Proposal

My proposal is to have a more generic pool hierarchy in stdexec. Currently it looks like this:

 tbbexec
    | - tbb_thread_pool.hpp

I would add three implementations for 3rd-party thread pools (three because I like the Rule of Three):

 execpools
  | - boost
  |    | - boost_thread_pool.hpp
  | - taskflow
  |    | - taskflow_thread_pool.hpp
  | - tbb
  |    | - tbb_thread_pool.hpp
  | - thread_pool_base.hpp

I chose these three 3rd-party libraries because I think they are the most popular threading libraries.

About the PR

  • I've tried to organize the commits properly, to make the changes easier to follow.
  • I used the llvm18-cuda12.4 dev container to test the changes.
  • I formatted the changed files with clang-format.
  • Added two new CMake options: STDEXEC_ENABLE_BOOST and STDEXEC_ENABLE_TASK_FLOW. These control the dependencies and fetch them when necessary.


copy-pr-bot bot commented Aug 26, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@mathisloge

I would like to propose renaming the boost thread pool to asio_thread_pool and adding a CMake build flag to switch between standalone Asio and the Boost variant.

@DavidElesTheFox (Contributor Author) commented Sep 12, 2024

(EDITED)
I've rebased the PR and also implemented @mathisloge's suggestion.

I have a feeling that adding standalone Asio brings some complexity to the whole architecture, and I'm open to reverting it if it causes maintenance issues. The sources of the complexity:

  • Standalone Asio isn't based on CMake, so importing it is not simple even with CPM. (I needed to add a CMake script.)
  • Because the Asio implementation depends on the configuration, a generated header file had to be added (asio_config.h.in). Even though the generated file is not part of the source tree, it is another thing that needs to be maintained. The file is generated into include/execpools/asio/asio_config.hpp, because it is publicly included and therefore has to be part of the installed headers so installations don't break (see the sketch after this comment).

If you have a better idea for how to integrate standalone Asio, let me know.
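
For illustration only, here is a rough sketch of what such a generated include/execpools/asio/asio_config.hpp could look like. The execpools::asio_impl alias and the boost/standalone switch come from the commit messages further down in this thread; the configuration macro name and exact contents are assumptions.

// Sketch only; the real generated header in this PR may differ.
#pragma once

#if defined(EXECPOOLS_ASIO_USE_BOOST) // assumed macro, substituted from asio_config.h.in
#  include <boost/asio.hpp>

namespace execpools {
  // The pool implementation refers only to execpools::asio_impl.
  namespace asio_impl = boost::asio;
}
#else
#  include <asio.hpp>

namespace execpools {
  namespace asio_impl = asio;
}
#endif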

@ericniebler (Collaborator)

apologies for the delay. i'll get to this soon i hope.

DavidElesTheFox and others added 10 commits October 7, 2024 15:14
There was an extra name specifier. It was unnecessary and also TBB-specific.

In order to use thread_pool_base later, we need to separate it from TBB.

The goal is to have a common space for all the thread pools that depend on 3rd-party threading libraries, like TBB.

The new name is execpools, as in execution pools, where all thread pools are collected.

The tbb subdirectory separates the different implementation files so the library dependencies are easy to manage.

The goal is to see whether the task_pool_base abstraction is sufficient or not.

The idea is to have an asio thread pool rather than a boost one. This way, standalone Asio can also be used as an implementation.

Which implementation is used for Asio is controlled by the CMake parameter STDEXEC_ASIO_IMPLEMENTATION. It can be 'boost' or 'standalone'.

CMake generates a configuration file that creates a namespace, execpools::asio_impl, which points to the corresponding implementation. This way the pool implementation refers only to this namespace.

The configuration file is generated into the build directory in order not to pollute the source tree with generated files.

The generated header for Asio was located in the build folder but was included from a public header, which made installation impossible.

Now the file is generated into the include folder in the proper place and added to the .gitignore.

@BenFrantzDale (Contributor) left a comment

I love this! When I wrote the tbb thread pool (with the help of Eric and others), I factored it the way I did with exactly this in mind: fundamentally, an execution context should just need an enqueue member function, and for the most part everything else can be shared across most thread-pool-like resources.

I left a few comments about old questions from when I wrote it that are still nagging me.
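
To make that division of labor concrete, here is a conceptual sketch (not the PR's actual interface): the base supplies the scheduler/sender machinery, and a concrete pool only supplies enqueue(). The __execute signature mirrors the excerpts quoted later in this review; everything else is an assumption for illustration.

#include <cstdint>

// Simplified task type; __execute mirrors the excerpts quoted in this review.
struct task_base {
  void (*__execute)(task_base*, std::uint32_t tid) noexcept = nullptr;
};

// The shared part: get_scheduler(), the scheduler, sender, and operation
// state would live here and ultimately call
// static_cast<DerivedPoolType&>(*this).enqueue(...).
template <class DerivedPoolType>
class thread_pool_base { };

// A concrete pool only has to say how a task gets executed.
class inline_pool : public thread_pool_base<inline_pool> {
 public:
  void enqueue(task_base* task, std::uint32_t tid = 0) noexcept {
    // A real pool would hand `task` to one of its worker threads; running it
    // inline keeps the sketch self-contained.
    task->__execute(task, tid);
  }
};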

Comment on lines 23 to 30
//! This is a P2300-style thread pool wrapping tbb::task_arena, which its docs describe as "A class that represents an
//! explicit, user-managed task scheduler arena."
//! Once set up, a tbb::task_arena has
//! * template<F> void enqueue(F &&f)
//! and
//! * template<F> auto execute(F &&f) -> decltype(f())
//!
//! See https://spec.oneapi.io/versions/1.0-rev-3/elements/oneTBB/source/task_scheduler/task_arena/task_arena_cls.html

Contributor

Seems like my tbb-specific comments should be removed?

Contributor Author

I have to admit this was one of the copy-paste errors I made, unfortunately.
I'm correcting all occurrences.

template <class PoolType, class ReceiverId>
struct operation {
using Receiver = stdexec::__t<ReceiverId>;
struct __t;

Contributor

Aside: I keep seeing this pattern, along with stdexec::__t<Foo>. I see what it's doing, but it's not clear to me why, other than that __t<Foo> is nicer than typename Foo::__t. If someone could give an explanation, I'd gladly add a doxygen comment to

  template <class _Tp>
  using __t = typename _Tp::__t;

in __meta.hpp.

Collaborator

other than that __t<Foo> is nicer than typename Foo::__t.

that's it, really. i don't like littering my code with typename everywhere. not a hill to die on.

Contributor

Makes sense. With that in mind, why is it seemingly everywhere? Like

    class static_thread_pool_ {
      template <class ReceiverId>
      struct operation {
        using Receiver = stdexec::__t<ReceiverId>;
        class __t;
      };

That's saying that static_thread_pool_'s operation<ReceiverId> struct has a Receiver, which it gets from the ReceiverId, and then has a __t class which is the operation state? That gets used later to do template <class Receiver> using operation_t = stdexec::__t<operation<stdexec::__id<Receiver>>>; Why that indirection there? (If you can explain it, I'll add //! to relevant places!) To me it feels like a round-about way of saying that operation_t<Receiver> is the operation-state type: You get the __id of the receiver, get operation<that> which basically says using Receiver = __t<__id<Receiver>>;...

I'm sure the indirection is there for a reason, I just don't see it. Is there some more-universal pattern going on that this is just a simple case of?

Contributor Author

If I don't misunderstand the question (feel free to correct me if I do), it is why the indirection through the __t class is made and what the reason behind it is. For me, this doc pretty much explains it.
Briefly, as I understand it: it makes the class type less tightly coupled to its template arguments (i.e., it depends on the Id as an argument, not on the specific type). A simplified sketch of the round trip follows below.

Is that correct, @ericniebler? Please correct me if it's wrong.

On the other hand, I'm curious whether C++ modules will change this one day, but that is just personal curiosity.
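
Here is a simplified, self-contained sketch of that round trip. These are stand-ins, not stdexec's real (more involved) definitions; they only illustrate why keying a template on the id still lets you recover the original type.

#include <type_traits>

namespace ex {
  // Stand-in for the id produced for a plain type.
  template <class T>
  struct __x {
    using __t = T;
  };

  template <class T>
  using __id = __x<T>;

  // The alias quoted above from __meta.hpp.
  template <class T>
  using __t = typename T::__t;
} // namespace ex

// The pattern under discussion: the operation state is keyed on the
// receiver's id rather than on the receiver type itself.
template <class ReceiverId>
struct operation {
  using Receiver = ex::__t<ReceiverId>;
  struct __t { /* the actual operation state */ };
};

template <class Receiver>
using operation_t = ex::__t<operation<ex::__id<Receiver>>>;

struct my_receiver { };

// Round trip: __t<__id<T>> is T again, and operation_t names the nested __t.
static_assert(std::is_same_v<ex::__t<ex::__id<my_receiver>>, my_receiver>);
static_assert(std::is_same_v<operation_t<my_receiver>,
                             operation<ex::__id<my_receiver>>::__t>);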

Contributor

Aah. If there are no objections I’ll add a comment to __t with breadcrumbs pointing to that doc. Thank you!

return scheduler{static_cast<DerivedPoolType&>(*this)};
}

/* Is this even needed? Looks like no.

Contributor

I left this comment in here I think... I didn't get an answer to the question and forgot to remove it.

: pool_(pool)
, rcvr_(std::move(rcvr)) {
this->__execute =
[](task_base* t, std::uint32_t /* tid What is this needed for? */) noexcept {

Contributor

I never got an answer to what the uint32 is for.

Contributor Author

When I checked the code I was also wondering. It looks to me like it is used during bulk tasks to propagate/report errors to the proper thread. That is why I kept this comment here: I'm not 100% sure.

}

[[nodiscard]]
auto available_parallelism() const -> std::uint32_t {

Contributor

This isn't part of P2300 is it? It's just a nice-to-have, I think?

@DavidElesTheFox (Contributor Author), Oct 9, 2024

I couldn't find any reference to it. From my personal point of view it is nice to have, but I can also imagine that not all of the threading libraries expose this information, or that it is not easy to access. E.g., in asio_thread_pool the _executor member is required only to fulfill this functional requirement.

So if we remove it, the thread pool base becomes more generic.

Comment on lines +52 to +54
void enqueue(task_base* task, std::uint32_t tid = 0) noexcept {
arena_.enqueue([task, tid] { task->__execute(task, /*tid=*/tid); });
}

Contributor

When I was writing this, it wasn't clear to me where the P2300 concepts end and where things become implementation-specific. I think (?) this pool.enqueue(task) isn't part of P2300, and is instead an implementation detail of the operation state for the sender for the pool's scheduler.

Contributor Author

I fully agree. I mean, I also don't know, and I have the same opinion. Should I add documentation about it somewhere in the structural description?

@ericniebler (Collaborator) left a comment

we need to leave a forwarding tbbexec/tbb_thread_pool.hpp header so we don't break people's builds. also, we need to consider that folks are currently linking to the STDEXEC::tbbexec target. that should continue working.

also, can thread_pool_base.hpp be used to simplify exec/static_thread_pool.hpp?

@ericniebler (Collaborator)

/ok to test

enqueue();
}
};
} // namespace execpools

Contributor

A good test/usage example of this would be something like an AnyThreadPool where tasks are submitted as std::function<void()>s.

Contributor Author

I agree, more examples make it more usable! I'd just like to ask back to clarify, if you don't mind.

The idea is to have an AnyThreadPool that is based on thread_pool_base but doesn't have a public get_scheduler() function to schedule tasks on it, and instead has an enqueue(std::function<void()> callable) function through which one can submit tasks to the pool. Is that correct?

Contributor

Something like that. When I wrote this originally, I was coming at it with the notion that a thread-based execution context must be modeled by something you throw std::function<void()>s at and that promises to execute them eventually. (Or it could be a void(*f)(void*) and a corresponding void* x as a pair, and it promises to have a thread call f(x). That's weaker and I think still fits.)
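
As a strawman for that idea, here is a self-contained sketch of such a pool: the only interface is enqueue(std::function<void()>), and submitted tasks are promised to run eventually on the worker thread. It is not code from the PR, and hooking it up to thread_pool_base (so it would also get a scheduler) is deliberately left out.

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class any_thread_pool {
 public:
  any_thread_pool() : worker_([this] { run(); }) { }

  ~any_thread_pool() {
    {
      std::lock_guard lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // The whole contract: submit a callable and it runs eventually.
  void enqueue(std::function<void()> task) {
    {
      std::lock_guard lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) {
          return; // done_ was set and the queue is drained
        }
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_; // declared last so the queue state is ready before it starts
};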

David Eles added 2 commits October 9, 2024 10:01

Implement review remarks.

The goal is to stay compatible with existing usage. A warning is added to notify users about the change.

@DavidElesTheFox (Contributor Author)

I've pushed some quick updates:

  • Removed the copy-paste errors in the base thread pool (TBB references, dead code).
  • on and transform got deprecated, so I replaced usages of these functions in the tests.
  • Added back the tbbexec header. @ericniebler I've also added a warning to the file. The idea is that everyone who uses the old header gets notified about the change through this warning. I like staying backward compatible, but it's also nice to have a way to remove this later so it doesn't stay in the repo forever. (A possible shape for that forwarding header is sketched after this list.)
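
A hypothetical sketch of that forwarding tbbexec/tbb_thread_pool.hpp; the header actually merged may differ. The new include path follows the layout proposed above, while the execpools class name and the warning mechanism are assumptions.

// tbbexec/tbb_thread_pool.hpp -- compatibility shim, sketch only.
#pragma once

#pragma message("tbbexec/tbb_thread_pool.hpp is deprecated; include <execpools/tbb/tbb_thread_pool.hpp> instead.")

#include <execpools/tbb/tbb_thread_pool.hpp>

namespace tbbexec {
  // Keep the old name working for existing users (assumed spelling).
  using tbb_thread_pool = execpools::tbb_thread_pool;
} // namespace tbbexec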

@ericniebler I think I can see a way to use the same scheduler and sender in static_thread_pool and the other pools, so they can have a common base class. I'm a bit worried about the details I can't see right now (the possible tiny differences between the static and tbb/base thread pools), but theoretically it should work. I'm working on making this happen really soon.

@BenFrantzDale I'm going to add the AnyThreadPool as an example; I like the idea of having more usage examples!

@ericniebler (Collaborator)

think I can see the way how to use the same scheduler and sender in static_thread_pool and the other pools. [...] I'm working on it to make this happen really soon.

no rush, that can be a separate PR.

@ericniebler (Collaborator)

/ok to test

@ericniebler merged commit 4d8d194 into NVIDIA:main on Oct 9, 2024
18 checks passed

@ericniebler (Collaborator)

thank you!
