[RFC] Adding API for parallel block to task_arena to warm-up/retain/release worker threads #1522

Open
wants to merge 8 commits into
base: master
from

pavelkumbrasev
Contributor

@pavelkumbrasev pavelkumbrasev commented Oct 1, 2024

Adding API for parallel block to task_arena to warm-up/retain/release worker threads

Signed-off-by: pavelkumbrasev <[email protected]>
@vossmjp vossmjp changed the title Adding API for parallel block to task_arena to warm-up/retain/release worker threads [RFC} Adding API for parallel block to task_arena to warm-up/retain/release worker threads Oct 3, 2024
@vossmjp vossmjp changed the title [RFC} Adding API for parallel block to task_arena to warm-up/retain/release worker threads [RFC] Adding API for parallel block to task_arena to warm-up/retain/release worker threads Oct 3, 2024
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Overall, the text sounds too certain about what will or will not happen when the new API is used. I think the explanation should be written in vaguer terms, using more words such as "may" and "might", to convey that all of this is up to the implementation and serves as a hint rather than a concrete behavior.

What do others think?

rfcs/proposed/parallel_block_for_task_arena/README.md Outdated Show resolved Hide resolved
@pavelkumbrasev
Contributor Author

> Overall, the text sounds too certain about what will or will not happen when the new API is used. I think the explanation should be written in vaguer terms, using more words such as "may" and "might", to convey that all of this is up to the implementation and serves as a hint rather than a concrete behavior.
>
> What do others think?

I tried to indicate that this set of APIs is a hint to the scheduler. But if you believe that we can relax these guarantees even more, I think we should do so.

Signed-off-by: pavelkumbrasev <[email protected]>
@pavelkumbrasev
Contributor Author

Ping @aleksei-fedotov @vossmjp @akukanov

@isaevil isaevil marked this pull request as ready for review November 20, 2024 14:45
isaevil and others added 2 commits November 25, 2024 10:53
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

A bunch of comments from my side. I have not reviewed the new API yet.

Comment on lines +115 to +117
    ### Proposed API

    ```cpp
Contributor

Can you please summarize the proposed API modifications before the code block?

  • Add an enumeration type for the arena leave policy
  • Add the policy as the last parameter to the arena constructors and initialization, defaulted to...
  • Add functions to start and end a parallel block to the class and the namespace
  • Add an RAII class to map a parallel block to a code scope.
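
For readers of the thread, the four items above can be pictured with a minimal standalone sketch. It does not use oneTBB: `sketch::task_arena` is a hypothetical stand-in, the `delayed` default is an assumption (the comment leaves the default elided), nesting `scoped_parallel_block` inside the arena class is an assumption, and the observer functions exist only for illustration. The namespace-level functions are omitted.

```cpp
#include <cassert>

// Hypothetical stand-in for the arena class; this is not the oneTBB API.
namespace sketch {

// 1. Enumeration for the arena leave policy (names taken from the RFC draft).
enum class workers_leave { fast, delayed };

class task_arena {
public:
    // 2. The policy is the last constructor parameter. The default used
    //    here is an assumption made for this sketch.
    explicit task_arena(int max_concurrency = -1,
                        workers_leave policy = workers_leave::delayed)
        : max_concurrency_(max_concurrency), policy_(policy) {}

    // 3. Member functions that mark the start and end of a parallel block.
    void start_parallel_block() { ++open_blocks_; }
    void end_parallel_block(bool set_one_time_fast_leave = false) {
        assert(open_blocks_ > 0 && "end without a matching start");
        --open_blocks_;
        last_end_requested_fast_leave_ = set_one_time_fast_leave;
    }

    // 4. RAII class that maps a parallel block to a code scope.
    class scoped_parallel_block {
    public:
        explicit scoped_parallel_block(task_arena& arena,
                                       bool set_one_time_fast_leave = false)
            : arena_(arena), fast_leave_(set_one_time_fast_leave) {
            arena_.start_parallel_block();
        }
        ~scoped_parallel_block() { arena_.end_parallel_block(fast_leave_); }
        scoped_parallel_block(const scoped_parallel_block&) = delete;
        scoped_parallel_block& operator=(const scoped_parallel_block&) = delete;

    private:
        task_arena& arena_;
        bool fast_leave_;
    };

    // Observers that exist only so the sketch can be inspected.
    int max_concurrency() const { return max_concurrency_; }
    workers_leave policy() const { return policy_; }
    int open_blocks() const { return open_blocks_; }
    bool last_end_requested_fast_leave() const {
        return last_end_requested_fast_leave_;
    }

private:
    int max_concurrency_;
    workers_leave policy_;
    int open_blocks_ = 0;
    bool last_end_requested_fast_leave_ = false;
};

} // namespace sketch
```

In this model, constructing a `scoped_parallel_block` on entry to a scope and letting it go out of scope is equivalent to a matched `start_parallel_block()` / `end_parallel_block()` pair.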

Comment on lines +109 to +111
    * If work was submitted immediately after the end of the parallel block,
    the default arena "workers leave" state will be restored.
    * If the default "workers leave" state was the "Fast leave" the result is NOP.
Contributor

Before these two bullets, the "workers leave" state is never mentioned, so it is ambiguous what is actually meant here. Consider the following confusion: should I read it as a separate state called "workers leave", meaning that the workers should leave as soon as possible, or rather as a "workers leave" policy, meaning that the actual value behind this policy determines the behavior?

Does the following suggestion improve the situation?

Suggested change
    - * If work was submitted immediately after the end of the parallel block,
    - the default arena "workers leave" state will be restored.
    - * If the default "workers leave" state was the "Fast leave" the result is NOP.
    + * If work was submitted immediately after the end of the parallel block,
    + the default arena behavior with regard to "workers leave" policy is restored.
    + * If the default "workers leave" policy was the "Fast leave", the result is NOP.

Comment on lines +140 to +141
    void start_parallel_block();
    void end_parallel_block(bool set_one_time_fast_leave = false);
Contributor

  1. To make the new API more composable, I would indicate that the setting primarily affects this parallel block, while in the absence of other parallel blocks with conflicting requests it affects the behavior of the arena as a whole.
  2. It looks as if it is tailored to a single scenario. Together with the first point, I believe this is the reason for that "NOP" thing.

Therefore, my suggestion is to address both of these by changing the API (here and in other places) to something like the following:

Suggested change
    - void start_parallel_block();
    - void end_parallel_block(bool set_one_time_fast_leave = false);
    + void start_parallel_block();
    + void end_parallel_block(workers_leave this_block_leave = workers_leave::delayed);

Then add, somewhere, an explanation of how this changes the behavior of the current parallel block and how it composes with the arena's setting and with other parallel blocks within it. For example, it may read:

The start and end of parallel block API allows making a one-time change to the behavior that the arena was initialized with. If the requested behavior matches the arena's setting, the workers' leave behavior does not change. In case of conflicting requests coming from multiple parallel blocks simultaneously, the scheduler chooses the behavior it considers optimal.
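
The one-time semantics suggested in this comment can be illustrated with a small model that has no real threads. Everything here is hypothetical: `arena_model` and `next_leave_policy()` are names invented for the sketch, and treating the override as consumed on first use is just one possible reading of "one-time change".

```cpp
#include <cassert>

namespace sketch {

enum class workers_leave { fast, delayed };

// Hypothetical bookkeeping model of an arena's leave policy; no real threads.
class arena_model {
public:
    explicit arena_model(workers_leave initial) : arena_policy_(initial) {}

    void start_parallel_block() { ++open_blocks_; }

    // The ending block states how workers should leave. The request applies
    // once and never rewrites the arena's own setting.
    void end_parallel_block(workers_leave this_block_leave = workers_leave::delayed) {
        assert(open_blocks_ > 0 && "end without a matching start");
        --open_blocks_;
        // A request that matches the arena's setting changes nothing.
        one_time_override_ = (this_block_leave != arena_policy_);
        override_value_ = this_block_leave;
    }

    // Policy a scheduler would consult the next time workers run out of
    // work. Consuming the override is what makes the change one-time.
    workers_leave next_leave_policy() {
        if (one_time_override_) {
            one_time_override_ = false;
            return override_value_;
        }
        return arena_policy_;
    }

    workers_leave arena_policy() const { return arena_policy_; }

private:
    workers_leave arena_policy_;
    int open_blocks_ = 0;
    bool one_time_override_ = false;
    workers_leave override_value_ = workers_leave::delayed;
};

} // namespace sketch
```

With an arena initialized as `delayed`, ending a block with `fast` makes the next query return `fast` and every later query return `delayed` again; the conflicting-requests case is deliberately left out of this model.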

Contributor

There is no composability problem really, as all but the last end-of-block calls are simply ignored, and only the last one has a one-time impact on the leave policy. Also, it does not affect the arena settings, according to the design.

Of course if the calls come from different threads, in general it is impossible to predict which one will be the last. However, even if the code is designed to create parallel blocks in the same arena by multiple threads, all these blocks might have the same leave policy so that it does not matter which one is the last to end.

Using the same enum for the end of block as for the construction of the arena seems more confusing than helpful to me, as it may be perceived as changing the arena state permanently.
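
The "only the last end-of-block call counts" rule described in this reply can be sketched with an atomic counter of open blocks. This is an assumption about how such bookkeeping could look, not the RFC's implementation; `block_tracker` and the boolean return value are invented for illustration.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Hypothetical bookkeeping for nested or overlapping parallel blocks.
class block_tracker {
public:
    void start_parallel_block() {
        open_blocks_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true only when this call closed the last open block and asked
    // for fast leave, i.e. when the one-time request would take effect.
    // Requests from all earlier end-of-block calls are ignored.
    bool end_parallel_block(bool set_one_time_fast_leave = false) {
        const int previous = open_blocks_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0 && "end without a matching start");
        const bool is_last = (previous == 1);
        return is_last && set_one_time_fast_leave;
    }

    int open_blocks() const {
        return open_blocks_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<int> open_blocks_{0};
};

} // namespace sketch
```

Because the counter is atomic, exactly one of several concurrent callers observes `previous == 1`, which matches the remark that it is generally impossible to predict which thread's call will be the last.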

Comment on lines +140 to +141
    void start_parallel_block();
    void end_parallel_block(bool set_one_time_fast_leave = false);
Contributor

With the arena's workers_leave behavior and scoped_parallel_block both specified in the constructors, this change in behavior set at the end of a parallel block looks inconsistent.

Would it be better to have this setting be specified at the start of a parallel block rather than at its end?

Comment on lines +188 to +189
    * What if different types of workloads are mixed in one application?
    * What if there concurrent calls to this API?
Contributor

See my comment above about making the approach a bit more generic. Essentially, I think we can say that the behavior is "implementation-defined" in the case of concurrent calls to this API. However, it seems to me that the behavior should be relaxed, so to speak: if at least one "delayed leave" request happens concurrently with any number of "fast leave" requests, then the "delayed leave" policy prevails.

Also, having the request stated up front lets the scheduler learn about the runtime situation earlier and hence make better decisions about the optimal behavior of the workers.
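
The relaxed rule proposed here ("delayed leave" prevails over any number of concurrent "fast leave" requests) can be written as a small pure function. The function name `resolve` and the fallback to the arena's own policy when there are no requests are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sketch {

enum class workers_leave { fast, delayed };

// Combines the leave requests of blocks that overlap in time: a single
// "delayed" request wins over any number of "fast" requests. With no
// requests at all, the arena's own policy is used (an assumption).
inline workers_leave resolve(const std::vector<workers_leave>& requests,
                             workers_leave arena_default) {
    if (requests.empty()) {
        return arena_default;
    }
    const bool any_delayed = std::any_of(
        requests.begin(), requests.end(),
        [](workers_leave r) { return r == workers_leave::delayed; });
    return any_delayed ? workers_leave::delayed : workers_leave::fast;
}

} // namespace sketch
```

The rule is order-independent, so it gives the same answer regardless of which concurrent request the scheduler happens to see first.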

Comment on lines +99 to +100
    * Serves as a warm-up hint to the scheduler:
    * Allows worker threads to be available by the time real computation starts.
Contributor

Do we want thread warm-up to always happen at the start of a block, or do we want to give users control over it?
Also, do we promise thread availability "by the time the real computation starts"? I do not think we do, because a) if too little time has passed since the block start, threads might not have arrived yet, and b) if too much time has passed, threads might already have left.
Maybe a more accurate description would be something like "Allows reducing computation start delays by initiating the wake-up of worker threads in advance".

Comment on lines +119 to +122
    enum class workers_leave : /* unspecified type */ {
        fast = /* unspecified */,
        delayed = /* unspecified */
    };
Contributor

We need to find better names for the enum class and its values.
I am specifically concerned about the use of "delayed", since the actual behavior might be platform-specific and not always delayed. workers_leave is not a very good name either.

Contributor

I've been thinking... Combined with your previous comment about automatic, perhaps we could have three modes instead:

  • automatic - platform specific default setting
  • fast (or any other more appropriate name)
  • delayed (or any other more appropriate name)

If we assume these three modes, fast and delayed would enforce their behavior regardless of the platform. That would allow using start/end_parallel_block on hybrid CPUs, whereas automatic would be translated to the fast option on hybrid systems (to align with the current implementation of block time here: https://github.com/uxlfoundation/oneTBB/blob/master/src/tbb/waiters.h#L61).
What do you think?
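
As a rough illustration of the three-mode idea, the mapping from a requested mode to an effective behavior could look like the sketch below. The names `leave_mode`, `effective_leave`, and `resolve_mode` are hypothetical, the hybrid-CPU flag is passed in rather than detected, and mapping `automatic` to `delayed` on non-hybrid systems is an assumption that the comment does not state.

```cpp
#include <cassert>

namespace sketch {

// Requested mode, including the platform-specific default.
enum class leave_mode { automatic, fast, delayed };

// Behavior that is actually applied after resolving "automatic".
enum class effective_leave { fast, delayed };

// fast and delayed are enforced regardless of the platform; automatic
// becomes fast on hybrid CPUs. Using delayed for automatic on non-hybrid
// systems is an assumption made for this sketch.
inline effective_leave resolve_mode(leave_mode mode, bool is_hybrid_cpu) {
    switch (mode) {
        case leave_mode::fast:
            return effective_leave::fast;
        case leave_mode::delayed:
            return effective_leave::delayed;
        case leave_mode::automatic:
            break;
    }
    return is_hybrid_cpu ? effective_leave::fast : effective_leave::delayed;
}

} // namespace sketch
```

Keeping the resolved behavior in a separate type makes it explicit that `automatic` never reaches the scheduler as such.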

Comment on lines +140 to +141
    void start_parallel_block();
    void end_parallel_block(bool set_one_time_fast_leave = false);
Contributor

More on the API names.

  • There might be alternative names to consider instead of "block", e.g., "stage", "phase", or "region". Personally, I like any of these three, especially "phase", considerably more than "block".
  • set_one_time_fast_leave is too verbose and, because of "set", suggests that it might have an impact beyond the call. Something like with_fast_leave could be better.

5 participants