Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit the number of running fragments in one node #44119

Open
mingmxu opened this issue Apr 15, 2024 · 5 comments
Open

limit the number of running fragments in one node #44119

mingmxu opened this issue Apr 15, 2024 · 5 comments
Assignees

Comments

@mingmxu
Copy link
Contributor

mingmxu commented Apr 15, 2024

Feature request

Is your feature request related to a problem? Please describe.
During troubleshooting query failures in spill_mode, it's not supported to limit the number of running fragments in one node. As a results, it's not able to constrain the total memory one query would consume.

Describe the solution you'd like
By adding a conf fragment_parallel_per_node, the scheduler would stop dispatching workload to a node when it's lot-full. The scheduler can either route it to another node, or wait.

Describe alternatives you've considered
NA

Additional context

@kangkaisen
Copy link
Collaborator

Thanks. we will consider it.

@ZiheLiu
Copy link
Contributor

ZiheLiu commented Apr 16, 2024

Thanks a lot. We will consider how to implement it.

Could you offer more information about this case? such as the following concepts

  • How many queries are running concurrently?
  • How memory a query consumed in a BE when triggering spill mode or not triggering spill mode?
  • How long does a query takes?

@mingmxu
Copy link
Contributor Author

mingmxu commented Apr 16, 2024

Thanks a lot. We will consider how to implement it.

Could you offer more information about this case? such as the following concepts

  • How many queries are running concurrently?

During the e2e tests, queries run one-by-one. In the future we expect concurrency > 1 in production;

  • How memory a query consumed in a BE when triggering spill mode or not triggering spill mode?

The resource_group has 100GB memory capacity per node. The spill conf is spill_mode: auto, spill_mem_limit_threshold=0.01, pipeline_dop=2

  • How long does a query takes?

most queries in seconds, some might cost close to 60_s.

Happy to have a chat to discuss further, we could include @haoan(sry I don't have his github handler) to this thread

@stdpain
Copy link
Contributor

stdpain commented Jul 4, 2024

#47868 will solve this problem

@stdpain
Copy link
Contributor

stdpain commented Jul 4, 2024

For UNION more such queries will reduce a lot of memory. For joins it depends on the type of query. (Join has no significant effect on deep left trees), but it does have some effect on complex queries (multiple shuffle joins).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants