limit the number of running fragments in one node #44119

mingmxu · 2024-04-15T17:25:23Z

Feature request

Is your feature request related to a problem? Please describe.
While troubleshooting query failures in spill_mode, we found there is no way to limit the number of running fragments on one node. As a result, the total memory one query can consume cannot be constrained.

Describe the solution you'd like
By adding a conf fragment_parallel_per_node, the scheduler would stop dispatching work to a node when that node is full. The scheduler could then either route the fragment to another node or wait.
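
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not StarRocks code: the class `FragmentScheduler` and its methods are hypothetical names for illustration, and only `fragment_parallel_per_node` comes from the proposal above. It assumes the scheduler tracks a running-fragment count per node.

```python
from typing import Dict, List, Optional


class FragmentScheduler:
    """Toy scheduler that caps running fragments per node (hypothetical)."""

    def __init__(self, nodes: List[str], fragment_parallel_per_node: int) -> None:
        if fragment_parallel_per_node < 1:
            raise ValueError("fragment_parallel_per_node must be >= 1")
        self.limit = fragment_parallel_per_node
        self.running: Dict[str, int] = {node: 0 for node in nodes}

    def try_dispatch(self, preferred: str) -> Optional[str]:
        """Return the node that takes the fragment, or None if the caller must wait."""
        if self.running[preferred] < self.limit:
            chosen = preferred
        else:
            # Preferred node is full: route to the least-loaded node with room.
            candidates = [n for n, c in self.running.items() if c < self.limit]
            if not candidates:
                return None  # every node is full, so the fragment has to wait
            chosen = min(candidates, key=lambda n: self.running[n])
        self.running[chosen] += 1
        return chosen

    def finish(self, node: str) -> None:
        """Free one slot when a fragment on `node` completes."""
        if self.running[node] == 0:
            raise RuntimeError(f"no running fragment on {node}")
        self.running[node] -= 1
```

With a limit of 1 and two nodes, the second fragment aimed at a full node is routed to the other node, and a third one gets `None` until `finish` frees a slot.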

Describe alternatives you've considered
NA

Additional context


kangkaisen · 2024-04-16T05:54:13Z

Thanks. We will consider it.

ZiheLiu · 2024-04-16T06:05:00Z

Thanks a lot. We will consider how to implement it.

Could you offer more information about this case? For example:

How many queries are running concurrently?
How much memory does a query consume in a BE, with and without spill mode triggered?
How long does a query take?

mingmxu · 2024-04-16T20:47:34Z

How many queries are running concurrently?

During the e2e tests, queries run one at a time. In the future we expect concurrency > 1 in production.

How much memory does a query consume in a BE, with and without spill mode triggered?

The resource_group has 100GB of memory capacity per node. The spill conf is spill_mode=auto, spill_mem_limit_threshold=0.01, pipeline_dop=2.

How long does a query take?

Most queries finish in seconds; some take close to 60 s.

Happy to have a chat to discuss this further. We could include @haoan (sorry, I don't have his GitHub handle) in this thread.

stdpain · 2024-07-04T13:42:53Z

#47868 will solve this problem

stdpain · 2024-07-04T13:46:45Z

For queries with many UNIONs, this will reduce memory usage a lot. For joins it depends on the type of query: it has no significant effect on deep left trees, but it does have some effect on complex queries (multiple shuffle joins).

mingmxu added the type/feature-request label Apr 15, 2024

kangkaisen assigned ZiheLiu Apr 16, 2024
