Describe the bug
The query below requires about 65M of memory to run. If we set the memory limit to 50M, it fails instead of running successfully.
Run in datafusion-cli:
cargo run -- --mem-pool-type fair -m 50M -c "
select t1.v1, sum(t2.v1)
from
unnest(generate_series(1,1000)) as t1(v1)
, unnest(generate_series(1,1000)) as t2(v1)
group by t1.v1, t2.v1"
Error: External error: Resources exhausted: Failed to allocate additional 47616 bytes for GroupedHashAggregateStream[0] with 3995896 bytes already allocated for this reservation - 4031073 bytes remain available for the total pool
The issue is that memory usage is over-estimated during sort-merge.
For example, if a RecordBatch has 3 arrays and those arrays share the same buffers, record_batch.get_array_memory_size() will estimate 3X the actual memory consumption.
(The original RecordBatches passing through datafusion operators don't share a Buffer between different columns. In spilling queries, however, RecordBatches are first written to disk and then read back, and the batches read back reuse Buffers among different column arrays.)
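The over-count described above can be illustrated with a small std-only sketch. It does not use arrow-rs: `Column`, `shared_batch`, `naive_size` and `deduplicated_size` are hypothetical names invented for this illustration, and `Arc<Vec<u8>>` only stands in for a reference-counted buffer. The point is only that summing per-column sizes counts a shared allocation once per column, while identifying allocations by pointer counts it once.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a column array: it only records which
/// reference-counted buffers back it. This is NOT an arrow-rs type.
pub struct Column {
    pub buffers: Vec<Arc<Vec<u8>>>,
}

/// Builds `num_columns` columns that all point at one shared buffer,
/// mimicking the layout described for batches read back from a spill file.
pub fn shared_batch(num_columns: usize, buffer_len: usize) -> Vec<Column> {
    let shared = Arc::new(vec![0u8; buffer_len]);
    (0..num_columns)
        .map(|_| Column {
            buffers: vec![Arc::clone(&shared)],
        })
        .collect()
}

/// Naive estimate: add up every buffer of every column, so a buffer
/// shared by N columns is counted N times.
pub fn naive_size(columns: &[Column]) -> usize {
    columns
        .iter()
        .flat_map(|c| c.buffers.iter())
        .map(|b| b.len())
        .sum()
}

/// Deduplicated estimate: count each distinct allocation once,
/// identified by the address of the shared allocation.
pub fn deduplicated_size(columns: &[Column]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .flat_map(|c| c.buffers.iter())
        .filter(|b| seen.insert(Arc::as_ptr(b)))
        .map(|b| b.len())
        .sum()
}
```

For a batch built with `shared_batch(3, 1000)`, `naive_size` returns 3000 while `deduplicated_size` returns 1000, which mirrors the 3X over-estimate in the example. How arrow-rs should actually account for shared buffers is the subject of the upstream issue and is not settled by this sketch.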
The root cause is already reported in arrow-rs: apache/arrow-rs#6363
Once it is fixed in arrow-rs, we should check whether this aggregation query can run successfully, and also add tests.
To Reproduce
No response
Expected behavior
No response
Additional context
No response
Relevant code: datafusion/datafusion/physical-plan/src/sorts/builder.rs, line 72 in f2da32b