Reduce memory usage of concat (large)utf8 #348
Codecov Report
@@ Coverage Diff @@
## master #348 +/- ##
==========================================
+ Coverage 82.56% 82.61% +0.05%
==========================================
Files 162 162
Lines 44063 44197 +134
==========================================
+ Hits 36379 36512 +133
- Misses 7684 7685 +1
The MIRI failure is unrelated to this PR: #345
@ritchie46 do we have some (micro) benchmark results, like:
Performance-wise it doesn't matter or hurt that much, so the change mostly makes concat more memory efficient.
LGTM thanks @ritchie46
Great improvement / idea.
Did some tests locally with 1025 items in the array, getting a small improvement in timing there too (3-5%).
Nice!
The MIRI test failure is not related to this PR (we subsequently disabled it until we can fix the errors).
Thanks @ritchie46
I did not have the time to review this, but I think that this will over-allocate in all cases where In particular, IMO this over-allocates in:
I did not realize this, @jorgecarleitao. What do you think about creating a builder pattern (with a prealloc option) for the concat kernel case and return the
* reduce memory needed for concat
* reuse code for str allocation buffer
…se (#411)
* Reduce memory usage of concat (large)utf8 (#348)
* reduce memory needed for concat
* reuse code for str allocation buffer
* make sure that only concat preallocates buffers (#382)
* MutableArrayData::with_capacities
* better pattern matching
* add binary capacities
* add list child data
* add struct capacities
* add panic for dictionary type
* change dictionary capacity enum variant
Co-authored-by: Ritchie Vink <[email protected]>
Partial solution for #347.
This PR precomputes the required length of the buffers that store the raw string data. That way we allocate only the minimum memory needed, and concat is faster because no reallocation takes place.
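The idea of precomputing the buffer length can be illustrated with a minimal, std-only sketch. This is not the code from this PR: `StrColumn` and `concat_preallocated` are hypothetical names, and offsets are plain `usize` values rather than Arrow's `i32`/`i64` offsets. It also assumes unsliced inputs, so each input's value buffer contains exactly the bytes it contributes.

```rust
/// Hypothetical stand-in for a string array: contiguous UTF-8 bytes plus offsets.
#[derive(Debug)]
pub struct StrColumn {
    /// `offsets[i]..offsets[i + 1]` delimits the bytes of value `i`; always starts with 0.
    pub offsets: Vec<usize>,
    /// Raw string data of all values, stored back to back.
    pub values: Vec<u8>,
}

impl StrColumn {
    /// Build a column from string slices (grows buffers as needed).
    pub fn from_strs(items: &[&str]) -> Self {
        let mut col = StrColumn { offsets: vec![0], values: Vec::new() };
        for s in items {
            col.values.extend_from_slice(s.as_bytes());
            col.offsets.push(col.values.len());
        }
        col
    }

    /// Number of string values in the column.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow value `i` as a `&str`.
    pub fn get(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("values are built from valid UTF-8")
    }
}

/// Concatenate columns, sizing both output buffers once up front.
pub fn concat_preallocated(columns: &[&StrColumn]) -> StrColumn {
    // First pass: compute the exact number of bytes and values required.
    let total_bytes: usize = columns.iter().map(|c| c.values.len()).sum();
    let total_items: usize = columns.iter().map(|c| c.len()).sum();

    // Allocate once; the copies below never exceed these capacities,
    // so no reallocation is triggered.
    let mut values = Vec::with_capacity(total_bytes);
    let mut offsets = Vec::with_capacity(total_items + 1);
    offsets.push(0);

    // Second pass: copy the bytes and shift each offset by the bytes written so far.
    for col in columns {
        let base = values.len();
        values.extend_from_slice(&col.values);
        offsets.extend(col.offsets[1..].iter().map(|o| base + o));
    }

    StrColumn { offsets, values }
}
```

Calling `concat_preallocated(&[&a, &b])` on two columns returns a new column holding the values of `a` followed by those of `b`. The extra pass over the inputs only sums lengths, which is cheap compared with repeatedly growing and copying the value buffer.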