This PR is quite a big deal. Until now, polars/arrow memory was completely immutable. If we did an append, we simply added an array chunk to the list of chunks (sort of a linked list). This yielded very fast appends, but was detrimental to query performance, because the chunks add a lot of indirection.
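To make the trade-off concrete, here is a toy Python model, not Polars' actual implementation. `ChunkedColumn` and `GrowableColumn` are hypothetical names, and plain Python lists stand in for Arrow buffers. The first class appends by adding a chunk, so lookups have to walk the chunk list and `rechunk` copies everything. The second writes into one buffer whose capacity doubles when full, so reallocations are rare and the total copying cost stays linear in the number of appended values.

```python
from typing import List


class ChunkedColumn:
    """Hypothetical model of chunked appends: every append adds a new chunk."""

    def __init__(self) -> None:
        self.chunks: List[List[int]] = []

    def append(self, values: List[int]) -> None:
        # Cheap: existing data is never copied, a new chunk is just added.
        self.chunks.append(list(values))

    def get(self, i: int) -> int:
        # Random access must first find the right chunk: this is the extra
        # indirection that grows with the number of chunks.
        for chunk in self.chunks:
            if i < len(chunk):
                return chunk[i]
            i -= len(chunk)
        raise IndexError(i)

    def rechunk(self) -> None:
        # Copies every value into a single new chunk: O(n) on each call.
        merged: List[int] = []
        for chunk in self.chunks:
            merged.extend(chunk)
        self.chunks = [merged]


class GrowableColumn:
    """Hypothetical model of in-place extends: one buffer, doubling capacity."""

    def __init__(self) -> None:
        self.capacity = 1
        self.buffer: List[int] = [0] * self.capacity
        self.length = 0
        self.reallocations = 0

    def extend(self, values: List[int]) -> None:
        needed = self.length + len(values)
        if needed > self.capacity:
            # Grow geometrically so reallocation is rare: the total copy work
            # over n appended values is O(n), i.e. amortized O(1) per value.
            while self.capacity < needed:
                self.capacity *= 2
            new_buffer = [0] * self.capacity
            new_buffer[: self.length] = self.buffer[: self.length]
            self.buffer = new_buffer
            self.reallocations += 1
        # Otherwise (and after growing) write into the existing allocation.
        self.buffer[self.length : needed] = values
        self.length = needed

    def get(self, i: int) -> int:
        # A single contiguous buffer: direct indexing, no chunk walk.
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.buffer[i]


# Simulate rows arriving one at a time, as in an online-learning setting.
chunked = ChunkedColumn()
growable = GrowableColumn()
for row in range(1000):
    chunked.append([row])
    growable.extend([row])

print(len(chunked.chunks))     # 1000 chunks to walk on lookups
print(growable.reallocations)  # 10: capacity grew 1 -> 2 -> ... -> 1024
```

Doubling is just one common growth factor; any constant factor greater than 1 gives the same amortized bound.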
This especially hurt use cases where rows come in at a slow pace and you want to run queries between the updates. For instance, in online learning, Polars was not the right tool for the job, as you would need to call `rechunk` to get optimal performance, which is a complete reallocation of your table! Very expensive.

With this change, we can now extend the `DataFrame`/`Series` and write to the same memory allocations. This might still reallocate, but given exponential growth strategies, this operation is amortized `O(1)`.

There is of course no magic. We can only write to the same memory iff