Extend in place! 💯 #2544

ritchie46 · 2022-02-04T14:31:46Z

This PR is quite a big deal. Until now polars/arrow memory was completely immutable. If we did an append, we simply added an array chunk to the list of chunks (sort of a linked list). This yielded very fast appends, but is detrimental for query performance, because the chunks add a lot of indirection.

This hurt especially in use cases where rows come in at a very slow pace and you want to run queries between the updates. In online learning cases, for instance, Polars was not the right tool for the job, as you would need to call a rechunk to get optimal performance, which is a complete reallocation of your table! Very expensive.
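To make the trade-off concrete, here is a minimal sketch of a chunked column. It is a toy model and not the Polars or arrow2 implementation: `ChunkedColumn` and its methods are hypothetical names, and plain `Vec<i64>` stands in for an Arrow array.

```rust
/// Toy model of a chunked column. `ChunkedColumn` is a hypothetical name,
/// not a Polars type; each inner `Vec<i64>` stands in for one Arrow array.
struct ChunkedColumn {
    chunks: Vec<Vec<i64>>,
}

impl ChunkedColumn {
    fn new() -> Self {
        Self { chunks: Vec::new() }
    }

    /// Append by adding a new chunk: cheap, no existing data is copied.
    fn append(&mut self, values: Vec<i64>) {
        self.chunks.push(values);
    }

    /// Random access first has to find the chunk that holds `index`.
    /// This per-chunk walk is the indirection that slows queries down.
    fn get(&self, mut index: usize) -> Option<i64> {
        for chunk in &self.chunks {
            if index < chunk.len() {
                return Some(chunk[index]);
            }
            index -= chunk.len();
        }
        None
    }

    /// Merge all chunks into one contiguous buffer.
    /// Every value is copied, so this is a full reallocation.
    fn rechunk(&mut self) {
        let merged: Vec<i64> = self.chunks.iter().flatten().copied().collect();
        self.chunks = vec![merged];
    }
}
```

In this model each `append` adds one more chunk for `get` to walk past, and `rechunk` restores a single contiguous buffer at the cost of copying the whole column.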

With this change, we can now extend the DataFrame/Series and write to the same memory allocations. This might still reallocate, but given exponential growth strategies, this operation is amortized O(1).
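The amortized cost can be observed with a small experiment. The sketch below uses Rust's standard `Vec` as a stand-in for an Arrow buffer (an assumption; it is not the buffer type touched in this PR) and counts how often the capacity changes while pushing values one at a time. `count_reallocations` is a hypothetical helper written for this illustration.

```rust
/// Push `n` values one at a time and count how often the capacity changed.
/// `Vec` is used here as a stand-in for a growable Arrow buffer.
fn count_reallocations(n: usize) -> usize {
    let mut buffer: Vec<i64> = Vec::new();
    let mut reallocations = 0;
    let mut capacity = buffer.capacity();
    for i in 0..n {
        buffer.push(i as i64);
        // A capacity change means the buffer had to grow.
        if buffer.capacity() != capacity {
            reallocations += 1;
            capacity = buffer.capacity();
        }
    }
    reallocations
}
```

`Vec` does not document its exact growth factor, but it does guarantee amortized O(1) `push`, so the number of reallocations grows far more slowly than `n`. That is why extending in place stays cheap on average even though an individual extend may still reallocate.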

There is of course no magic. We can only write to the same memory iff

- we are the only owner (the series is not shared with another DataFrame)
- the memory is not allocated by pyarrow
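The sole-owner condition can be sketched with `std::sync::Arc`, whose `get_mut` only hands out a mutable reference when no other owner exists. This is an illustration under assumptions, not the code in this PR: `Column` is a hypothetical type, and falling back to appending a new chunk when the buffer is shared is a choice made for this example only.

```rust
use std::sync::Arc;

/// Hypothetical column made of reference-counted chunks.
struct Column {
    chunks: Vec<Arc<Vec<i64>>>,
}

impl Column {
    /// Returns `true` if the values were written into the existing buffer,
    /// `false` if the buffer was shared and a new chunk was added instead.
    fn extend(&mut self, values: &[i64]) -> bool {
        if let Some(last) = self.chunks.last_mut() {
            // `Arc::get_mut` succeeds only when we are the only owner.
            if let Some(buffer) = Arc::get_mut(last) {
                buffer.extend_from_slice(values);
                return true;
            }
        }
        // Assumed fallback for this sketch: leave shared data untouched.
        self.chunks.push(Arc::new(values.to_vec()));
        false
    }
}
```

The check happens at runtime on every call, so the same column can be extended in place at one moment and fall back a moment later if another owner has taken a reference in between.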

ritchie46 · 2022-02-04T14:32:38Z

@jorgecarleitao @houqp FYI

houqp · 2022-02-05T06:00:13Z

very cool!

jorgecarleitao · 2022-02-05T06:14:10Z

Brutal. Always innovating!

fyi @wesm @pitrou @kou @andygrove. This uses copy on write: it checks at runtime whether we are the only owners of the array and, if yes, we take exclusive mutable ownership of the buffer / array.

This was proposed by @sundy-li in jorgecarleitao/arrow2#741, inspired by what ClickHouse is doing, and implemented by @ritchie46 in jorgecarleitao/arrow2#794.

ritchie46 added 3 commits February 4, 2022 12:30

extend primitive

54bb74f

more extend

554d571

extend series

eea2710

github-actions bot added the rust Related to Rust Polars label Feb 4, 2022

ritchie46 force-pushed the extend branch from 79552ab to c72276a Compare February 4, 2022 14:34

github-actions bot added the python Related to Python Polars label Feb 4, 2022

ritchie46 force-pushed the extend branch from c72276a to dfa9bec Compare February 4, 2022 14:39

dispatch

64c8bc6

ritchie46 force-pushed the extend branch from dfa9bec to 64c8bc6 Compare February 4, 2022 15:02

ritchie46 merged this pull request into master Feb 4, 2022

ritchie46 deleted the extend branch February 4, 2022 15:28
