Better support for in-place operations on tables #116

Open
rofinn opened this issue Aug 1, 2019 · 12 comments

Comments

@rofinn
Member

rofinn commented Aug 1, 2019

Specifically, it'd be nice if I could use some traits to determine whether I can mutate the underlying data during row or column iteration (e.g., mutating values in DataFrameRow).

@quinnj
Member

quinnj commented Aug 7, 2019

Ok, I've been noodling on this for... 6 days (haha, actually longer, because people have brought it up on Slack and stuff). @rofinn can you talk a little more about the use case you have in mind for this? I have some ideas, but most of mine end in "oh, this actually wouldn't be useful for the most part", so I want to hear a solid case where someone wants to use it and how it would be helpful. Anyway, I can try to put some of my thoughts together, but in the meantime, I thought I'd ask for some more info from your side.

@rofinn
Member Author

rofinn commented Aug 7, 2019

My use case is in Impute.jl where I'm trying to mutate data in-place if possible by applying some operation over each column.

function impute!(table, imp::Imputor)
    istable(table) || throw(MethodError(impute!, (table, imp)))

    # Extract a columns iterator that we should be able to use to mutate the data.
    # NOTE: Mutation is not guaranteed for all table types, but it avoids copying the data
    columntable = Tables.columns(table)

    for cname in propertynames(columntable)
        impute!(getproperty(columntable, cname), imp)
    end

    return table
end

https://github.com/invenia/Impute.jl/blob/master/src/imputors.jl#L155

In this code, the data in the passed-in table is only sometimes mutated, depending on the table type. It'd be nice if I could check whether calling Tables.columns will allow me to mutate the underlying data, and throw a warning if it doesn't.
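A minimal sketch of what such a check could look like. The trait `columns_alias_source` is hypothetical (it is not part of Tables.jl), and `replace_nan!` is a made-up operation standing in for an imputor. The sketch assumes that Tables.columns returns a NamedTuple of Vectors unchanged, while it builds fresh column vectors for a vector of NamedTuples:

```julia
using Tables

# Hypothetical trait (NOT part of Tables.jl): true when Tables.columns(x)
# is known to hand back the table's own storage rather than a copy.
columns_alias_source(x) = false
# Assumption: a NamedTuple of Vectors is returned as-is by Tables.columns.
columns_alias_source(::NamedTuple{<:Any,<:Tuple{Vararg{Vector}}}) = true

# Made-up stand-in for an imputor: overwrite NaN entries with `value`.
function replace_nan!(table, value)
    Tables.istable(table) || throw(ArgumentError("not a Tables.jl table"))
    columns_alias_source(table) ||
        @warn "Tables.columns may copy; the input may be left unchanged"
    cols = Tables.columns(table)
    for name in Tables.columnnames(cols)
        col = Tables.getcolumn(cols, name)
        for i in eachindex(col)
            isnan(col[i]) && (col[i] = value)
        end
    end
    return table
end

coltable = (a = [1.0, NaN],)           # column table: mutated in place
rowtable = [(a = 1.0,), (a = NaN,)]    # row table: only a copy is mutated
replace_nan!(coltable, 0.0)
replace_nan!(rowtable, 0.0)            # emits the warning
```

With a real trait in the interface, the second call could warn (as here) or throw, instead of silently leaving the input untouched.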

@Drvi

Drvi commented Sep 8, 2019

Hi,

for my use case (trying to make Selections.jl easily available to the ecosystem) I'd like to have select() and select!() functions. Both could de-select columns, and for mutable data sources I'd like to provide the in-place variant for efficiency. This would require some way of signaling mutability (Tables.ismutable?) and a way to delete columns (Tables.deleteat!) as well as to reorder them (like permutecols!). What @rofinn describes also seems useful to me.

@quinnj
Member

quinnj commented Feb 8, 2020

With #131, we're committing to enhancing the Tables.jl interface a bit, but also trying to keep it very minimal, to encourage adoption. As I've thought of this and a few other related issues, I think it would make sense to have a MutableTables.jl package (or maybe called InMemoryTables.jl). It turns out there are a lot of things like this that people want to do, but that really apply to a stricter subset of "table types" that allow mutation and can be manipulated (or indexed, or sorted, etc.). So in my mind, it's possible we could define something in Tables.jl, but it feels a bit off because Tables.jl is trying to be so generic (though admittedly not as generic as TableTraits.jl). That's why I think it'd be useful to have a separate package that could use Tables.jl, but also define additional interface requirements for various table manipulations. Thoughts @Drvi, @bkamins, @nalimilan, @davidanthoff, @rofinn, @iamed2, @andyferris?

@bkamins
Member

bkamins commented Feb 8, 2020

I think that there are three levels of mutability here, and we should be explicit about which level we target:

  1. allowing some values in the table to be changed without resizing it (setindex!, sort!, ...)
  2. allowing the number of rows to be changed (while keeping the schema fixed)
  3. allowing the number of columns, the names of columns, or the eltype of columns to be changed

@tpapp
Contributor

tpapp commented Feb 8, 2020

I think a separate package, which enhances the interface, would be the best approach for now, since it would allow experimentation with the mutable interface without affecting the API defined in this package (cf #133).

@andyferris
Member

@bkamins I would be tempted to try to make these three separate/orthogonal interfaces for performing different mutations, rather than “levels” or layers with some on top of the others.

E.g. I’m imagining you could have 3 without 2 (data frame of static arrays) or 2 without 1 (functional programmers like to think of “append only” databases).
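As a rough illustration of the orthogonal reading, here is a sketch with one independent trait per kind of mutation. All three trait names and both container aliases are hypothetical; nothing here exists in Tables.jl, and the Dict-of-tuples container is only a stand-in for "a data frame of static arrays":

```julia
# Hypothetical traits; none of these names exists in Tables.jl.
can_setvalues(::Type) = false     # 1. overwrite values without resizing
can_resizerows(::Type) = false    # 2. change the number of rows, schema fixed
can_changeschema(::Type) = false  # 3. add, drop, rename, or retype columns

# Convenience methods so the traits can be queried on instances.
can_setvalues(x) = can_setvalues(typeof(x))
can_resizerows(x) = can_resizerows(typeof(x))
can_changeschema(x) = can_changeschema(typeof(x))

# A NamedTuple of Vectors: values and row count can change, but the
# column names and eltypes are fixed by the NamedTuple type (1 and 2, not 3).
const VectorColumns = NamedTuple{<:Any,<:Tuple{Vararg{Vector}}}
can_setvalues(::Type{<:VectorColumns}) = true
can_resizerows(::Type{<:VectorColumns}) = true

# A Dict of immutable tuples: whole columns can be added or replaced,
# but individual values and the row count cannot change (3 only).
const TupleColumns = Dict{Symbol,<:Tuple}
can_changeschema(::Type{<:TupleColumns}) = true

vt = (a = [1, 2], b = [3.0, 4.0])
tt = Dict(:a => (1, 2), :b => (3, 4))
```

Because each trait is queried on its own, a generic function can ask for exactly the capability it needs without assuming the others.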

@bkamins
Member

bkamins commented Feb 8, 2020

Sure - they are largely orthogonal. I have this order in the back of my head, as it is natural in DataFrames.jl, but for other data structures clearly it is the way you say 😄.

A particular case is that 3 assumes allowing "replacing" of a column, which is not the same as 1, which mostly assumes updating a column in-place. (However, for some data structures 1 would imply replacement: when a column is immutable, you would have to replace it in order to setindex!. Even then, 1 would guarantee that the eltype does not change after the replacement.)

@andyferris
Member

andyferris commented Feb 8, 2020

Yes it’s very interesting how mutating a column behaves somewhat the same as mutating the rows. Of course, you can tell the difference when you have access to the column references.

The way I always imagined this playing out is (a) have two APIs/traits for mutation and insertion into data structures (and “upsert” for data structures that support both, which is the way it is done in Dictionaries.jl), and (b) have a table modelled as a nested data structure (a relation is a collection of rows). All the different cases you mention simply fall out naturally.

@rofinn
Member Author

rofinn commented Feb 26, 2020

On a related note, should Tables.jl have a fallback similar to the one in the arrays interface, where folks are guaranteed to get back a mutable table? That would simplify the code posted above at the cost of potentially inconsistent return types.
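One possible shape for such a fallback, as a sketch only: `mutable_columns` is a hypothetical helper, not a Tables.jl function. It uses Tables.columntable to get a NamedTuple of columns, keeps columns that are already Vectors, and copies everything else into a Vector:

```julia
using Tables

# Hypothetical helper (NOT part of Tables.jl): return a NamedTuple whose
# columns are all Vectors. Existing Vector columns are reused (aliased);
# any other column type is copied with collect.
function mutable_columns(table)
    cols = Tables.columntable(table)
    return map(col -> col isa Vector ? col : collect(col), cols)
end

t = (a = 1:3, b = [1.0, 2.0, 3.0])
m = mutable_columns(t)
m.a[1] = 10      # writes to a copy; the range t.a is untouched
m.b[1] = -1.0    # writes through to t.b, since the Vector is reused
```

The usage shows the trade-off mentioned above: the result is always writable, but whether a write reaches the original table depends on the input type.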

@juliohm

juliohm commented Jul 30, 2023

I wonder if there has been any progress on experimenting with a trait system for mutable tables? At this point in time Tables.jl is the de facto standard for tables in Julia, and we are reaching applications where mutability and a basic setindex! would be great.

@juliohm

juliohm commented Sep 19, 2024

Any progress here? Or any draft somewhere?
