-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add getrows #284
add getrows #284
Conversation
This PR should probably provide appropriate implementations for Also probably it would be good to bump version number so that it can be released after merging. |
Codecov Report
@@ Coverage Diff @@
## main #284 +/- ##
==========================================
+ Coverage 94.88% 94.94% +0.06%
==========================================
Files 7 7
Lines 665 673 +8
==========================================
+ Hits 631 639 +8
Misses 34 34
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
I'm totally unfamiliar with this package so I was hoping to just get the stub in to get things started. I can have a look around and see if I can manage to write those implementations as well. |
These should not be that hard. Of course we can do it later. @quinnj, @nalimilan - do you think we can have some default fallback implementation of this method? |
src/Tables.jl
Outdated
Return one or more rows from table `x` according to the position(s) specified by `inds`: | ||
|
||
- If `inds` is a single integer return a row object. | ||
- If `inds` is a collection of integers, return a table object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If `inds` is a collection of integers, return a table object. | |
- If `inds` is a collection of integers, return an indexable object of rows |
Just a bit more specific; one of the main motivations for this was to ensure users can get an in-memory/indexable collection of rows in a consistent way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we want to require Tables.getrows
to return a indexable collection of rows? That would mean that e.g. for a NamedTuple
of vectors getrows
couldn't return another NamedTuple
of vectors, even if that's the most efficient representation. IOW this is against the implementation that this PR adds for ColumnTable
. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start; thanks @CarloLucibello. Sorry I've been so slow on getting around to this or responding on the original issue. Just a little cleanup here and if you'll add some tests, I think it's ready to go. Once we merge, I can take a look at doing the implementation for the other table types in Tables.jl.
Done! Can you unlock CI? |
good to go? |
src/Tables.jl
Outdated
Return one or more rows from table `x` according to the position(s) specified by `inds`: | ||
|
||
- If `inds` is a single integer return a row object. | ||
- If `inds` is a collection of integers, return a table object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we want to require Tables.getrows
to return a indexable collection of rows? That would mean that e.g. for a NamedTuple
of vectors getrows
couldn't return another NamedTuple
of vectors, even if that's the most efficient representation. IOW this is against the implementation that this PR adds for ColumnTable
. :-)
Indeed, why not have EDIT: actually what I described is just calling |
@quinnj 🙏🏾 |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM; @bkamins, can I ask you to take one more look here before merging? Sorry to let this sit for so long; I've been tied up with a lot f stuff.
I have left one comment about allowed |
Co-authored-by: Bogumił Kamiński <[email protected]>
I agree with @nalimilan that it is better to throw an error - at least for the reference implementations (so users, by looking at the code can see what is expected). As I have commented above - I think the simplest thing to check is if the returned object is |
Ok, but here's an issue I've found digging into this further (and introducing the error checks mentioned); the current implementation of The point of this is we always get either a single row or collection of rows, right? Even if the input is column-oriented? Or should we update the docs and say if |
And indeed, @nalimilan pointed this exact issue out here. One idea is that we could provide a generic I might take a stab at this approach and see what it looks like. |
This is what I think we should do. For example for DataFrames.jl I imagine that we will have the following implementation:
as it seems most natural (and in general - it will be more efficient for a given table type to decide what to do; I assume that people will expect I would set the issue of "indexable collection of rows" and "indexable collection of columns" (for completeness as separate case) since it is orthogonal to the |
So given that we want this to be more of a general "subset" function, I'd like to change the name from I'm going to propose we call this |
Alternatively we could have 2 functions:
|
Ok, cleaned up implementation here: #292 |
|
Mmm. I'm not super happy with removing "rows" from the the name. If I'm new to tables in julia, "subset" could mean anything. I similarly never liked name like "select" . Select what? I want to know which axis I'm slicing on. How about |
But that's kind of the point of our discussion/decision above: this function isn't necessarily returning "rows" if the original table input is column-oriented; rather, it returns a "subset" of the original table, preserving whether the original table was column or row-oriented. Thus, |
Yes, I get that it is not returning "rows" in the sense of not returning a row iterator. But I'm not sure general users equate "rows" with "row iterator", maybe just Tables.jl developers do that 😉 . I also appreciate these name decisions are very difficult to get right and to get consensus on. Just putting in my two cents. And I agree that aligning with DataFrames - which most users will know well - makes sense. |
One could always go maximally explicit with something like |
as per comment #278 (comment)
@quinnj @bkamins @nalimilan @ablaom @ToucheSir