Get values of grouped columns #1908
Thank you for the commit. If we should add |
@nalimilan - maybe we could even consider storing the group information @jlumpe proposes when creating |
Thanks. I'm a bit hesitant about this API, since we currently never return arrays of (named) tuples: we rather use data frames for that. The |
Actually - if we followed this idea, we should add an argument to |
It would be weird to have an argument to I'm still not sure what's the best solution. The most appealing approach seems to be to define |
What we should do here depends if you want to make |
In the end I don't think we should make |
I think then it is OK to add |
052a145 to 0dbebed
Implemented feedback:
Also, I think this would resolve #1693.
Thank you for the fixes. I have left some things that should be discussed (especially in light of @nalimilan's feedback), so it is probably best to finish the design before you implement the changes.
Implemented your feedback, this is now essentially just implementing the dictionary interface for
@jlumpe - thank you. I have left several minor comments and one major one regarding |
OK, full summary of the PR:
I think this is all pretty complete. There are a couple of reviews that it won't let me mark as resolved for some reason. I'm going to go over it once more and check for any corner cases left out of the tests, but I think it should be ready to merge.
Looks good. I left a few minor comments and we can merge it.
Ok, think that should resolve all remaining issues.
The only thing left is |
Thank you! Looks good. It has been a long journey, but the PR was really a major one.
Let us wait for @nalimilan to give final approval and then it can be merged.
There are a few uncovered lines according to Coveralls. Can you check whether they need tests? See https://coveralls.io/builds/27423710/source?filename=src%2Fgroupeddataframe%2Fgrouping.jl
The display tests are usually put in test/show.jl.
@jlumpe - can you please have a look at the comments by @nalimilan? I want to have this PR in 0.20, and this is the last one scheduled for the release. Thank you.
@bkamins Can you clarify what you mean about |
In order to include tests of |
@nalimilan - if you will not have additional comments I will merge this PR today.
Ah - this is a very minor issue so I am OK to merge this PR without it (just please add it in the next PR you are planning to do related to this functionality).
Thanks @jlumpe! It's been a long way but I think it was worth it.
@@ -16,7 +16,10 @@ by
combine
groupby
groupindices
groupvalues
groupvalues
I do not know how to force-push this. I will remove this line in the 0.20 PR, as this function does not exist now.
I noticed there doesn't seem to be a built-in mechanism to get the actual group values from a `GroupedDataFrame`, so I added a `groupvalues()` function:

Unfortunately, `GroupedDataFrame` doesn't seem to remember whether it was given a single column or a vector containing a single column, so this will always return a vector of tuples.

In Pandas, iterating over the grouped object gives `(value, dataframe)` pairs, which I use very frequently. Of course you can always get the value from the first row of the `SubDataFrame` when iterating, but I think this feature leads to much cleaner code. I defined `Base.pairs(::GroupedDataFrame)` for this purpose:
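For comparison, the first-row workaround mentioned above might look roughly like the sketch below. This is not the definition from this PR; the data frame, the column names `a` and `b`, and the variable `group_sizes` are made up for illustration, and only `groupby`, iteration over the `GroupedDataFrame`, indexing, and `nrow` are used.

```julia
using DataFrames

# Hypothetical example data; the column names are assumptions for illustration.
df = DataFrame(a = [1, 1, 2], b = [10, 20, 30])
gd = groupby(df, :a)

# Workaround: every row in a SubDataFrame shares the same grouping value,
# so read it from the first row while iterating over the groups.
group_sizes = [sdf[1, :a] => nrow(sdf) for sdf in gd]

for (value, n) in group_sizes
    println("a = ", value, ": ", n, " row(s)")
end
```

Having to reach into each `SubDataFrame` for its own key is the boilerplate that iterating over `(value, dataframe)` pairs is meant to avoid.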