describe_by(groups) (and describe(axis, groups)) gives wrong result for std and percentiles #1124

Open
gdementen opened this issue Nov 29, 2024 · 1 comment

@gdementen
Contributor

>>> from larray import ndtest
>>> arr = ndtest((3, 4))
>>> arr.describe('a', 'b0,b1 >> b01;b1,b2 >> b12')
>>> arr.describe_by('b0,b1 >> b01;b1,b2 >> b12')
b\statistic  count  mean  std  min   25%  50%   75%   max
        b01    6.0   4.5  0.0  0.0  2.25  4.5  6.75   9.0
        b12    6.0   5.5  0.0  1.0  3.25  5.5  7.75  10.0

The correct result should be:

b\statistic  count  mean                 std  min   25%  50%   75%   max
        b01    6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25   9.0
        b12    6.0   5.5  3.6193922141707713  1.0  2.75  5.5  8.25  10.0

For example, selecting the b0,b1 group and describing it directly gives the expected b01 row:

>>> arr['b0,b1'].describe()
statistic  count  mean                 std  min   25%  50%   75%  max
             6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25  9.0

I think this is related to #1118. In fact, I think these are two symptoms of the same bug: group aggregates and axis aggregates are not computed at the same time.
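A minimal NumPy sketch of that suspected cause, not taken from larray's code: it assumes `ndtest((3, 4))` holds the values 0..11 in row-major order and that the standard deviation uses `ddof=1`, which is what reproduces the 3.619... value above. Reducing the b01 group in one step gives the expected statistics, while reducing within the group first and then along `a` (the order is an assumption) reproduces the wrong std and quartiles reported in this issue.

```python
import numpy as np

# Assumed to match ndtest((3, 4)): axis a has 3 labels, axis b has 4 (b0..b3).
data = np.arange(12).reshape(3, 4)
group = data[:, [0, 1]]  # columns b0 and b1, i.e. the b01 group

# Joint aggregation: all 6 values are reduced in a single step.
joint_std = np.std(group, ddof=1)
joint_q25 = np.percentile(group, 25)
joint_q75 = np.percentile(group, 75)

# Sequential aggregation: reduce within the group, then along axis a.
# The statistic is applied twice, which is only valid for count/min/max-like
# reductions, not for std or percentiles.
seq_std = np.std(np.std(group, axis=1, ddof=1), ddof=1)
seq_q25 = np.percentile(np.percentile(group, 25, axis=1), 25)
seq_q75 = np.percentile(np.percentile(group, 75, axis=1), 75)

print("joint:     ", joint_std, joint_q25, joint_q75)
print("sequential:", seq_std, seq_q25, seq_q75)
```

The joint values match the expected b01 row (std about 3.619, quartiles 1.75 and 7.25), and the sequential values match the reported one (std about 0, quartiles 2.25 and 6.75).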

@gdementen
Contributor Author

The good news is the issue is already solved in my big refactor branch... The bad news is that branch still needs a lot of work to be ready.
