describe_by(groups) (and describe(axis, groups)) gives wrong result for std and percentiles #1124

gdementen · 2024-11-29T16:29:02Z

>>> from larray import ndtest
>>> arr = ndtest((3, 4))
>>> arr.describe('a', 'b0,b1 >> b01;b1,b2 >> b12')
>>> arr.describe_by('b0,b1 >> b01;b1,b2 >> b12')
b\statistic  count  mean  std  min   25%  50%   75%   max
        b01    6.0   4.5  0.0  0.0  2.25  4.5  6.75   9.0
        b12    6.0   5.5  0.0  1.0  3.25  5.5  7.75  10.0

The correct result should be:

b\statistic  count  mean                 std  min   25%  50%   75%   max
        b01    6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25   9.0
        b12    6.0   5.5  3.6193922141707713  1.0  2.75  5.5  8.25  10.0

For example, describing the b0,b1 group on its own gives the expected values:

>>> arr['b0,b1'].describe()
statistic  count  mean                 std  min   25%  50%   75%  max
             6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25  9.0

I think this is related to #1118. In fact, I think these are two symptoms of the same bug: group aggregates and axis aggregates are not done at the same time.
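
To illustrate, here is a minimal NumPy sketch (not larray's actual code path, which I have not reproduced here) of what happens when the two reductions are done one after the other instead of jointly. It assumes the data of ndtest((3, 4)) is simply np.arange(12).reshape(3, 4) and that std uses ddof=1, which is what the expected value 3.619... corresponds to. The variable names are only for illustration.

```python
import numpy as np

# Same values as ndtest((3, 4)): 3 rows (axis a) by 4 columns (axis b)
data = np.arange(12).reshape(3, 4)
group = data[:, [0, 1]]  # columns b0 and b1, i.e. the b01 group

# Joint aggregation: reduce over axis a and the group in a single step
joint_std = group.std(ddof=1)
joint_q25 = np.percentile(group, 25)

# Sequential aggregation: reduce over axis a first, then over the group
seq_std = group.std(axis=0, ddof=1).std(ddof=1)
seq_q25 = np.percentile(np.percentile(group, 25, axis=0), 25)

print(joint_std, joint_q25)  # about 3.619 and 1.75 (the expected values)
print(seq_std, seq_q25)      # 0.0 and 2.25 (the values reported above)
```

The sequential version reproduces the wrong std and 25% values for b01, while count, mean, min and max are unaffected because those reductions can be nested without changing the result.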

gdementen · 2024-11-29T16:37:46Z

The good news is the issue is already solved in my big refactor branch... The bad news is that branch still needs a lot of work to be ready.

gdementen added the "bug" and "priority: very high" labels Nov 29, 2024

gdementen added this to the 0.35 milestone Nov 29, 2024
