Implement DataFrameGroupBy, RollingGroupby, ExpandingGroupby #36

Open
hermian opened this issue Jan 29, 2023 · 4 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@hermian

hermian commented Jan 29, 2023

What you were trying to do (and why)

Calling mapply on the result of a groupby (a DataFrameGroupBy object) raises an AttributeError, whereas the regular apply works on the same object.

What happened (including reproducible example)

Reproducible example

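import pandas as pd
import mapply

mapply.init(n_workers=-1)  # adds the .mapply method to pandas objects

def square(x):
    return x * x

df = pd.DataFrame({'A': 'a a b'.split(),
                   'B': [1, 2, 3],
                   'C': [4, 6, 5]})

df.groupby('A')[['B', 'C']].mapply(square)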
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/workspace/git/screener/mmt/kospi_naver_db.py in <cell line: 1>()
----> 1 df.groupby('A', group_keys=True)[['B', 'C']].mapply(square)

/opt/homebrew/Caskroom/miniconda/base/envs/py38/lib/python3.8/site-packages/mapply/mapply.py in mapply(df_or_series, func, axis, n_workers, chunk_size, max_chunks_per_worker, progressbar, args, **kwargs)
     95 
     96     n_chunks = _choose_n_chunks(
---> 97         df_or_series.shape,
     98         opposite_axis,
     99         n_workers,

/opt/homebrew/Caskroom/miniconda/base/envs/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __getattr__(self, attr)
    979             return self[attr]
    980 
--> 981         raise AttributeError(
    982             f"'{type(self).__name__}' object has no attribute '{attr}'"
    983         )

AttributeError: 'DataFrameGroupBy' object has no attribute 'shape'

@hermian added the bug (Something isn't working) label Jan 29, 2023
@ddelange
Owner

ddelange commented Jan 29, 2023

Hi @hermian 👋

Indeed, mapply currently only implements DataFrame and Series.

I think there are a number of edge cases where DataFrameGroupBy.apply produces different kinds of output (depending on keyword arguments, input shape, and function output shape).

For the naive approach, you can iterate over a DataFrameGroupBy to get df_or_series, and do mapply stuff with that:

pd.concat(group.mapply(func) for _, group in data.groupby(key))

But as mentioned, I think this only works for the 'obvious' cases where groupby() produces sub dataframes or series (you might have to set axis on concat and/or mapply to achieve the same behaviour).
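As a minimal sketch of that naive approach, using the data from the report above: iterating over a groupby yields `(key, sub_dataframe)` pairs, so each group can be processed on its own and the pieces concatenated. To keep the snippet runnable without mapply installed, plain `.apply` stands in for `.mapply` here (that substitution is an assumption of this sketch); after `mapply.init()`, swapping in `.mapply` should be the only change needed for the simple case.

```python
import pandas as pd


def square(x):
    return x * x


df = pd.DataFrame({"A": "a a b".split(),
                   "B": [1, 2, 3],
                   "C": [4, 6, 5]})

# Iterating over a groupby yields (key, sub-DataFrame) tuples.
# Select the numeric columns per group, apply the function, then stitch
# the per-group results back together.
parts = [
    group[["B", "C"]].apply(square)  # stand-in for .mapply(square)
    for _, group in df.groupby("A")
]
result = pd.concat(parts)
print(result)
```

This only reproduces `DataFrameGroupBy.apply` for functions that return an object shaped like their input; reductions or functions that change the index would need different handling on the `concat` side.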

I think pandarallel has a complete implementation, including RollingGroupby and ExpandingGroupby. But last time I checked, it uses only one chunk per worker, with no way to configure that. So if one chunk takes much longer than the other chunks, you end up with all cores but one idle, waiting for the last chunk to finish.
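The load-balancing concern can be illustrated with a toy model. The sketch below is not the scheduler of either library: it assumes rows are split into contiguous chunks and that each chunk goes to whichever worker becomes free first. The function names and the per-row costs are made up for illustration.

```python
import heapq


def split(costs, n_chunks):
    """Split per-row costs into contiguous chunks of (nearly) equal length."""
    size = -(-len(costs) // n_chunks)  # ceiling division
    return [costs[i:i + size] for i in range(0, len(costs), size)]


def makespan(costs, n_workers, chunks_per_worker):
    """Total wall time when each chunk goes to the first worker that is free."""
    chunks = split(costs, n_workers * chunks_per_worker)
    finish_times = [0] * n_workers  # min-heap of per-worker finish times
    for chunk in chunks:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + sum(chunk))
    return max(finish_times)


# 16 rows: the first four are expensive, the rest are cheap.
costs = [10] * 4 + [1] * 12

one_chunk = makespan(costs, n_workers=4, chunks_per_worker=1)
many_chunks = makespan(costs, n_workers=4, chunks_per_worker=4)
print(one_chunk, many_chunks)  # 40 13
```

With one chunk per worker, all expensive rows land in the same chunk and three workers sit idle while it finishes. With several smaller chunks per worker, the expensive rows are spread out and the cheap ones fill the gaps.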

In the hope they have exhaustive test cases, it might be worth a shot to port those test cases and do some test driven development here.

PRs are welcome!

@ddelange changed the title from "Problems applying apply to groupby objects" to "Implement DataFrameGroupBy, RollingGroupby, ExpandingGroupby" Jan 29, 2023
@ddelange added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels and removed the bug (Something isn't working) label Jan 29, 2023
@ddelange
Owner

ddelange commented Mar 5, 2023

DataFrameGroupBy done in #40

RollingGroupby and ExpandingGroupby have different mechanics (ref) and need another approach.

@juanPabloMiceli

About this, while using DataFrameGroupBy.mapply() I got:

mapply/_groupby.py:112: FutureWarning: DataFrameGroupBy.grouper is deprecated and will be removed in a future version of pandas.
  original_apply = getattr(df_or_series.grouper, attr)
mapply/_groupby.py:113: FutureWarning: DataFrameGroupBy.grouper is deprecated and will be removed in a future version of pandas.
  setattr(df_or_series.grouper, attr, MethodType(apply, df_or_series.grouper))
mapply/_groupby.py:117: FutureWarning: DataFrameGroupBy.grouper is deprecated and will be removed in a future version of pandas.
  setattr(df_or_series.grouper, attr, original_apply)

I am not adding an example because any DataFrameGroupBy.mapply() will throw those warnings.
Should I open a new issue?

@ddelange
Owner

ddelange commented Jul 4, 2024

Hey 👋 I've seen them too. It's a warning for pandas v3, so it'll be part of #68 👍 For now I can suppress the warnings since mapply pins pandas<3
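Until the suppression lands in mapply itself, callers can silence the warning on their side with the standard library. The sketch below is a user-side workaround, not mapply's own fix; `quiet_grouper_warning` and `noisy_call` are hypothetical names, and `noisy_call` merely stands in for a `DataFrameGroupBy.mapply()` call by emitting the same message.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_grouper_warning():
    """Ignore only the '.grouper is deprecated' FutureWarning inside the block."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=r".*\.grouper is deprecated",
            category=FutureWarning,
        )
        yield


def noisy_call():
    # Stand-in for df.groupby(...).mapply(func): emits the same warning text.
    warnings.warn(
        "DataFrameGroupBy.grouper is deprecated and will be removed "
        "in a future version of pandas.",
        FutureWarning,
    )
    return "done"


# Record whatever still gets through, to show that the filter works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with quiet_grouper_warning():
        result = noisy_call()

print(result, len(caught))
```

The filter is scoped to the `with` block and to this one message, so other FutureWarnings remain visible.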


3 participants