Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix too strict assertion in shuffle code for pandas subclasses #8667

Merged

Conversation

jorisvandenbossche
Copy link
Member

I am testing the p2p shuffle in dask-geopandas (geopandas/dask-geopandas#295), and this fix was needed to get it running.
Pandas does not require that the _constructor_sliced attribute for subclasses is a class (type) itself, it should just be a callable to construct the subclassed Series object (that is also how it is used here: it's only used on the line below as a callable worker_for = constructor(worker_for)). So this assertion is wrong (and not necessary anyway IMO, if this attribute would do something wrong, that's a problem with the subclass in general)

I haven't added a test for now (it would require to add a dummy pandas / dask collection subclass boilerplate (given you don't want to depend on an external package like dask-geopandas I think), but that seems a bit much for this simple change).

@hendrikmakait hendrikmakait self-requested a review June 3, 2024 15:25
Copy link
Contributor

github-actions bot commented Jun 3, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    29 files  ±    0      29 suites  ±0   11h 20m 2s ⏱️ + 1h 29m 32s
 4 054 tests  -     5   3 954 ✅ +   10     97 💤  -   9  3 ❌  - 2 
55 841 runs  +7 583  53 677 ✅ +7 355  2 160 💤 +247  4 ❌  - 2 

For more details on these failures, see this check.

Results for commit 5032d66. ± Comparison against base commit cbc21df.

This pull request removes 13 and adds 8 tests. Note that renamed tests count towards both.
distributed.protocol.tests.test_arrow
distributed.protocol.tests.test_collection
distributed.protocol.tests.test_highlevelgraph
distributed.protocol.tests.test_numpy
distributed.protocol.tests.test_pandas
distributed.shuffle.tests.test_graph
distributed.shuffle.tests.test_merge
distributed.shuffle.tests.test_merge_column_and_index
distributed.shuffle.tests.test_metrics
distributed.shuffle.tests.test_rechunk
…
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[report_args0]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[1]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[True]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[report_args0]

♻️ This comment has been updated with latest results.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jorisvandenbossche. This change looks reasonable to me, but mypy isn't happy anymore. Could you try to fix this?

@jorisvandenbossche
Copy link
Member Author

Ah, the assert was there just to satisfy the type checker ..

I added a type: ignore, but could also add a cast(constructor, Callable) or something like that if that is preferred

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the assert was there just to satisfy the type checker ..

I don't know if that was its sole purpose, but it looks like that was at least a helpful side-effect!

type: ignore works for me here. Thanks for adjusting!

@hendrikmakait hendrikmakait merged commit 25f0732 into dask:main Jun 4, 2024
28 of 34 checks passed
@jorisvandenbossche jorisvandenbossche deleted the fix-shuffle-pandas-subclass branch June 4, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants