Make all imputation methods consistent in regard to encoding requirements #827

nicolassidoux · 2024-11-24T09:32:38Z

See #824 for the details of the change.


Zethson

Thank you very very much! This is great. I am a fan of simplifying the code (by removing semi-important features such as the progress bars in general) and really think that generalizing the imputation tests is an awesome idea.

I have a few questions and concerns though:

I am not a fan of adding more _utils_* modules. I argue that all and any utils modules are a code smell. It is never clear to developers what is actually in there and it should probably be somewhere else. Let's just have descriptive module names (or scripts) that better describe what can be found in there.
I don't know how I feel about the removal of the imputation spinners. We had added them because some imputation functions can run for a looooong time and it just gives feedback that the code is still running. I know - it's not a progress bar but it's at least something? I don't have the strongest opinion on this but wonder what other people think @Imipenem @eroell
Removing some docstrings about raised Errors is okay with me because the errors themselves can document the behavior when raised. The docstrings are more useful for other developers if they want to build on top because they don't have to check the whole code for any potential raised Errors.
Instead of printing to stdout and returning a bool in _base_check_imputation, I wonder whether this function should just raise errors? Prints are not for errors; they are for informative messages. Pytest will resolve the stack and point to the part that raised the error.
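A minimal sketch of what a raising check could look like, assuming the helper compares the data before and after imputation, here modeled as nested lists of floats. `check_imputation` is a hypothetical name and signature, not the actual ehrapy test helper:

```python
import math


def check_imputation(before: list[list[float]], after: list[list[float]]) -> None:
    """Hypothetical test helper: raise on failure instead of printing and returning a bool."""
    # The imputed data must keep the same shape as the input.
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise AssertionError("Imputation changed the shape of the data.")
    for i, (row_before, row_after) in enumerate(zip(before, after)):
        for j, (b, a) in enumerate(zip(row_before, row_after)):
            # Every value must be filled after imputation.
            if math.isnan(a):
                raise AssertionError(f"Value at ({i}, {j}) is still missing after imputation.")
            # Observed (non-missing) values must be left untouched.
            if not math.isnan(b) and a != b:
                raise AssertionError(f"Observed value at ({i}, {j}) changed from {b} to {a}.")
```

When a check fails inside a pytest test, the traceback points directly at the failing condition and its message, so no extra stdout output is needed.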


nicolassidoux · 2024-11-25T10:14:54Z

I am not a fan of adding more _utils_* modules. I argue that all and any utils modules are a code smell. It is never clear to developers what is actually in there and it should probably be somewhere else. Let's just have descriptive module names (or scripts) that better describe what can be found in there.

Well, this is always a big topic. In a way you're right, having these utils is a hint something could be better organized. But it doesn't mean it is in our control. For example: _is_val_missing is a function that would better fit in AnnData itself. So I would consider those as "extensions" more than "utils".

I don't know how I feel about the removal of the imputation spinners. We had added them because some imputation functions can run for a looooong time and it just gives feedback that the code is still running. I know - it's not a progress bar but it's at least something? I don't have the strongest opinion on this but wonder what other people think @Imipenem @eroell

I never saw it spinning 😆 But maybe it's an issue with my stdout.

Removing some docstrings about raised Errors is okay with me because the errors themselves can document the behavior when raised. The docstrings are more useful for other developers if they want to build on top because they don't have to check the whole code for any potential raised Errors.

Again, a big topic 😄 As I said, it sounds like a good idea, but in my experience it is almost impossible to maintain in practice.

Instead of printing to stdout and returning a bool in _base_check_imputation I wonder whether this function should just raise errors? prints are not for errors - they are for informative messages. Pytest will resolve the stack and point to the part that raised the error.

This is what I would have done for production code, but usually for tests, I like that kind of output. But alright, I'll modify that.


nicolassidoux · 2024-11-25T10:36:16Z

Looks like the Test / test (ubuntu-latest, 3.11, --pre) (pull_request) check fails...

eroell · 2024-11-25T11:02:47Z

Looks like the Test / test (ubuntu-latest, 3.11, --pre) (pull_request) check fails...

Nothing to do with us, it's a dependency & they're already on it :)

eroell · 2024-11-25T11:07:01Z

I don't know how I feel about the removal of the imputation spinners. We had added them because some imputation functions can run for a looooong time and it just gives feedback that the code is still running. I know - it's not a progress bar but it's at least something? I don't have the strongest opinion on this but wonder what other people think @Imipenem @eroell

I'm split... If we were to keep it, I'd move it to a decorator? That would make the functions way more readable and reduce redundancy.
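A minimal sketch of such a decorator, assuming Rich is used for the spinner via `Console.status`, with a no-op fallback when Rich is not installed. The decorator name `spinner` and the fallback behavior are hypothetical, not the code that ended up in ehrapy:

```python
import contextlib
import functools
from collections.abc import Callable
from typing import Any, TypeVar

try:
    from rich.console import Console

    _console = Console()
except ImportError:  # Assumption: silently skip the spinner if Rich is unavailable.
    _console = None

F = TypeVar("F", bound=Callable[..., Any])


def spinner(message: str) -> Callable[[F], F]:
    """Hypothetical decorator: show a status spinner while the wrapped function runs."""

    def decorator(func: F) -> F:
        @functools.wraps(func)  # Keep the original name and docstring for the API docs.
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            status = _console.status(message) if _console is not None else contextlib.nullcontext()
            with status:
                return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator
```

Each imputation function would then only need a single `@spinner("...")` line instead of repeating the spinner setup in its body.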

I never saw it spinning 😆 But maybe it's an issue with my stdout.

You usually run scripts, if I recall correctly? If you run it from a Jupyter notebook, you can see the six dots shown in the image moving around like a snake while the spinner is displayed.

Zethson

For example: _is_val_missing is a function that would better fit in AnnData itself. So I would consider those as "extensions" more than "utils".

Then @nicolassidoux they should be in https://github.com/theislab/ehrapy/blob/main/ehrapy/anndata/anndata_ext.py right? I still expect a lot of that to eventually make it into @eroell's ehrdata but for now it should probably be there.

Could you please move the new functions?

I'm okay with the spinners being removed unless you deem them worthwhile now @nicolassidoux and want to write a decorator.

After the functions have been moved, I'd love to merge this.

Thank you very much!


nicolassidoux · 2024-11-25T11:54:04Z

@Zethson

Then @nicolassidoux they should be in https://github.com/theislab/ehrapy/blob/main/ehrapy/anndata/anndata_ext.py right? I still expect a lot of that to eventually make it into @eroell's ehrdata but for now it should probably be there.

Oh well I should have noticed this file indeed 😄

I'm okay with the spinners being removed unless you deem them worthwhile now @nicolassidoux and want to write a decorator.

Let me assess how much work it is to get that done.


nicolassidoux · 2024-11-25T13:44:55Z

Hm, of course almost all notebook tests fail because yaspin is missing. This spinner is nice because it works for both scripts and notebooks. What do you guys think?

Zethson · 2024-11-25T13:51:23Z

Rich is much much more popular and also used in packages such as pip itself. Therefore, users will have it anyways. That's why I think it'd be better if we stuck to Rich?


nicolassidoux · 2024-11-25T14:19:33Z

OK, it doesn't work well on my stdout, but it works fine in notebooks.


nicolassidoux · 2024-11-25T14:38:53Z

@eroell @Zethson Ready for final review, I think

eroell · 2024-11-25T15:54:37Z

Spinners work, cool. Thanks for making the decorator!

Nice to have these additional tests; the aim of the PR is fulfilled.

I have one final point on the return type. Before, all imputation methods returned a new adata if copy=True, else None. This is the "scverse"/"scanpy" style behavior. miceforest was an (I suppose accidental) exception, which always returned the adata.

Right now, it has been changed to all imputation methods returning adata. I suggest to rather have all functions following the "scverse"/"scanpy" style instead.

Even if this means a small breaking change for users doing adata = ep.pp.mice_forest_impute(adata, copy=False), the fix being to just call ep.pp.mice_forest_impute(adata, copy=False), I think being consistent here now helps sharpen our API for the future.
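A minimal sketch of the "scverse"/"scanpy" convention described above, assuming a toy container instead of AnnData and simple column-mean imputation. `ToyData` and `mean_impute` are hypothetical names used only to illustrate the return behavior, not ehrapy functions:

```python
from __future__ import annotations

import copy as copy_module
import math
from dataclasses import dataclass, field


@dataclass
class ToyData:
    """Hypothetical stand-in for AnnData: rows are observations, columns are variables."""

    X: list[list[float]] = field(default_factory=list)


def mean_impute(data: ToyData, copy: bool = False) -> ToyData | None:
    """Fill NaNs with the column mean; return a new object only if copy=True."""
    if copy:
        data = copy_module.deepcopy(data)  # Work on a copy, leave the caller's object untouched.
    n_cols = len(data.X[0]) if data.X else 0
    for j in range(n_cols):
        observed = [row[j] for row in data.X if not math.isnan(row[j])]
        fill = sum(observed) / len(observed) if observed else math.nan
        for row in data.X:
            if math.isnan(row[j]):
                row[j] = fill
    # scanpy-style: the modified copy if requested, otherwise None (in-place).
    return data if copy else None
```

With this convention, adata = mean_impute(adata, copy=False) would rebind adata to None, which is exactly the breaking pattern mentioned above; calling mean_impute(adata) without reassignment is the in-place usage.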

If you agree with this or convince me otherwise, this looks ready to merge for me!

nicolassidoux · 2024-11-25T20:08:37Z

@eroell

I have one final point on the return type.

Fine for me, anyway I advocated for removing this copy thing, but that's a separate debate.
I'll modify that right now so the issue can be closed ASAP.


nicolassidoux · 2024-11-25T21:01:12Z

Done, I also noticed miss_forest_impute really needed a rewrite.

nicolassidoux · 2024-11-25T21:06:56Z

Hmm, pre-commit doesn't like that there is no return statement if copy=False. Should I modify it like this:

    ...
    if copy:
        return adata
    else:
        return

Zethson · 2024-11-25T21:19:37Z

Hmm, pre-commit doesn't like that there is no return statement if copy=False. Should I modify it like this:

    ...
    if copy:
        return adata
    else:
        return

Also how scanpy does it: https://github.com/scverse/scanpy/blob/main/src/scanpy/tools/_dpt.py#L199C5-L199C35

eroell · 2024-11-25T21:34:26Z

Thanks for the final polish here - Like your suggestion, or like scanpy does it, both fine!
Then this looks good to merge for me :)

nicolassidoux · 2024-11-25T22:08:48Z

Done!

Zethson · 2024-11-25T22:27:49Z

The RTD build seems to time out because of dependencies somehow urgh

Zethson · 2024-11-25T22:28:00Z

I'll merge this now and I'll take a look myself soon.

nicolassidoux added 2 commits November 15, 2024 17:10

Before test

fe5c005

After tests

d06ba21

nicolassidoux requested review from Zethson and eroell November 24, 2024 09:32

nicolassidoux linked an issue Nov 24, 2024 that may be closed by this pull request

Make all imputation methods consistent in regard to encoding requirements #824

Closed

[pre-commit.ci] auto fixes from pre-commit.com hooks

5d05b8c

for more information, see https://pre-commit.ci

Zethson requested changes Nov 24, 2024

nicolassidoux and others added 3 commits November 25, 2024 09:40

Apply suggestions from code review part 1

07722eb

Co-authored-by: Lukas Heumos <[email protected]>

@nicolassidoux

3133fb2

@Zethson Apply suggestions from code review part 2

[pre-commit.ci] auto fixes from pre-commit.com hooks

e253600

for more information, see https://pre-commit.ci

nicolassidoux and others added 2 commits November 25, 2024 11:30

Updated _base_check_imputation to throw exception

48be991

[pre-commit.ci] auto fixes from pre-commit.com hooks

dddb9bc

for more information, see https://pre-commit.ci

nicolassidoux requested a review from Zethson November 25, 2024 10:36

Zethson requested changes Nov 25, 2024

eroell requested changes Nov 25, 2024

eroell reviewed Nov 25, 2024

ehrapy/preprocessing/_imputation.py (outdated, resolved)

eroell reviewed Nov 25, 2024

ehrapy/preprocessing/_imputation.py (resolved)

eroell reviewed Nov 25, 2024

ehrapy/preprocessing/_imputation.py (outdated, resolved)

eroell reviewed Nov 25, 2024

tests/preprocessing/test_imputation.py (outdated, resolved)

nicolas.sidoux and others added 2 commits November 25, 2024 14:33

Added spinner support

1d34e12

[pre-commit.ci] auto fixes from pre-commit.com hooks

ae4bd5f

for more information, see https://pre-commit.ci

nicolas.sidoux and others added 2 commits November 25, 2024 14:57

After @eroell review

b9da31f

[pre-commit.ci] auto fixes from pre-commit.com hooks

393a88e

for more information, see https://pre-commit.ci

nicolas.sidoux and others added 2 commits November 25, 2024 15:16

Changed spinner to Rich

d5da251

[pre-commit.ci] auto fixes from pre-commit.com hooks

4397ca9

for more information, see https://pre-commit.ci

nicolas.sidoux and others added 3 commits November 25, 2024 15:21

Fixed missing import

3d30e6b

[pre-commit.ci] auto fixes from pre-commit.com hooks

4cebe33

for more information, see https://pre-commit.ci

After @eroell review

1ec728f

Zethson approved these changes Nov 25, 2024

nicolassidoux and others added 2 commits November 25, 2024 21:54

Updated returns in imputation, rewrote miss_forest_impute

f517d71

[pre-commit.ci] auto fixes from pre-commit.com hooks

5507bd0

for more information, see https://pre-commit.ci

eroell approved these changes Nov 25, 2024

Fixed imputation returns

be2f2e4

Zethson merged commit 67fedbf into main Nov 25, 2024
15 of 17 checks passed

Zethson deleted the enhancement/issue-824 branch November 25, 2024 22:28