Feature/rank features group #546

VladimirShitov · 2023-06-14T11:37:15Z

PR Checklist

This comment contains a description of changes (with reason)
Referenced issue is linked: Add more statistical tests for comparing features #532
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes
Previously, for non-numerical features the standard statistical test was run by ep.tl.rank_features_groups (e.g. Wilcoxon rank sum test). This PR adds functionality to run statistical tests specifically developed for categorical features (e.g. Chi-square test).

Technical details

The same approach as in scanpy.tl.rank_genes_groups is used. E.g., when the reference is set to "rest", for each subgroup of groupby, the composition of a categorical variable is compared to its composition in all other groups pooled together. I would say this is not a common approach, but it is consistent with scanpy, which is used for numerical features.
The default test is G-test, which is similar to the Chi-square test but should work better for groups with a small expected number of observations.
P-values should be treated carefully. I would only use them for ranking marker features, and re-run statistical analysis in a conventional way to test your hypotheses.
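
To illustrate the one-vs-rest comparison described above, here is a minimal sketch. It is not the code from this PR: the helper name `one_vs_rest_gtest` and the toy data are hypothetical, and it assumes SciPy's `chi2_contingency` with `lambda_="log-likelihood"` as the G-test implementation. For each group it builds a 2 x k contingency table (the group vs. all other groups pooled) and tests whether the category composition differs.

```python
import numpy as np
from scipy.stats import chi2_contingency


def one_vs_rest_gtest(groups, values, group):
    """G-test of the composition of `values` in `group` vs. all other groups pooled.

    Hypothetical helper for illustration only; not part of ehrapy.
    """
    groups = np.asarray(groups)
    values = np.asarray(values)
    in_group = groups == group
    categories = np.unique(values)
    # 2 x k contingency table: row 0 = the group of interest, row 1 = "rest"
    table = np.array(
        [
            [np.sum(values[in_group] == c) for c in categories],
            [np.sum(values[~in_group] == c) for c in categories],
        ]
    )
    # lambda_="log-likelihood" switches from Pearson's chi-square to the G-test.
    # Note: for 2 x 2 tables SciPy applies Yates' correction by default.
    stat, pvalue, dof, _ = chi2_contingency(table, lambda_="log-likelihood")
    return stat, pvalue, dof


# Toy data (made up): three groups, one categorical feature with three levels
groups = ["A"] * 8 + ["B"] * 8 + ["C"] * 8
values = (
    ["low"] * 6 + ["mid"] * 1 + ["high"] * 1
    + ["low"] * 1 + ["mid"] * 6 + ["high"] * 1
    + ["low"] * 1 + ["mid"] * 1 + ["high"] * 6
)

results = {g: one_vs_rest_gtest(groups, values, g) for g in ["A", "B", "C"]}
for g, (stat, pvalue, dof) in results.items():
    print(f"{g} vs rest: G={stat:.3f}, dof={dof}, p={pvalue:.4f}")
```

Because every group is tested against the pooled remainder, the tests are not independent of each other, which is one reason to use the resulting p-values for ranking only.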

VladimirShitov · 2023-06-14T11:46:11Z

Note: function parameters are not validated as extensively as in scanpy. I might add this in the coming commits.

Zethson

You're a legend @VladimirShitov !

Left lots of minor comments
A file named _utils is usually a slight code smell because ideally every piece of code should have a clear purpose. I would actually move ALL of the feature_ranks code, including this, into a _feature_ranks_groups.py. What do you think? IMO we have lots of customization now and this would make sense.

Thank you so much.

ehrapy/tools/_datatypes.py

ehrapy/tools/_scanpy_tl_api.py

tests/tools/test_features_ranking.py

VladimirShitov · 2023-06-19T13:45:20Z

Thank you for your comments, Lukas! Please check the discussion on renaming datatypes.py above. Everything else is fixed.

Signed-off-by: zethson <[email protected]>

VladimirShitov added 30 commits June 7, 2023 19:10

Run rank_features_groups() on categorical data

8c3eb0a

Adjust p-values

9322d95

Change string symbols

0b7804f

Add dummy pts and logfc for categorical features

af4a62a

Don't add empty pts when pts=False

e457546

Sort features by adjusted p-value

c3b5672

Don't set groupby to .obs

0830c20

Reformat code for updating .uns

c4e57de

Always reset key_added in .uns

1353a7f

Reformat evaluation of categorical features

5841457

Remove rewriting "params" key

d6b6802

Check if feature is groupby w "ehrapycat_" prefix

451d766

Document return for _evaluate_categorical_features

658284c

Add documentation for _save_rank_features_result

2c252e3

Update version

806e8c0

Add test for _adjust_pvalues()

da2e9fe

Add test for _sort_features

2ce3715

Fix test dataset call

01f9580

Move utils and datatypes to separate files

f3637e7

Fix typo: method -> corr_method

38a2a02

Check that every value of array is true

f514984

Compare p-values by group

5829038

Copy pvals to pvals_adj

80d8ec8

Copy pvals to pvals_adj instead of assignment

e1d670d

Convert dataframe key to list to prevent error

bb47cdd

Convert dataframe key to list to prevent error

3a27c7d

Add test_save_rank_features_result

24bdf1a

Always return tuple

08560d6

Add examples to _get_groups_order

2b73280

Fix typo: remove bracket

424db06

VladimirShitov added 12 commits June 13, 2023 16:00

Add test_get_groups_order

6adc856

Remove a helper copy-pasted code

61ffb45

Add test_evaluate_categorical_features

ace08ab

Add TestRankFeaturesGroups and test with real data

9274989

Add test for only continuous features

9dc030a

Make sure that added keys are always recarrays

8144210

Add test for only categorical features in adata

372a37b

Reformat DF conversion for better readability

01e1e38

Test that ehrapycat_day_icu_intime is in result

143e69f

Don't save index when converting DF to records

dcbb22d

Fix DF conversion to records

4e87ad3

Reformat code with nox

f27f2bd

Merge branch 'development' into feature/rank-features-group

2a43ba7

VladimirShitov linked an issue Jun 14, 2023 that may be closed by this pull request

Add more statistical tests for comparing features #532

Closed

Zethson requested changes Jun 14, 2023


VladimirShitov added 9 commits June 19, 2023 15:21

Rename method and categorical_method for clarity

bfcbf29

Rename corr_method to correction_method

ef080ec

Rename corr_method to correction_method

3f82250

Move comments higher for clarity

2a10d1a

Rename genes to features

9a874e6

Fix typos in a comment

5895045

Fix code style by black

da8671c

Move rank_features_groups code to separate folder

7b6f4c6

Fix code style with black

b5362d4

Refactoring

f036fb9

Signed-off-by: zethson <[email protected]>

Zethson merged commit 1574e6c into theislab:development Jun 20, 2023

VladimirShitov deleted the feature/rank-features-group branch June 20, 2023 13:52
