[FIX] Write .h5ad files #347

Merged
merged 9 commits into from
Mar 11, 2022
@Imipenem (Collaborator) commented Mar 10, 2022

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

Fixes #337 and #333.
Issue: Columns of dtype object cannot be written to .h5ad files.

Fix:

New feature:

Current Issues:

  • Writing nullable booleans is not supported by AnnData 0.7.8; support was added in 0.8.0, which should be released in the next few days.

- convert any "object" dtype values in obs or uns to "category" or "str" for writing to .h5ad files

- current issues:
  1.) Categories True/False cannot be written
  2.) Non-numerical values in X have to be encoded before writing
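
For illustration, the conversion described in the commit message above could look roughly like the following. This is a minimal sketch using pandas only, not the code merged in this PR; the helper name `sanitize_object_columns` and the example columns are hypothetical, and it assumes that casting every object column to str and then to category is acceptable for the data at hand.

```python
import pandas as pd


def sanitize_object_columns(obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``obs`` in which object columns are string categoricals.

    Hypothetical helper for illustration only; not part of ehrapy or AnnData.
    """
    out = obs.copy()  # leave the caller's frame untouched
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col]):
            # Cast mixed Python objects to str first so every category label
            # has the same type, then store the column as a categorical.
            # Note: missing values become the literal string "nan" here.
            out[col] = out[col].astype(str).astype("category")
    return out


# Made-up example data: one mixed-type object column, one numeric column.
example = pd.DataFrame(
    {
        "mixed": pd.Series([1, "a", 1], dtype=object),
        "age": [34, 51, 29],
    }
)
clean = sanitize_object_columns(example)
```

Numeric columns pass through unchanged, and the original frame is not mutated, which sidesteps the copy-versus-mutate question raised later in this thread.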
@Imipenem Imipenem linked an issue Mar 10, 2022 that may be closed by this pull request
@github-actions github-actions bot added the bug Something isn't working label Mar 10, 2022
- fixed an error that caused numerical detection to produce false positives while reading
- columns with boolean values (as well as binary columns) are now cast to bool columns in obs

- columns containing only 0 and 1 are cast as well
- fixed an error that caused the dtype of X to be object even though all columns were numeric and the initial AnnData object had a numerical X dtype
- writing unencoded AnnData objects to .h5ad files no longer raises an error
- added tests + new test files

- updated docs

- pinned flake8-bandit to >=3.0.0 since this fixes the previous CI error
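
The boolean casting mentioned in the commits above can be sketched as follows. This is an assumption-laden illustration in plain pandas, not the implementation from this PR: the function name `cast_binary_columns_to_bool` and the sample columns are hypothetical, and columns with missing values are skipped on purpose because, as noted in the description, nullable booleans cannot be written with AnnData 0.7.8.

```python
import pandas as pd


def cast_binary_columns_to_bool(obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``obs`` with 0/1 and True/False columns cast to bool.

    Hypothetical helper for illustration only; not part of ehrapy or AnnData.
    """
    out = obs.copy()
    for col in out.columns:
        series = out[col]
        if series.isna().any():
            # A bool column cannot hold missing values, so leave it alone.
            continue
        values = set(series.unique().tolist())
        # True == 1 and False == 0 in Python, so this single subset check
        # covers both boolean columns and binary 0/1 columns.
        if values and values <= {0, 1}:
            out[col] = series.astype(bool)
    return out


# Made-up example data.
example = pd.DataFrame(
    {
        "smoker": [0, 1, 1],  # binary integers
        "diabetic": pd.Series([True, False, True], dtype=object),
        "visits": [0, 1, 2],  # not binary, stays numeric
    }
)
casted = cast_binary_columns_to_bool(example)
```

A column such as `visits` that merely contains 0 and 1 among other values is left untouched, which is one way to avoid the kind of false positives mentioned above.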
@Imipenem Imipenem marked this pull request as ready for review March 11, 2022 16:54
@Imipenem Imipenem requested a review from Zethson March 11, 2022 16:54
@Zethson (Member) left a comment

Should be good I hope :)

pyproject.toml (2 review comments, outdated and resolved)
@Zethson (Member) commented Mar 11, 2022

Please merge development into this and fix the dependencies.

adata_cp = adata.copy()
adata_cp.uns["ehrapy_dummy_encoding"] = True
adata_cp.uns["columns_obs_only"] = list(adata_cp.obs.columns)
# TODO: THIS SHOULD BE FIXED WITH PR #348, SO NO COPY SHOULD BE NEEDED THEN SINCE THE ORIGINAL WILL NOT BE MUTATED
@Imipenem (Collaborator, Author) commented:

See #348

@Imipenem Imipenem merged commit 27891ed into development Mar 11, 2022
@Zethson Zethson deleted the fix/write_h5ad branch January 12, 2023 13:01
Labels
bug Something isn't working
Linked issues (may be closed by this pull request):

  • Unable to write Diabetes 130 dataset
  • Writing unencoded dataset results in error
2 participants