Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update X matrix validation to validate Visium assays #876

Merged
merged 6 commits into from
May 6, 2024

Conversation

nayib-jose-gloria
Copy link
Contributor

Reason for Change

Changes

  • refactor _has_valid_raw to split raw data validation into two branches: validating visium assay datasets with is_single True and in_tissue 0 values (introduces totally new validation rule that requires row-by-row iteration: rows in the matrix corresponding to obs['in_tissue'] == 0 rows CAN be all zeros--as long as there is at least ONE non-zero value among the group of in_tissue == 0 rows)
  • I chose not to introduce matrix chunking to the visium + is_single + in_tissue 0 data validation branch because we previously validate that its length is 4992 exactly, which is less than our default chunking size. Memory shouldn't be a concern in this case
  • moved typical data validation pathway for all other cases to _validate_raw_data method
    - this includes datasets where visium + is_single + in_tissue is 1 for all cases because that pathway has the same value ruleset as the other assays we validate for. The only diff is validating 4992 rows for visium + is_single datasets, so that check is done outside of this helper function.

is_sparse_matrix = matrix_format in SPARSE_MATRIX_TYPES
for matrix_chunk, _, _ in self._chunk_matrix(x):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to _validate_raw_data

has_row_of_zeros = True

if not has_invalid_nonzero_value:
data = matrix_chunk if isinstance(matrix_chunk, np.ndarray) else matrix_chunk.data
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to _matrix_has_invalid_nonzero_values for re-use

@@ -38,6 +41,7 @@ def __init__(self, ignore_labels=False):
self.schema_def = dict()
self.schema_version: str = None
self.ignore_labels = ignore_labels
self.visium_and_is_single_true_matrix_size = VISIUM_AND_IS_SINGLE_TRUE_MATRIX_SIZE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in the future you can patch VISIUM_AND_IS_SINGLE_TRUE_MATRIX_SIZE in the tests to avoid having this class variable. Just one less duplicated variable.

@Bento007 Bento007 requested a review from brian-mott May 3, 2024 21:12
Copy link
Collaborator

@brian-mott brian-mott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good everyone, works as defined with spatial and non-spatial datasets and +/- a normalized matrix. Thanks!

@nayib-jose-gloria nayib-jose-gloria enabled auto-merge (squash) May 6, 2024 21:30
@nayib-jose-gloria nayib-jose-gloria merged commit 6fb8d8c into main May 6, 2024
5 of 6 checks passed
@nayib-jose-gloria nayib-jose-gloria deleted the nayib/828-visium-X branch May 6, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants