Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(schema 5.1.0): validate uns[spatial] #858

Merged
merged 13 commits into from
Apr 29, 2024
Merged

feat(schema 5.1.0): validate uns[spatial] #858

merged 13 commits into from
Apr 29, 2024

Conversation

MillenniumFalconMechanic
Copy link
Collaborator

Reason for Change

Changes

  • Added _check_spatial() to validate uns["spatial"] values.
  • Added Seurat conversion warning for Visium datasets.

Testing

  • Added unit tests.
  • Tested with 5.1.0 datasets.

@MillenniumFalconMechanic MillenniumFalconMechanic changed the title feat(schema 5.1.0): valdidate uns[spatial] feat(schema 5.1.0): validate uns[spatial] Apr 24, 2024
@MillenniumFalconMechanic MillenniumFalconMechanic marked this pull request as ready for review April 25, 2024 05:26
@@ -923,6 +926,13 @@ def _validate_seurat_convertibility(self):
)
self.is_seurat_convertible = False

# Seurat conversion is not supported for Visium datasets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this block as the first check in this function, and return if self._is_visium. That will allow us to skip checking the X matrix, which is a more expensive operation, if we don't need to because seurat convertability is already False due to the visium check.

return

# spatial is forbidden if assay it not a supported spatial assay.
uns_spatial_specified = "spatial" in self.adata.uns
Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can replace uns_spatial_specified with uns_spatial = self.adata.uns.get("spatial") since it returns None if uns_spatial is not specified. one less var to maintain

uns_spatial_keys = list(uns_spatial.keys())
library_ids = list(filter(lambda x: x != "is_single", uns_spatial_keys))
if len(library_ids) > 1:
self.errors.append("uns['spatial'] must contain only one library_id.")
Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: thinking about rewording this error slightly, since it's possible the user thought they could include additional metadata, not intending for them to be library ids.

Maybe something like f"uns['spatial'] must contain only two top-level keys: 'is_single' and a library_id. More than 2 top-level keys detected: {library_ids}".

# library_id is required if assay is Visium and is_single is True.
if len(library_ids) == 0:
self.errors.append(
"uns['spatial'] must contain the key 'library_id' for obs['assay_ontology_term_id'] "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since the key can be anything, we should reword this so as not to imply it must contain the literal string "library_id" as the key.

Maybe something like: ...must contain at least one key representing the library_id

)

# Confirm max dimension of image, if specified, is valid.
if max_dimension is not None and max(image.shape) > 2000:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental hard-code

Suggested change
if max_dimension is not None and max(image.shape) > 2000:
if max_dimension is not None and max(image.shape) > max_dimension:

# Confirm max dimension of image, if specified, is valid.
if max_dimension is not None and max(image.shape) > 2000:
self.errors.append(
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, "
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of {max_dimension} pixels, "

)

# Confirm max dimension of image, if specified, is valid.
if max_dimension is not None and max(image.shape) > 2000:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I would assume this is the right interpretation of the rule, but there's some vagueness with how the rule is written:

Its largest dimension MUST be 2000 pixels.

I will double check with Brian whether the intended validation is the largest dimension is <= 2000 or actually == 2000

)
# spot_diameter_fullres is specified: proceed with validation.
else:
self._validate_float(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it'd be useful to include more details in the error message (i.e. "This must be the value of the spot_diameter_fullres field from scalefactors_json.json"), in which case we probably want to pull this out of a generalized _validate_float method

)
# tissue_hires_scalef is specified: proceed with validation.
else:
self._validate_float(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it'd be useful to include more details in the error message (i.e. "This must be the value of the tissue_hires_scalef field from scalefactors_json.json."), in which case we probably want to pull this out of a generalized _validate_float method

assay_ontology_term_id = self.adata.obs.get("assay_ontology_term_id")
return assay_ontology_term_id is not None and (assay_ontology_term_id == ASSAY_VISIUM).any()

def _validate_float(self, name: str, value: float):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments above; I think we can get rid of this.

If you disagree and would like to keep it, I think it'd make more sense to generalize into a _validate_type method instead and add type as a parameter. And generalize the docstring, which doesn't need to be specific to scalefactors

"default_embedding": "X_umap",
"X_approximate_distribution": "normal",
"batch_condition": ["is_primary_data"],
"spatial": {"is_single": numpy.bool_(True)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make "is_single" a non-numpy boolean True here, since both should be valid. Keep the numpy bool in the visium uns so we can cover both cases

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! a few suggested nits (optional but I think would be helpful), a couple requests, and 1 rule I'm going to clarify with Brian just in case

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! Just waiting on clarification from Brian re: max dimension requirement. Shouldn't block testing though, since presumably it'll come up there as well

@nayib-jose-gloria nayib-jose-gloria self-requested a review April 25, 2024 18:33
Copy link
Contributor

@Bento007 Bento007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the validaiton changes. Looking for tests now.

return

# is_single must be a boolean.
uns_is_single = uns_spatial["is_single"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is strictly np.bools, regular bool should be supported too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the regular bool case is covered already (see np.bool_ and regular bool test cases) but let me know if I am missing something here.


# fullres is optional.
uns_fullres = uns_images.get("fullres")
if uns_fullres is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a warning if no fullres is included, since it is STRONGLY RECOMMENDED.

@brianraymor
Copy link
Contributor

I will double check with Brian whether the intended validation is the largest dimension is <= 2000 or actually == 2000

@nayib-jose-gloria - apologies if I was @mentioned somewhere and missed it.

The answer is "== 2000" although I agree that the space ranger documentation is a bit ambiguous. I did validate earlier against the highres visium images in the corpus. There was always a 2000 present in one dimension. See this report. The schema was also reviewed by 10X PM and CB.

)

# Confirm max dimension of image, if specified, is valid.
if max_dimension is not None and max(image.shape) > max_dimension:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#858 (comment)

looks like we need to update this check + the accompanying comments to reflect that we want the max dimension to == max_dimension exactly

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#858 (comment)

we'll need to update the max_dimension check accordingly, ready to go after that

@MillenniumFalconMechanic
Copy link
Collaborator Author

Hi @brian-mott, this is ready for review!

return

# is_single is required.
if "is_single" not in uns_spatial:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set uns['spatial'] to an int, bool, or float, that raises a TypeError and stops validation: "TypeError: argument of type {'int' | 'bool' | 'float'} is not iterable."

Other collections like lists, numpy arrays, and pandas dataframes are handled, validation completes, and the errors with validation are displayed correctly.

I've covered most of the testing but I'm still working on some items. I wanted to comment as I've uncovered issues. Let me know if you prefer the more formal review or different way to do this.

else:
# Confirm shape of scalefactors is valid: allowed keys are spot_diameter_fullres and tissue_hires_scalef.
uns_scalefactors = uns_library_id["scalefactors"]
if not self._has_no_extra_keys(uns_scalefactors, ["spot_diameter_fullres", "tissue_hires_scalef"]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set uns['spatial'][library_id][scalefactors] = anything other than a dict or pandas.Dataframe, I get an AttributeError: "AttributeError: '{type}' object has no attribute 'keys'" This behaves similarly to my above comment where the raised exception prematurely exits validation.

uns['spatial'][library_id][scalefactors] is an output from scanpy.read_visium() which would suggest typing should be pretty consistent and not the wide range I've tested against.

But we are finding through the curation towards 5.1.0 that images that are labeled as a given resolution aren't necessarily the correct resolution or set at the proper scale factor. So we do have to manually adjust these keys and values and we could easily wrangle to something other than a dict with correct keys and values.

@Bento007 Bento007 merged commit 4a9ec93 into main Apr 29, 2024
6 checks passed
@Bento007 Bento007 deleted the mim/827-spatial branch April 29, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants