feat(schema 5.1.0): validate uns[spatial] #858

MillenniumFalconMechanic · 2024-04-24T23:00:04Z

Reason for Change

cellxgene-schema CLI must add validation for uns['spatial'] #827

Changes

Added _check_spatial() to validate uns["spatial"] values.
Added Seurat conversion warning for Visium datasets.

Testing

Added unit tests.
Tested with 5.1.0 datasets.

nayib-jose-gloria · 2024-04-25T13:56:49Z

cellxgene_schema_cli/cellxgene_schema/validate.py

@@ -923,6 +926,13 @@ def _validate_seurat_convertibility(self):
            )
            self.is_seurat_convertible = False

+        # Seurat conversion is not supported for Visium datasets.


Move this block so it is the first check in this function, and return early if self._is_visium. That lets us skip checking the X matrix, which is a more expensive operation, when Seurat convertibility is already False because of the Visium check.
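A minimal sketch of the suggested ordering, assuming a stand-in class rather than the real validator. `SketchValidator`, `_x_matrix_is_convertible`, the `x_checked` flag, and the warning wording are hypothetical; only `_is_visium`, `is_seurat_convertible`, and `_validate_seurat_convertibility` are names taken from the diff, and treating `_is_visium` as a method is an assumption.

```python
class SketchValidator:
    """Hypothetical stand-in for the validator; only the check ordering matters."""

    def __init__(self, is_visium: bool):
        self._visium = is_visium
        self.is_seurat_convertible = True
        self.warnings = []
        self.x_checked = False  # records whether the expensive check ran

    def _is_visium(self) -> bool:
        return self._visium

    def _x_matrix_is_convertible(self) -> bool:
        # Placeholder for the expensive X matrix inspection (hypothetical helper).
        self.x_checked = True
        return True

    def _validate_seurat_convertibility(self) -> None:
        # Cheap check first: a Visium dataset is never Seurat-convertible,
        # so there is no reason to inspect X at all.
        if self._is_visium():
            self.warnings.append(
                "Datasets with a Visium assay cannot be converted to Seurat."  # hypothetical wording
            )
            self.is_seurat_convertible = False
            return
        if not self._x_matrix_is_convertible():
            self.is_seurat_convertible = False
```

With this ordering the X matrix check only runs for non-Visium datasets.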

nayib-jose-gloria · 2024-04-25T14:08:52Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            return
+
+        # spatial is forbidden if assay is not a supported spatial assay.
+        uns_spatial_specified = "spatial" in self.adata.uns


nit: you can replace uns_spatial_specified with uns_spatial = self.adata.uns.get("spatial"), since .get() returns None if "spatial" is not specified. One less variable to maintain.
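A short sketch of that pattern, using a plain dict in place of `adata.uns`. The function name, the `is_supported_spatial_assay` parameter, and the error wording are hypothetical and not taken from the PR.

```python
def spatial_presence_errors(uns: dict, is_supported_spatial_assay: bool) -> list:
    """Return errors if uns['spatial'] is present for an unsupported assay."""
    errors = []
    # .get() returns None when the key is absent, so no separate
    # "is it specified?" boolean is needed.
    uns_spatial = uns.get("spatial")
    if uns_spatial is not None and not is_supported_spatial_assay:
        errors.append(
            "uns['spatial'] is only allowed for supported spatial assays."  # hypothetical wording
        )
    return errors
```

One caveat of this approach: a key that is present but explicitly set to None is treated the same as a missing key.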

nayib-jose-gloria · 2024-04-25T14:15:40Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+        uns_spatial_keys = list(uns_spatial.keys())
+        library_ids = list(filter(lambda x: x != "is_single", uns_spatial_keys))
+        if len(library_ids) > 1:
+            self.errors.append("uns['spatial'] must contain only one library_id.")


nit: thinking about rewording this error slightly, since it's possible the user thought they could include additional metadata, not intending for them to be library ids.

Maybe something like f"uns['spatial'] must contain only two top-level keys: 'is_single' and a library_id. More than 2 top-level keys detected: {library_ids}".
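As a sketch, the reworded check could look like the following. The function name is hypothetical and a plain dict stands in for `uns['spatial']`; the message follows the wording proposed above.

```python
def library_id_count_errors(uns_spatial: dict) -> list:
    """Flag any top-level keys beyond 'is_single' and a single library_id."""
    errors = []
    # Every key other than 'is_single' is interpreted as a library_id.
    library_ids = [key for key in uns_spatial if key != "is_single"]
    if len(library_ids) > 1:
        errors.append(
            "uns['spatial'] must contain only two top-level keys: 'is_single' and a library_id. "
            f"More than two top-level keys detected: {library_ids}."
        )
    return errors
```

Listing the detected keys helps a user who added extra metadata see which entries were interpreted as library ids.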

nayib-jose-gloria · 2024-04-25T14:32:31Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+        # library_id is required if assay is Visium and is_single is True.
+        if len(library_ids) == 0:
+            self.errors.append(
+                "uns['spatial'] must contain the key 'library_id' for obs['assay_ontology_term_id'] "


nit: since the key can be anything, we should reword this so as not to imply it must contain the literal string "library_id" as the key.

Maybe something like: ...must contain at least one key representing the library_id

nayib-jose-gloria · 2024-04-25T14:45:16Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            )
+
+        # Confirm max dimension of image, if specified, is valid.
+        if max_dimension is not None and max(image.shape) > 2000:


accidental hard-code

Suggested change

-        if max_dimension is not None and max(image.shape) > 2000:
+        if max_dimension is not None and max(image.shape) > max_dimension:

nayib-jose-gloria · 2024-04-25T14:46:18Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+        # Confirm max dimension of image, if specified, is valid.
+        if max_dimension is not None and max(image.shape) > 2000:
+            self.errors.append(
+                f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, "


Suggested change

-                f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, "
+                f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of {max_dimension} pixels, "

nayib-jose-gloria · 2024-04-25T14:50:33Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            )
+
+        # Confirm max dimension of image, if specified, is valid.
+        if max_dimension is not None and max(image.shape) > 2000:


also, I would assume this is the right interpretation of the rule, but there's some vagueness with how the rule is written:

Its largest dimension MUST be 2000 pixels.

I will double check with Brian whether the intended validation is that the largest dimension is <= 2000 or exactly == 2000.

nayib-jose-gloria · 2024-04-25T15:00:49Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+                )
+            # spot_diameter_fullres is specified: proceed with validation.
+            else:
+                self._validate_float(


nit: I think it'd be useful to include more details in the error message (e.g. "This must be the value of the spot_diameter_fullres field from scalefactors_json.json"), in which case we probably want to pull this out of a generalized _validate_float method

nayib-jose-gloria · 2024-04-25T15:01:15Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+                )
+            # tissue_hires_scalef is specified: proceed with validation.
+            else:
+                self._validate_float(


nit: I think it'd be useful to include more details in the error message (e.g. "This must be the value of the tissue_hires_scalef field from scalefactors_json.json."), in which case we probably want to pull this out of a generalized _validate_float method
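A rough sketch of what the field-specific messages might look like once the checks are inlined. The function name and the first half of each message are hypothetical; the field names and the scalefactors_json.json hint come from the comments above.

```python
def scalefactor_type_errors(scalefactors: dict) -> list:
    """Check that each scalefactor that is present is a float, with a field-specific hint."""
    errors = []
    for field in ("spot_diameter_fullres", "tissue_hires_scalef"):
        if field not in scalefactors:
            continue  # missing-key handling is out of scope for this sketch
        value = scalefactors[field]
        if not isinstance(value, float):
            errors.append(
                f"uns['spatial'][library_id]['scalefactors']['{field}'] must be of type float, "
                f"it is {type(value).__name__}. "
                f"This must be the value of the {field} field from scalefactors_json.json."
            )
    return errors
```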

nayib-jose-gloria · 2024-04-25T15:03:12Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+        assay_ontology_term_id = self.adata.obs.get("assay_ontology_term_id")
+        return assay_ontology_term_id is not None and (assay_ontology_term_id == ASSAY_VISIUM).any()
+
+    def _validate_float(self, name: str, value: float):


see comments above; I think we can get rid of this.

If you disagree and would like to keep it, I think it'd make more sense to generalize into a _validate_type method instead and add type as a parameter. And generalize the docstring, which doesn't need to be specific to scalefactors
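If the helper is kept, the generalization might look roughly like this. It is written as a free function that takes an `errors` list rather than a method, and the name `validate_type` and the message wording are hypothetical.

```python
def validate_type(errors: list, name: str, value, expected_type: type) -> bool:
    """Append an error and return False if value is not an instance of expected_type."""
    if isinstance(value, expected_type):
        return True
    errors.append(
        f"{name} must be of type {expected_type.__name__}, it is {type(value).__name__}."
    )
    return False
```

The trade-off raised above still applies: a generic helper cannot easily carry field-specific guidance such as pointing to scalefactors_json.json.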

nayib-jose-gloria · 2024-04-25T15:07:33Z

cellxgene_schema_cli/tests/fixtures/examples_validate.py

+    "default_embedding": "X_umap",
+    "X_approximate_distribution": "normal",
+    "batch_condition": ["is_primary_data"],
+    "spatial": {"is_single": numpy.bool_(True)},


let's make "is_single" a non-numpy boolean True here, since both should be valid. Keep the numpy bool in the visium uns so we can cover both cases

nayib-jose-gloria

looks good! a few suggested nits (optional but I think would be helpful), a couple requests, and 1 rule I'm going to clarify with Brian just in case

nayib-jose-gloria

LGTM, thank you! Just waiting on clarification from Brian re: max dimension requirement. Shouldn't block testing though, since presumably it'll come up there as well

Bento007

Reviewed the validation changes. Looking for tests now.

Bento007 · 2024-04-25T19:34:44Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            return
+
+        # is_single must be a boolean.
+        uns_is_single = uns_spatial["is_single"]


I don't think this is strictly np.bools, regular bool should be supported too.

I think the regular bool case is covered already (see np.bool_ and regular bool test cases) but let me know if I am missing something here.
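For reference, a check that accepts both kinds of boolean could be sketched as below. The function name and message are hypothetical, and the guarded import exists only so the sketch runs where numpy is not installed.

```python
try:
    import numpy as np
    BOOL_TYPES = (bool, np.bool_)
except ImportError:  # keeps the sketch runnable without numpy
    np = None
    BOOL_TYPES = (bool,)


def is_single_errors(uns_spatial: dict) -> list:
    """Accept a Python bool or a numpy bool for 'is_single'; reject everything else."""
    errors = []
    value = uns_spatial.get("is_single")
    if not isinstance(value, BOOL_TYPES):
        errors.append(
            f"uns['spatial']['is_single'] must be of boolean type, it is {type(value).__name__}."
        )
    return errors
```

Note that an int such as 1 is rejected here, since int is not a subclass of bool.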

Bento007 · 2024-04-25T19:58:15Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+
+            # fullres is optional.
+            uns_fullres = uns_images.get("fullres")
+            if uns_fullres is not None:


Add a warning if no fullres is included, since it is STRONGLY RECOMMENDED.
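A sketch of such a warning, with a plain dict standing in for `uns['spatial'][library_id]['images']`. The function name and the warning text are hypothetical.

```python
def fullres_warnings(uns_images: dict) -> list:
    """Warn, rather than error, when the optional 'fullres' image is missing."""
    warnings = []
    if uns_images.get("fullres") is None:
        warnings.append(
            "No 'fullres' image specified in uns['spatial'][library_id]['images']. "
            "Including it is STRONGLY RECOMMENDED."  # hypothetical wording
        )
    return warnings
```

Because this is a warning and not an error, a dataset without fullres would still pass validation.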

brianraymor · 2024-04-25T20:53:05Z

I will double check with Brian whether the intended validation is the largest dimension is <= 2000 or actually == 2000

@nayib-jose-gloria - apologies if I was @mentioned somewhere and missed it.

The answer is "== 2000" although I agree that the space ranger documentation is a bit ambiguous. I did validate earlier against the highres visium images in the corpus. There was always a 2000 present in one dimension. See this report. The schema was also reviewed by 10X PM and CB.

nayib-jose-gloria · 2024-04-25T20:59:16Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            )
+
+        # Confirm max dimension of image, if specified, is valid.
+        if max_dimension is not None and max(image.shape) > max_dimension:


#858 (comment)

Looks like we need to update this check and the accompanying comments to reflect that the max dimension must equal max_dimension exactly.
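A sketch of the exact-match version of the check. It takes the image shape as a tuple so it runs without numpy; the function name and the second half of the message are hypothetical.

```python
def max_dimension_errors(image_name: str, shape: tuple, max_dimension=None) -> list:
    """Require the largest image dimension to equal max_dimension when one is given."""
    errors = []
    # max_dimension=None means no size constraint applies to this image.
    if max_dimension is not None and max(shape) != max_dimension:
        errors.append(
            f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of "
            f"{max_dimension} pixels, it has a largest dimension of {max(shape)} pixels."
        )
    return errors
```

With `!=` in place of `>`, an image whose largest dimension is below the limit is rejected as well.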

nayib-jose-gloria

#858 (comment)

we'll need to update the max_dimension check accordingly, ready to go after that

MillenniumFalconMechanic · 2024-04-25T23:31:33Z

Hi @brian-mott, this is ready for review!

brian-mott · 2024-04-26T22:15:22Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+            return
+
+        # is_single is required.
+        if "is_single" not in uns_spatial:


If I set uns['spatial'] to an int, bool, or float, that raises a TypeError and stops validation: "TypeError: argument of type {'int' | 'bool' | 'float'} is not iterable."

Other collections like lists, numpy arrays, and pandas dataframes are handled, validation completes, and the errors with validation are displayed correctly.

I've covered most of the testing but I'm still working on some items. I wanted to comment as I uncover issues. Let me know if you'd prefer a more formal review or a different way to do this.

brian-mott · 2024-04-26T23:24:35Z

cellxgene_schema_cli/cellxgene_schema/validate.py

+        else:
+            # Confirm shape of scalefactors is valid: allowed keys are spot_diameter_fullres and tissue_hires_scalef.
+            uns_scalefactors = uns_library_id["scalefactors"]
+            if not self._has_no_extra_keys(uns_scalefactors, ["spot_diameter_fullres", "tissue_hires_scalef"]):


If I set uns['spatial'][library_id]['scalefactors'] to anything other than a dict or pandas.DataFrame, I get an AttributeError: "AttributeError: '{type}' object has no attribute 'keys'". This behaves similarly to my comment above, where the raised exception prematurely exits validation.

uns['spatial'][library_id][scalefactors] is an output from scanpy.read_visium() which would suggest typing should be pretty consistent and not the wide range I've tested against.

But we are finding through the curation towards 5.1.0 that images that are labeled as a given resolution aren't necessarily the correct resolution or set at the proper scale factor. So we do have to manually adjust these keys and values and we could easily wrangle to something other than a dict with correct keys and values.
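Both crashes reported in these two comments come from operating on a value before confirming it is a dict. One way to guard against that is sketched below; `require_dict` and `check_spatial` are hypothetical names, the messages are illustrative, and accepting only plain dicts (not DataFrames) is an assumption.

```python
def require_dict(errors: list, name: str, value) -> bool:
    """Record an error and return False when value is not a dict."""
    if isinstance(value, dict):
        return True
    errors.append(f"{name} must be a dictionary, it is {type(value).__name__}.")
    return False


def check_spatial(uns_spatial) -> list:
    errors = []
    # Guard 1: avoids the TypeError raised by `"is_single" in <int|bool|float>`.
    if not require_dict(errors, "uns['spatial']", uns_spatial):
        return errors
    if "is_single" not in uns_spatial:
        errors.append("uns['spatial'] must contain the key 'is_single'.")
    for library_id, library in uns_spatial.items():
        if library_id == "is_single":
            continue
        name = f"uns['spatial']['{library_id}']"
        if not require_dict(errors, name, library):
            continue
        scalefactors = library.get("scalefactors")
        if scalefactors is None:
            continue  # missing-key handling is out of scope for this sketch
        # Guard 2: avoids the AttributeError from calling .keys() on a non-dict.
        if require_dict(errors, f"{name}['scalefactors']", scalefactors):
            extra = set(scalefactors.keys()) - {"spot_diameter_fullres", "tissue_hires_scalef"}
            if extra:
                errors.append(f"{name}['scalefactors'] has unexpected keys: {sorted(extra)}.")
    return errors
```

Each guard records an error and lets validation continue, so the user sees a validation message instead of a traceback.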

MillenniumFalconMechanic changed the title ~~feat(schema 5.1.0): valdidate uns[spatial]~~ feat(schema 5.1.0): validate uns[spatial] Apr 24, 2024

MillenniumFalconMechanic force-pushed the mim/827-spatial branch from f29b9c6 to 9faeb15 Compare April 24, 2024 23:04

MillenniumFalconMechanic marked this pull request as ready for review April 25, 2024 05:26

MillenniumFalconMechanic requested review from Bento007, nayib-jose-gloria and danieljhegeman April 25, 2024 05:27

nayib-jose-gloria reviewed Apr 25, 2024


nayib-jose-gloria requested changes Apr 25, 2024


MillenniumFalconMechanic requested a review from nayib-jose-gloria April 25, 2024 17:44

nayib-jose-gloria reviewed Apr 25, 2024


nayib-jose-gloria self-requested a review April 25, 2024 18:33

Bento007 requested changes Apr 25, 2024


MillenniumFalconMechanic requested a review from Bento007 April 25, 2024 20:42

Bento007 approved these changes Apr 25, 2024


nayib-jose-gloria reviewed Apr 25, 2024


nayib-jose-gloria requested changes Apr 25, 2024


MillenniumFalconMechanic requested a review from nayib-jose-gloria April 25, 2024 21:54

Bento007 approved these changes Apr 25, 2024


feat(schema 5.1.0): validate uns[spatial]

100c80e

MillenniumFalconMechanic added 11 commits April 25, 2024 15:09

Linting

b9eb627

Minor polish

7977a18

Linting

509ad9e

Updated error messages

76c80db

Review updates

799a940

Review updates

2cdb3fa

Linting

25101a2

Review updates

e81991e

Reverted update

5a821b9

Updated max_dimension to ==

1a34318

Updated no_X_embedding tests for spatial

88096c0

MillenniumFalconMechanic force-pushed the mim/827-spatial branch from 86f9eb1 to 88096c0 Compare April 25, 2024 23:29

MillenniumFalconMechanic requested a review from brian-mott April 25, 2024 23:31

MillenniumFalconMechanic force-pushed the mim/827-spatial branch from a515a5b to 88096c0 Compare April 26, 2024 20:45

brian-mott reviewed Apr 26, 2024


brianraymor mentioned this pull request Apr 27, 2024

cellxgene-schema CLI must add validation for uns['spatial'] #827

Closed

Bento007 mentioned this pull request Apr 29, 2024

fix: type checking for spatial #862

Closed

2 tasks

Merge branch 'main' into mim/827-spatial

3ae529d

Bento007 merged commit 4a9ec93 into main Apr 29, 2024
6 checks passed

Bento007 deleted the mim/827-spatial branch April 29, 2024 18:15
