-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(schema 5.1.0): validate uns[spatial] #858
Conversation
f29b9c6
to
9faeb15
Compare
@@ -923,6 +926,13 @@ def _validate_seurat_convertibility(self): | |||
) | |||
self.is_seurat_convertible = False | |||
|
|||
# Seurat conversion is not supported for Visium datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this block as the first check in this function, and return if self._is_visium
. That will allow us to skip checking the X matrix, which is a more expensive operation, if we don't need to because seurat convertability is already False due to the visium check.
return | ||
|
||
# spatial is forbidden if assay it not a supported spatial assay. | ||
uns_spatial_specified = "spatial" in self.adata.uns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can replace uns_spatial_specified with uns_spatial = self.adata.uns.get("spatial")
since it returns None if uns_spatial is not specified. one less var to maintain
uns_spatial_keys = list(uns_spatial.keys()) | ||
library_ids = list(filter(lambda x: x != "is_single", uns_spatial_keys)) | ||
if len(library_ids) > 1: | ||
self.errors.append("uns['spatial'] must contain only one library_id.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: thinking about rewording this error slightly, since it's possible the user thought they could include additional metadata, not intending for them to be library ids.
Maybe something like f"uns['spatial'] must contain only two top-level keys: 'is_single' and a library_id. More than 2 top-level keys detected: {library_ids}".
# library_id is required if assay is Visium and is_single is True. | ||
if len(library_ids) == 0: | ||
self.errors.append( | ||
"uns['spatial'] must contain the key 'library_id' for obs['assay_ontology_term_id'] " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since the key can be anything, we should reword this so as not to imply it must contain the literal string "library_id" as the key.
Maybe something like: ...must contain at least one key representing the library_id
) | ||
|
||
# Confirm max dimension of image, if specified, is valid. | ||
if max_dimension is not None and max(image.shape) > 2000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accidental hard-code
if max_dimension is not None and max(image.shape) > 2000: | |
if max_dimension is not None and max(image.shape) > max_dimension: |
# Confirm max dimension of image, if specified, is valid. | ||
if max_dimension is not None and max(image.shape) > 2000: | ||
self.errors.append( | ||
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of 2000 pixels, " | |
f"uns['spatial'][library_id]['images']['{image_name}'] has a max dimension of {max_dimension} pixels, " |
) | ||
|
||
# Confirm max dimension of image, if specified, is valid. | ||
if max_dimension is not None and max(image.shape) > 2000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, I would assume this is the right interpretation of the rule, but there's some vagueness with how the rule is written:
Its largest dimension MUST be 2000 pixels.
I will double check with Brian whether the intended validation is the largest dimension is <= 2000 or actually == 2000
) | ||
# spot_diameter_fullres is specified: proceed with validation. | ||
else: | ||
self._validate_float( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it'd be useful to include more details in the error message (i.e. "This must be the value of the spot_diameter_fullres field from scalefactors_json.json"), in which case we probably want to pull this out of a generalized _validate_float
method
) | ||
# tissue_hires_scalef is specified: proceed with validation. | ||
else: | ||
self._validate_float( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it'd be useful to include more details in the error message (i.e. "This must be the value of the tissue_hires_scalef field from scalefactors_json.json."), in which case we probably want to pull this out of a generalized _validate_float
method
assay_ontology_term_id = self.adata.obs.get("assay_ontology_term_id") | ||
return assay_ontology_term_id is not None and (assay_ontology_term_id == ASSAY_VISIUM).any() | ||
|
||
def _validate_float(self, name: str, value: float): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments above; I think we can get rid of this.
If you disagree and would like to keep it, I think it'd make more sense to generalize into a _validate_type
method instead and add type
as a parameter. And generalize the docstring, which doesn't need to be specific to scalefactors
"default_embedding": "X_umap", | ||
"X_approximate_distribution": "normal", | ||
"batch_condition": ["is_primary_data"], | ||
"spatial": {"is_single": numpy.bool_(True)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make "is_single" a non-numpy boolean True here, since both should be valid. Keep the numpy bool in the visium uns so we can cover both cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! a few suggested nits (optional but I think would be helpful), a couple requests, and 1 rule I'm going to clarify with Brian just in case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you! Just waiting on clarification from Brian re: max dimension requirement. Shouldn't block testing though, since presumably it'll come up there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the validaiton changes. Looking for tests now.
return | ||
|
||
# is_single must be a boolean. | ||
uns_is_single = uns_spatial["is_single"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is strictly np.bools, regular bool should be supported too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the regular bool case is covered already (see np.bool_ and regular bool test cases) but let me know if I am missing something here.
|
||
# fullres is optional. | ||
uns_fullres = uns_images.get("fullres") | ||
if uns_fullres is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a warning if no fullres is included, since it is STRONGLY RECOMMENDED.
@nayib-jose-gloria - apologies if I was @mentioned somewhere and missed it. The answer is "== 2000" although I agree that the space ranger documentation is a bit ambiguous. I did validate earlier against the highres visium images in the corpus. There was always a 2000 present in one dimension. See this report. The schema was also reviewed by 10X PM and CB. |
) | ||
|
||
# Confirm max dimension of image, if specified, is valid. | ||
if max_dimension is not None and max(image.shape) > max_dimension: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we need to update this check + the accompanying comments to reflect that we want the max dimension to == max_dimension exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll need to update the max_dimension check accordingly, ready to go after that
86f9eb1
to
88096c0
Compare
Hi @brian-mott, this is ready for review! |
a515a5b
to
88096c0
Compare
return | ||
|
||
# is_single is required. | ||
if "is_single" not in uns_spatial: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I set uns['spatial']
to an int, bool, or float, that raises a TypeError and stops validation: "TypeError: argument of type {'int' | 'bool' | 'float'} is not iterable."
Other collections like lists, numpy arrays, and pandas dataframes are handled, validation completes, and the errors with validation are displayed correctly.
I've covered most of the testing but I'm still working on some items. I wanted to comment as I've uncovered issues. Let me know if you prefer the more formal review or different way to do this.
else: | ||
# Confirm shape of scalefactors is valid: allowed keys are spot_diameter_fullres and tissue_hires_scalef. | ||
uns_scalefactors = uns_library_id["scalefactors"] | ||
if not self._has_no_extra_keys(uns_scalefactors, ["spot_diameter_fullres", "tissue_hires_scalef"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I set uns['spatial'][library_id][scalefactors]
= anything other than a dict
or pandas.Dataframe
, I get an AttributeError: "AttributeError: '{type}' object has no attribute 'keys'" This behaves similarly to my above comment where the raised exception prematurely exits validation.
uns['spatial'][library_id][scalefactors]
is an output from scanpy.read_visium()
which would suggest typing should be pretty consistent and not the wide range I've tested against.
But we are finding through the curation towards 5.1.0 that images that are labeled as a given resolution aren't necessarily the correct resolution or set at the proper scale factor. So we do have to manually adjust these keys and values and we could easily wrangle to something other than a dict with correct keys and values.
Reason for Change
Changes
_check_spatial()
to validateuns["spatial"]
values.Testing