[AnVIL DX] Update data modality source #4258

Open
MillenniumFalconMechanic opened this issue Nov 12, 2024 · 4 comments
Labels
canary Done by the Clever Canary team

Comments

@MillenniumFalconMechanic
Contributor

MillenniumFalconMechanic commented Nov 12, 2024

Need

The data modality source is to be updated for the following:

  • Datasets: use files.data_modality (replacing datasets.data_modality).
  • Files: use files.data_modality.

No change required for Activities.
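A minimal Python sketch of the updated mapping (files.data_modality for both datasets and files), assuming the entity types are keyed by the names "datasets" and "files". The DATA_MODALITY_SOURCE table and the data_modality_field helper are hypothetical illustrations, not part of Azul or the Data Explorer. Activities are left out because the ticket requires no change for them.

```python
from typing import Dict, Optional

# Hypothetical table: entity type -> field used as its data modality source,
# following the updated ticket description. Activities are omitted because
# the ticket requires no change for them.
DATA_MODALITY_SOURCE: Dict[str, str] = {
    "datasets": "files.data_modality",
    "files": "files.data_modality",
}


def data_modality_field(entity_type: str) -> Optional[str]:
    """Return the source field for entity_type, or None if it is not remapped."""
    return DATA_MODALITY_SOURCE.get(entity_type)


# Both remapped entity types now share one field, so a single filter key
# could be reused across them.
assert data_modality_field("datasets") == data_modality_field("files")
```

Keying everything off one field is what lets the same filter apply on both tabs; whether that field is actually populated is a separate question raised later in the thread.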

@github-actions github-actions bot added the canary Done by the Clever Canary team label Nov 12, 2024
@hunterckx
Contributor

hunterckx commented Nov 13, 2024

The API responses pose two issues for implementing filtering and sorting of data modality:

  • datasets.data_modality is not present in termFacets, which prevents dataset data modality from being used as a key for filtering/sorting.
  • Beyond that, no single field keeps the data modality filter consistent across entity types: no data modality field is both well populated and available from the APIs for all five entity types. For example, datasets.data_modality is available only via the datasets API, while activities.data_modality is available via all APIs but lacks values that are present in datasets.data_modality. Ideally, we would use activities.data_modality, which avoids, for example, marking a file as having every data modality present in its associated dataset.
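The consistency problem can be seen as a set intersection: a field can back a single cross-entity filter only if every entity type's endpoint makes it available. This is a hypothetical sketch. The five names in ENTITY_TYPES are an assumption, and the availability table encodes only the two facts stated in the comment above, not real API output.

```python
from typing import Dict, Optional, Set

# Assumed names for the five AnVIL entity types served by the API.
ENTITY_TYPES = ("activities", "biosamples", "datasets", "donors", "files")


def common_fields(available: Dict[str, Set[str]]) -> Set[str]:
    """Return the fields available from every entity type's endpoint."""
    common: Optional[Set[str]] = None
    for entity_type in ENTITY_TYPES:
        present = available.get(entity_type, set())
        # Start from the first endpoint's fields, then keep only the overlap.
        common = set(present) if common is None else common & present
    return common or set()


# Illustrative availability encoding only what the comment states:
# activities.data_modality is available everywhere, while
# datasets.data_modality is available only from the datasets endpoint.
available = {e: {"activities.data_modality"} for e in ENTITY_TYPES}
available["datasets"].add("datasets.data_modality")

print(common_fields(available))  # {'activities.data_modality'}
```

Only activities.data_modality survives the intersection, which matches the comment's conclusion; the remaining concern is that its values are less complete than those in datasets.data_modality.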

@NoopDog NoopDog assigned NoopDog and unassigned hunterckx Nov 20, 2024
@MillenniumFalconMechanic
Contributor Author

MillenniumFalconMechanic commented Feb 11, 2025

Hi @hunterckx, thank you for your comments. Could we go ahead and use files.data_modality for all entity types? I have updated the ticket description above.

Update
@hunterckx, no action required at this point.

According to the FSS, data_modality is specified in six slots (anvil_alignmentactivity, anvil_assayactivity, anvil_dataset, anvil_file, anvil_sequencingactivity, anvil_variantcallingactivity), and each uses the same definition:

Data modality describes the biological nature of the information gathered as the result of an Activity, independent of the technology or methods used to produce the information.

cc @NoopDog.

@NoopDog
Collaborator

NoopDog commented Feb 11, 2025

It might be helpful to have different descriptions for these concepts if they genuinely have different meanings on different slots.

We should also decide which fields should be shown in the UI to support filtering.

One proposal is to index only the files.data_modality slot and roll it up to the various parent entities in Azul. We would then allow filtering by files.data_modality on all entities.

One problem, though, is that files.data_modality is currently always null in the Azul response.
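The roll-up proposal amounts to taking, for each parent entity, the union of the data_modality values of its files. A minimal sketch, assuming hypothetical (parent_id, data_modality) pairs as input and treating a null data_modality as contributing no values; this is not how Azul actually aggregates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def roll_up(
    files: Iterable[Tuple[str, Optional[List[str]]]],
) -> Dict[str, Set[str]]:
    """Union each file's data_modality values into its parent entity.

    Each input item is a hypothetical (parent_id, data_modality) pair; a
    None data_modality contributes no values, but the parent still appears.
    """
    rolled: Dict[str, Set[str]] = defaultdict(set)
    for parent_id, modalities in files:
        rolled[parent_id].update(modalities or [])
    return dict(rolled)


# Placeholder identifiers and values, not real AnVIL data.
print(roll_up([("dataset-1", ["modality-A"]),
               ("dataset-1", ["modality-B", "modality-A"]),
               ("dataset-2", None)]))
```

Because a null contributes nothing, a parent whose files all have a null files.data_modality rolls up to an empty set, so with the current responses the filter would have nothing to match. Rolling up from files also sidesteps the over-attribution concern raised earlier, since a parent only receives values that its own files carry.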

@MillenniumFalconMechanic
Contributor Author

Closing; requires further analysis.
