Thanks, will try to get this out this week. @hannes-ucsc @bvizzier-ucsc. This is assigned and in progress.
MillenniumFalconMechanic changed the title from "Add suport for inclusion of orphans in verbatim PFB" to "Add support for inclusion of orphans in verbatim PFB" on Nov 12, 2024
Azul includes orphans in a verbatim manifest only when the sole filter is datasets.dataset_id. Currently, the Data Browser filters by datasets.title. The dataset title is an unreliable filter because it is not guaranteed to be unique.
The Data Browser also unnecessarily specifies filters for donor and organism types even if the user selects all possible types, in which case the filters are redundant. Filtering by every possible value of a facet is equivalent to not filtering by that facet at all.
These two issues defeat Azul's detection of the fact that a manifest for an entire dataset is being requested, and cause it to exclude orphans from that manifest.
For example, the manifest request currently made by the Data Browser is
https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.title":{"is":["ANVIL_1000G_2019_Dev"]},"donors.organism_type":{"is":[null]},"files.file_format":{"is":[".md5",".tbi",".vcf.gz",".crai",".cram",".txt"]}}&format=verbatim.pfb
In order to include orphans, that request must be just
https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.dataset_id":{"is":["677dd55c-3fa3-4b07-8c98-985d94d7577e"]}}&format=verbatim.pfb
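The reduction from the first request to the second can be sketched as follows. This is only an illustration in Python, not the Data Browser's or Azul's actual code: the function names `simplify_filters`, `includes_orphans` and `manifest_url` are hypothetical, and so are the two lookups it assumes are available, `all_values` (every value each facet can take) and `dataset_ids` (dataset title to dataset ID). The example also assumes that the selected file formats and organism types are all the values those facets have for this dataset.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the requests shown in this issue.
BASE_URL = "https://service.anvil.gi.ucsc.edu/fetch/manifest/files"


def simplify_filters(filters, all_values, dataset_ids):
    """Drop redundant facets and filter by dataset ID instead of title.

    filters:     facet name -> {"is": [selected values]}
    all_values:  facet name -> every possible value (assumed lookup)
    dataset_ids: dataset title -> dataset ID (assumed lookup)
    """
    simplified = {}
    for facet, relation in filters.items():
        selected = relation["is"]
        possible = all_values.get(facet)
        # Selecting every possible value is the same as not filtering.
        if possible is not None and set(selected) == set(possible):
            continue
        if facet == "datasets.title":
            # Titles are not guaranteed to be unique, IDs are.
            facet = "datasets.dataset_id"
            selected = [dataset_ids[title] for title in selected]
        simplified[facet] = {"is": selected}
    return simplified


def includes_orphans(filters):
    """The condition described above: the sole filter is the dataset ID."""
    return set(filters) == {"datasets.dataset_id"}


def manifest_url(filters, catalog="anvil", fmt="verbatim.pfb"):
    """Build the manifest request; urlencode percent-encodes the JSON."""
    query = urlencode({
        "catalog": catalog,
        "filters": json.dumps(filters, separators=(",", ":")),
        "format": fmt,
    })
    return f"{BASE_URL}?{query}"


# The filters from the request currently made by the Data Browser.
current = {
    "datasets.title": {"is": ["ANVIL_1000G_2019_Dev"]},
    "donors.organism_type": {"is": [None]},
    "files.file_format": {
        "is": [".md5", ".tbi", ".vcf.gz", ".crai", ".cram", ".txt"]
    },
}
# Assumed lookups, for illustration only.
all_values = {
    "donors.organism_type": [None],
    "files.file_format": [".md5", ".tbi", ".vcf.gz", ".crai", ".cram", ".txt"],
}
dataset_ids = {"ANVIL_1000G_2019_Dev": "677dd55c-3fa3-4b07-8c98-985d94d7577e"}

simplified = simplify_filters(current, all_values, dataset_ids)
url = manifest_url(simplified)
```

With these inputs, `simplified` contains only the datasets.dataset_id filter, so `includes_orphans(simplified)` holds while `includes_orphans(current)` does not. Note that `urlencode` percent-encodes the JSON, so the resulting URL is the encoded form of the second request above rather than a character-for-character match.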