We recently had a general discussion (meeting notes, 2024-11-21/22) about what would be required for workflows to be runnable by external collaborators entirely from custom configs, i.e. no modification of anything in the pathogen repo. Here's a summary of what we discussed, specifically as it pertains to this guide. I don't think we're actually very far away from achieving this, and the aim of this summary is to help guide us to this goal and add things that we consider
Running workflows from a separate analysis directory. There are prototypes of this here and here, and the corresponding CLI prototype is here. Once these prototypes have solidified, I expect the necessary Snakemake modifications will be added to this guide. While they still require the user to manage the workflow repo directory (git clone, updates, etc.), I think that's enough for the purposes of this goal; if we can get the CLI to manage it, even better.
Adding private metadata & sequences to workflows (Provide a generic pattern for including additional user data alongside curated data #72). This is blocked on merge: Support sequences augur#1579, but there's no reason we can't trial this for metadata-only additions with `augur merge` right now. If we consider private metadata curation beyond the scope of this (I do), then we already have working script-based approaches (ncov, mpox) to follow.
Generalized subsampling. This is currently possible with general Snakemake rules. The only small hiccup is the need to nullify unused default subsampling names, because the config dictionary merge keeps any default entries that the user config doesn't explicitly override.
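
The nullification hiccup can be sketched in plain Python. This is only an illustration under assumptions: `deep_merge` is a stand-in for a recursive config merge (it is not Snakemake's own code), and the subsampling names (`global`, `regional`, `focal`) and the `max_sequences` key are hypothetical.

```python
from copy import deepcopy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `defaults` (overrides win).

    Stand-in for a recursive config merge; not Snakemake's implementation.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def drop_nullified(section: dict) -> dict:
    """Drop entries the user set to None (YAML `null` or `~`)."""
    return {name: value for name, value in section.items() if value is not None}


# Hypothetical workflow defaults with two subsampling groups.
default_config = {
    "subsample": {
        "global": {"max_sequences": 5000},
        "regional": {"max_sequences": 1000},
    }
}

# Hypothetical user config: adds "focal" and nullifies the unwanted "regional".
# Simply omitting "regional" would not remove it, because the merge keeps defaults.
user_config = {
    "subsample": {
        "regional": None,
        "focal": {"max_sequences": 200},
    }
}

merged = deep_merge(default_config, user_config)
active_samples = drop_nullified(merged["subsample"])
print(sorted(active_samples))  # ['focal', 'global']
```

The workflow side then only needs to skip `None` entries wherever it iterates over a config dictionary, which is the same ability the DTA customisation would need.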
Allow customisation of DTA columns etc. This guide doesn't specify the config style & Snakemake rules for these parts of the workflow, so in practice they'll most likely be copied from existing workflows. I think these are config-customisable by setting your own `config["traits"]["columns"]` (or similar). The desired customisations brought up in the recent meeting were all conceptually similar to this; they didn't involve toggling rules on/off or other more complex changes to the workflow. I'd consider this task "done" except for the ability to nullify values where dictionaries are used in the config (see subsampling section above).
Workflow versioning. When running analyses separately from the workflow, it's crucial to make it clear when user configs are out of date and to provide a path to updating them. Conversely, it's desirable to know what effect a particular config value has, although that seems a harder problem to me, and perhaps docs + config validation would achieve this. We can start by using one-off checks within code; however, linters and config schemas would be more powerful¹. We've talked forever about generating docs from schemas, and perhaps that's a direction we could take for pathogen repos from day 1.
¹ My experience with schemas in augur is that they are good at identifying invalid data but poor at explaining what's wrong, and therefore poor at hinting how to fix it. Presumably the schemas here will be simpler, so the error messages may be more informative.
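
As a rough idea of what the "one-off checks within code" for workflow versioning could look like, here is a minimal sketch. The `config_version` key and the `EXPECTED_CONFIG_VERSION` constant are hypothetical (nothing like them is defined in this guide yet), and the `traits`/`columns` check only assumes the config shape mentioned above. The point is that each message says what is wrong and how to fix it.

```python
# Hypothetical: the config version this workflow release expects.
EXPECTED_CONFIG_VERSION = 2


def check_config(config: dict) -> list:
    """Return human-readable problems found in a merged config (empty if none)."""
    problems = []

    # Hypothetical top-level key recording which workflow version the
    # user config was written against.
    version = config.get("config_version")
    if version is None:
        problems.append(
            "Missing 'config_version'. Add "
            f"'config_version: {EXPECTED_CONFIG_VERSION}' once you have "
            "checked your config against the current defaults."
        )
    elif version != EXPECTED_CONFIG_VERSION:
        problems.append(
            f"Config was written for version {version}, but this workflow "
            f"expects version {EXPECTED_CONFIG_VERSION}. Update your config "
            "and bump 'config_version'."
        )

    # Assumed shape: config["traits"]["columns"] is a list of column names.
    traits = config.get("traits") or {}
    columns = traits.get("columns")
    if columns is not None and not (
        isinstance(columns, list) and all(isinstance(c, str) for c in columns)
    ):
        problems.append(
            "config['traits']['columns'] must be a list of metadata column "
            "names, for example ['region', 'country']."
        )

    return problems


if __name__ == "__main__":
    outdated = {"config_version": 1, "traits": {"columns": "region"}}
    for message in check_config(outdated):
        print(f"CONFIG ERROR: {message}")
```

A check like this could run at the top of the Snakefile and stop the workflow when the list is non-empty; a schema-based approach would replace the hand-written conditions but ideally keep messages of this kind.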