Update "build directory" terminology in help and docs #371

Open
tsibley opened this issue May 29, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@tsibley
Member

tsibley commented May 29, 2024

This terminology was left as-is in #355 (nextstrain-pathogen.yaml) even as it got a little more nuanced: it's no longer "the directory which was passed to nextstrain build." The term "build" has also drifted out of phase from terminology in the wider ecosystem.

Review and revise existing usages to clarify them and extend them to include the behaviour around nextstrain-pathogen.yaml.

There's a pretty good description of the behaviour in the changelog for 8.2.0:

cli/CHANGES.md

Lines 52 to 107 in 14a392e

* `nextstrain build` and `nextstrain shell` now better support pathogen
  repositories which place workflows in subdirectories. The top-level of the
  repo must contain a `nextstrain-pathogen.yaml` file for this support to
  activate. The file may be empty for now, though we anticipate using it for
  pathogen-level metadata in the future to aid indexing, listing, and
  attribution of pathogen repos.

  As an example of the new support, consider the following repo layout

      mpox/
      ├── nextstrain-pathogen.yaml
      ├── ingest/
      │   ├── Snakefile
      │   └── …
      ├── phylogenetic/
      │   ├── Snakefile
      │   └── …
      ├── shared/
      │   ├── reference.fasta
      │   └── …
      └── …

  where `ingest/` and `phylogenetic/` contain workflows that use
  `shared/reference.fasta` via a relative path (i.e.
  `../shared/reference.fasta`).

  It's now possible to invoke those workflows with any of the following:

      nextstrain build mpox/ingest/
      nextstrain build mpox/phylogenetic/

      cd mpox
      nextstrain build ingest/
      nextstrain build phylogenetic/

      cd phylogenetic
      nextstrain build .
      nextstrain build ../ingest/

  regardless of runtime.

  Previously, such workflows required careful invocation, e.g.

      nextstrain build mpox/ -d phylogenetic/ -s phylogenetic/Snakefile

  when using runtimes with filesystem isolation (i.e. the [containerized][]
  ones; Docker, Singularity, and AWS Batch) but not when using runtimes without
  it.

  When active, this feature makes the top-level of the pathogen repo (e.g.
  `mpox/`) available in the container at `/nextstrain/build` while the
  initial working directory is set to the workflow subdirectory in the
  container (e.g. `/nextstrain/build/phylogenetic`). That is, the filesystem
  isolation boundary is drawn at the top-level of the pathogen repo instead of
  at the workflow directory (i.e. what's given to `nextstrain build`).
  ([#355](https://github.com/nextstrain/cli/pull/355))

Motivated by @jameshadfield's commentary in review of a separate change.
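
For anyone revising the docs, the mount behaviour in the changelog excerpt above can be sketched roughly in Python. This is only an illustration, not the CLI's actual implementation: `resolve_mount` is a hypothetical name, and walking up through parent directories to find the marker file is an assumption about how the top level of the repo gets located.

```python
from pathlib import Path, PurePosixPath
from typing import Tuple

MARKER = "nextstrain-pathogen.yaml"                    # file name from the changelog
CONTAINER_BUILD = PurePosixPath("/nextstrain/build")   # mount point from the changelog


def resolve_mount(workflow_dir: Path) -> Tuple[Path, PurePosixPath]:
    """
    Hypothetical helper, not part of the Nextstrain CLI.

    Returns a pair of:
      - the host directory that would be mounted at /nextstrain/build
      - the initial working directory inside the container
    """
    workflow_dir = workflow_dir.resolve()

    # Assumption: look for the marker in the workflow directory itself and
    # then in each ancestor, stopping at the first directory that has it.
    for candidate in (workflow_dir, *workflow_dir.parents):
        if (candidate / MARKER).is_file():
            relative = workflow_dir.relative_to(candidate)
            return candidate, CONTAINER_BUILD.joinpath(*relative.parts)

    # No marker found: the workflow directory itself is the isolation boundary.
    return workflow_dir, CONTAINER_BUILD
```

With the `mpox/` layout from the changelog, `resolve_mount(Path("mpox/phylogenetic"))` would return the `mpox/` directory and `/nextstrain/build/phylogenetic`. Without the marker file it would return `mpox/phylogenetic/` and `/nextstrain/build`.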

tsibley added the documentation label May 29, 2024
@jameshadfield
Member

Here are some notes I made for myself, which may help others while the docs are forthcoming.

Example: we're in `avian-flu` running `nextstrain build --docker ingest`.

| | Without `nextstrain-pathogen.yaml` | With `nextstrain-pathogen.yaml` |
| --- | --- | --- |
| `/nextstrain/build` is | `ingest` | `avian-flu` |
| Starting directory | `/nextstrain/build` | `/nextstrain/build/ingest` |

If we're in `avian-flu/ingest` and we run `nextstrain build --docker .` the results are identical to the above.

If we're in `avian-flu` and we run `nextstrain build --docker .` then the presence of `nextstrain-pathogen.yaml` makes no difference.

When does this matter?

It matters if you are using relative paths which track back to a higher directory than the one your workflow is in. For instance, `avian-flu/ingest/Snakefile` refers to the fauna location as `../fauna` (src) and the repo doesn't define `nextstrain-pathogen.yaml`. Running this outside of Docker doesn't work, as fauna isn't typically located at `avian-flu/fauna`. Using `nextstrain build --docker ingest` works because the container has `/nextstrain/fauna` and `/nextstrain/build` is the `ingest` dir, so `../fauna` resolves to `/nextstrain/fauna`.
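
To make the difference concrete, here is a small Python sketch of where `../fauna` ends up for the two starting directories in the table. It is only a lexical path calculation under the mount points described in these notes (`resolve_in_container` is a hypothetical helper name) and it ignores symlinks and anything else the runtime might do.

```python
import posixpath


def resolve_in_container(cwd: str, relative: str) -> str:
    """Hypothetical helper: resolve a relative path against a container working directory."""
    return posixpath.normpath(posixpath.join(cwd, relative))


# Without nextstrain-pathogen.yaml: ingest/ is mounted at /nextstrain/build
# and the workflow starts there.
without_marker = resolve_in_container("/nextstrain/build", "../fauna")

# With nextstrain-pathogen.yaml: avian-flu/ is mounted at /nextstrain/build
# and the workflow starts in /nextstrain/build/ingest.
with_marker = resolve_in_container("/nextstrain/build/ingest", "../fauna")

print(without_marker)  # /nextstrain/fauna
print(with_marker)     # /nextstrain/build/fauna
```

So with the marker file present, `../fauna` points at a `fauna` directory inside the mounted repo (i.e. `avian-flu/fauna` on the host), which is the same place it would resolve to when running outside of Docker.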

jameshadfield added a commit to nextstrain/avian-flu that referenced this issue Aug 20, 2024
This results in a different mount point for the 'ingest' directory in
docker runtimes which more closely matches a typical setup using other
runtimes. This allows the relative location of fauna ('path_to_fauna')
to be the same for a typical ambient setup as for docker.

See <nextstrain/cli#371 (comment)>
for more detail on mount points for the docker runtime.