feat: add support for nested file paths in latch metadata #381

Merged
merged 24 commits into from
Jan 27, 2024
Changes from 23 commits
Commits
6b1328b
SnakemakeFileParameter -> SnakemakeFileMetadata
rahuldesai1 Jan 20, 2024
daaa19a
add separate file_metadata field in generate-metadata command
rahuldesai1 Jan 24, 2024
cae1552
use path in config for default field
rahuldesai1 Jan 24, 2024
4036ad9
move reindent function to utils file
rahuldesai1 Jan 24, 2024
6c400e4
recursively upload files in input parameters
rahuldesai1 Jan 24, 2024
6132847
bug fixes
rahuldesai1 Jan 25, 2024
fadbf1a
remove None value in file_metadata
rahuldesai1 Jan 25, 2024
7406718
add guard around input file list size
rahuldesai1 Jan 25, 2024
020c5e3
add defaults back
rahuldesai1 Jan 25, 2024
85d1f5f
more bug fixes
rahuldesai1 Jan 25, 2024
81d9080
cleanup config updating logic
rahuldesai1 Jan 25, 2024
444bedc
cleanup
rahuldesai1 Jan 25, 2024
2736aa7
Merge remote-tracking branch 'origin/main' into rahuldesai1/snakemake…
rahuldesai1 Jan 25, 2024
0a0bf41
update environments doc to remove params
rahuldesai1 Jan 25, 2024
02f1070
udpate snakemake metadata docs
rahuldesai1 Jan 25, 2024
78d7a8a
Merge remote-tracking branch 'origin/main' into rahuldesai1/snakemake…
rahuldesai1 Jan 25, 2024
3563666
resolve PR comments
rahuldesai1 Jan 26, 2024
f0a2c53
add none default check to metadata post-init
rahuldesai1 Jan 26, 2024
ef952de
update tutorial to include new metadata generation code
rahuldesai1 Jan 26, 2024
271208b
disable support for lists containing LatchFile/LatchDir
rahuldesai1 Jan 26, 2024
374be13
disable support for lists containing LatchFile/LatchDir in generate-m…
rahuldesai1 Jan 26, 2024
c3ef8c0
resolve pr comments
rahuldesai1 Jan 26, 2024
3aafd6c
add type checking for Snakemake metadata defaults
rahuldesai1 Jan 27, 2024
230af84
cleanup
rahuldesai1 Jan 27, 2024
Binary file modified docs/source/assets/snakemake/metadata.png
34 changes: 6 additions & 28 deletions docs/source/snakemake/environments.md
@@ -32,20 +32,7 @@ SnakemakeMetadata(
use_conda=True,
use_container=True,
),
parameters={
"samples" : SnakemakeFileParameter(
display_name="Sample Input Directory",
description="A directory full of FastQ files",
type=LatchDir,
path=Path("data/samples"),
),
"ref_genome" : SnakemakeFileParameter(
display_name="Indexed Reference Genome",
description="A directory with a reference Fasta file and the 6 index files produced from `bwa index`",
type=LatchDir,
path=Path("genome"),
),
},
...
)
```

@@ -70,24 +57,15 @@ SnakemakeMetadata(
author=LatchAuthor(
name="latchbio",
),
env_config=EnvironmentConfig(
use_conda=False,
use_container=True,
),
docker_metadata=DockerMetadata(
username="user0",
secret_name="LATCH_SECRET_NAME",
),
parameters={
"samples" : SnakemakeFileParameter(
display_name="Sample Input Directory",
description="A directory full of FastQ files",
type=LatchDir,
path=Path("data/samples"),
),
"ref_genome" : SnakemakeFileParameter(
display_name="Indexed Reference Genome",
description="A directory with a reference Fasta file and the 6 index files produced from `bwa index`",
type=LatchDir,
path=Path("genome"),
),
},
...
)
```

248 changes: 80 additions & 168 deletions docs/source/snakemake/metadata.md
@@ -1,211 +1,123 @@
# Metadata

The Snakemake framework was designed to allow developers to both define and execute their workflows. This often means that the workflow parameters are sometimes ill-defined and scattered throughout the project as configuration values, static values in the `Snakefile` or command line flags.
The Snakemake framework was designed to allow developers to both define and execute their workflows. This often means that the workflow parameters are sometimes ill-defined and scattered throughout the project as configuration values, static values in the `Snakefile`, or command line flags.

To construct a graphical interface from a snakemake workflow, the file parameters need to be explicitly identified and defined so that they can be presented to scientists to be filled out through a web application.
To construct a graphical interface from a Snakemake workflow, the file parameters need to be explicitly identified and defined so that they can be presented to scientists through a web application.

The `latch_metadata.py` file holds these parameter definitions, along with any styling or cosmetic modifications the developer wishes to make to each parameter.
The `latch_metadata` folder holds these parameter definitions.

To generate a `latch_metadata.py` file, type:
To generate Latch metadata from a config file, type:

```console
latch generate-metadata <path_to_config.yaml>
```

The command automatically parses the existing `config.yaml` file in the Snakemake repository, and creates a Python parameters file.
The command automatically parses the existing `config.yaml` file in the Snakemake repository and creates a Python parameters file. After running the command, inspect the generated files to verify that the parameter types and file paths are what the workflow expects.

#### Examples
## Example

Below is an example `config.yaml` file from the [rna-seq-star-deseq2 workflow](https://github.com/snakemake-workflows/rna-seq-star-deseq2) from Snakemake workflow catalog.
Below is an example `config.yaml` file and corresponding latch metadata.

`config.yaml`

```yaml
# path or URL to sample sheet (TSV format, columns: sample, condition, ...)
samples: config/samples.tsv
# path or URL to sequencing unit sheet (TSV format, columns: sample, unit, fq1, fq2)
# Units are technical replicates (e.g. lanes, or resequencing of the same biological
# sample).
units: config/units.tsv

ref:
# Ensembl species name
species: homo_sapiens
# Ensembl release (make sure to take one where snpeff data is available, check 'snpEff databases' output)
release: 100
# Genome build
build: GRCh38

trimming:
# If you activate trimming by setting this to `True`, you will have to
# specify the respective cutadapt adapter trimming flag for each unit
# in the `units.tsv` file's `adapters` column
activate: False

pca:
activate: True
# Per default, a separate PCA plot is generated for each of the
# `variables_of_interest` and the `batch_effects`, coloring according to
# that variables groups.
# If you want PCA plots for further columns in the samples.tsv sheet, you
# can request them under labels as a list, for example:
# - relatively_uninteresting_variable_X
# - possible_batch_effect_Y
labels: ""

diffexp:
# variables for whome you are interested in whether they have an effect on
# expression levels
variables_of_interest:
treatment_1:
# any fold change will be relative to this factor level
base_level: B
treatment_2:
# any fold change will be relative to this factor level
base_level: C
# variables whose effect you want to model to separate them from your
# variables_of_interest
batch_effects:
- jointly_handled
# contrasts for the deseq2 results method to determine fold changes
contrasts:
A-vs-B_treatment_1:
# must be one of the variables_of_interest, for details see:
# https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#contrasts
variable_of_interest: treatment_1
# must be a level present in the variable_of_interest that is not the
# base_level specified above
level_of_interest: A
# The default model includes all interactions among variables_of_interest
# and batch_effects added on. For the example above this implicitly is:
# model: ~jointly_handled + treatment_1 * treatment_2
# For the default model to be used, simply specify an empty `model: ""` below.
# If you want to introduce different assumptions into your model, you can
# specify a different model to use, for example skipping the interaction:
# model: ~jointly_handled + treatment_1 + treatment_2
model: ""

params:
cutadapt-pe: ""
cutadapt-se: ""
star: ""
```

The Python `latch_metadata.py` generated from the Latch command:
paths:
sample_dir: data/samples/
reference_dir: reference/

```python
from dataclasses import dataclass
import typing

from latch.types.metadata import SnakemakeParameter, SnakemakeFileParameter
from latch.types.file import LatchFile
from latch.types.directory import LatchDir

@dataclass
class ref:
species: str
release: int
build: str


@dataclass
class trimming:
activate: bool
manifest: manifest.tsv

metadata:
threads: 32
num_samples: 2
```

@dataclass
class pca:
activate: bool
labels: str


@dataclass
class treatment_1:
base_level: str


@dataclass
class treatment_2:
base_level: str

The `latch_metadata/` folder generated from the `latch generate-metadata` command contains two files:

@dataclass
class variables_of_interest:
treatment_1: treatment_1
treatment_2: treatment_2
```
├── config.yaml
├── latch_metadata
│   └── __init__.py
│   └── parameters.py
```

```python
# latch_metadata/__init__.py
from latch.types.metadata import SnakemakeMetadata, LatchAuthor
from latch.types.directory import LatchDir

@dataclass
class A_vs_B_treatment_1:
variable_of_interest: str
level_of_interest: str
from .parameters import generated_parameters, file_metadata

SnakemakeMetadata(
output_dir=LatchDir("latch:///your_output_directory"),
display_name="Your Workflow Name",
author=LatchAuthor(
name="Your Name",
),
parameters=generated_parameters,
file_metadata=file_metadata,
)
```

@dataclass
class contrasts:
A_vs_B_treatment_1: A_vs_B_treatment_1
```python
# latch_metadata/parameters.py
from dataclasses import dataclass
import typing

from latch.types.metadata import SnakemakeParameter, SnakemakeFileParameter, SnakemakeFileMetadata
from latch.types.file import LatchFile
from latch.types.directory import LatchDir

@dataclass
class diffexp:
variables_of_interest: variables_of_interest
batch_effects: typing.List[str]
contrasts: contrasts
model: str
class paths:
sample_dir: LatchDir
reference_dir: LatchDir


@dataclass
class params:
cutadapt_pe: str
cutadapt_se: str
star: str

class metadata:
threads: int
num_samples: int



# Import these into your `__init__.py` file:
#
# from .parameters import generated_parameters
#
generated_parameters = {
'samples': SnakemakeFileParameter(
display_name='samples',
type=LatchFile,
config=True,
'paths': SnakemakeParameter(
display_name='Paths',
type=paths,
),
'units': SnakemakeFileParameter(
display_name='units',
'manifest': SnakemakeParameter(
display_name='Manifest',
type=LatchFile,
config=True,
),
'ref': SnakemakeParameter(
display_name='ref',
type=ref,
default=ref(species='homo_sapiens', release=100, build='GRCh38'),
),
'trimming': SnakemakeParameter(
display_name='trimming',
type=trimming,
default=trimming(activate=False),
),
'pca': SnakemakeParameter(
display_name='pca',
type=pca,
default=pca(activate=True, labels=''),
'metadata': SnakemakeParameter(
display_name='Metadata',
type=metadata,
default=metadata(threads=32, num_samples=2),
),
'diffexp': SnakemakeParameter(
display_name='diffexp',
type=diffexp,
default=diffexp(variables_of_interest=variables_of_interest(treatment_1=treatment_1(base_level='B'), treatment_2=treatment_2(base_level='C')), batch_effects=['jointly_handled'], contrasts=contrasts(A_vs_B_treatment_1=A_vs_B_treatment_1(variable_of_interest='treatment_1', level_of_interest='A')), model=''),
),
'params': SnakemakeParameter(
display_name='params',
type=params,
default=params(cutadapt_pe='', cutadapt_se='', star=''),
}

file_metadata = {
'paths': {
'sample_dir': SnakemakeFileMetadata(
path='data/samples/',
config=True,
),
'reference_dir': SnakemakeFileMetadata(
path='reference/',
config=True,
),
},
'manifest': SnakemakeFileMetadata(
path='manifest.tsv',
config=True,
),
}
```

Once the workflow is registered to Latch, it will receive an interface like below:
The `parameters` field contains all input parameters the Latch Console will expose to scientists before executing the workflow.

The `file_metadata` field specifies metadata about the input files as a `SnakemakeFileMetadata` object. Every input parameter of type `LatchFile` or `LatchDir` must have a corresponding `SnakemakeFileMetadata` in the `file_metadata` field.

After registering the above workflow to Latch, you will see an interface like the one below:

![Snakemake workflow GUI](../assets/snakemake/metadata.png)
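
The rule stated in the updated docs, that every `LatchFile` or `LatchDir` parameter needs a matching `SnakemakeFileMetadata` entry (including fields nested inside dataclasses), can be illustrated with a small check. This is a minimal sketch, not code from this PR: `LatchFile`, `LatchDir`, and `FileMeta` below are hypothetical stand-ins defined locally so the example runs without the Latch SDK, and `missing_file_metadata` is an illustrative helper name, not a Latch function.

```python
from dataclasses import dataclass, fields, is_dataclass


# Hypothetical stand-ins for the Latch SDK types (assumptions, not the real classes).
class LatchFile: ...
class LatchDir: ...


@dataclass
class FileMeta:  # stand-in for SnakemakeFileMetadata
    path: str
    config: bool = False


# Parameter types mirroring the example config in the docs.
@dataclass
class paths:
    sample_dir: LatchDir
    reference_dir: LatchDir


@dataclass
class metadata:
    threads: int
    num_samples: int


param_types = {"paths": paths, "manifest": LatchFile, "metadata": metadata}

file_metadata = {
    "paths": {
        "sample_dir": FileMeta(path="data/samples/", config=True),
        "reference_dir": FileMeta(path="reference/", config=True),
    },
    "manifest": FileMeta(path="manifest.tsv", config=True),
}


def missing_file_metadata(types, meta, prefix=()):
    """Return dotted names of file/dir parameters that lack a metadata entry."""
    missing = []
    for name, typ in types.items():
        key = prefix + (name,)
        entry = meta.get(name) if isinstance(meta, dict) else None
        if typ in (LatchFile, LatchDir):
            # A file or directory parameter needs its own metadata object.
            if not isinstance(entry, FileMeta):
                missing.append(".".join(key))
        elif is_dataclass(typ):
            # Recurse into nested dataclasses, following the same nesting in `meta`.
            nested_types = {f.name: f.type for f in fields(typ)}
            nested_meta = entry if isinstance(entry, dict) else {}
            missing.extend(missing_file_metadata(nested_types, nested_meta, key))
    return missing


print(missing_file_metadata(param_types, file_metadata))
```

The nesting of `file_metadata` follows the nesting of the parameter dataclasses, which is why the check recurses with the same key at each level. Non-file fields such as `threads` are skipped because they need no entry.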