generated from jonn-smith/python_cli_template
* Refactored models into a hierarchical format
* Updated `inspect` to have a BLAST-like display. Now includes option to ingest a second bam file to visualize corrected CBC and UMI.
* Sift now verifies that the entirety of the cDNA structure is present.
* Added `SC` tag, containing segment cigar strings.
* Added `umi_correct` tool. (#172)
* Added `sift` documentation
* Changed default # threads for `correct` to 1 because of memory use.
* Refactoring into a hierarchical HMM constitutes a #minor version bump.
* Removed `scsplit`.
* Updated `annotate` to only read in one model from a bam file.
* Further updates for PR review.
* Replaced mas10 test data.
* Regenerated test data.
* Now only supports one model per bam file.

Co-authored-by: Jonn Smith <[email protected]>
Co-authored-by: James Webber <[email protected]>
Co-authored-by: BumpVersion Action <bumpversion@github-actions>
Co-authored-by: Jonn Smith <[email protected]>
1 parent b5cc346, commit 09e7dd2
Showing 113 changed files with 4,398 additions and 7,816 deletions.
@@ -0,0 +1,55 @@
---
layout: default
title: correct_umi
description: "Correct UMIs with the Set Cover algorithm."
nav_order: 4
parent: Commands
---

# Correct_UMI

## Description

Correct UMIs with the Set Cover algorithm.

Corrects all UMIs in the given BAM file.

Algorithm originally developed by Victoria Popic.

### Data Requirements

- The BAM file should be aligned and annotated with genes and transcript equivalence classes prior to running.
- It is critical that you give the proper input for the `--pre-extracted` flag.
  - If the file has been run through `longbow extract`, use `--pre-extracted`.
  - If the file has _NOT_ been run through `longbow extract`, _DO NOT USE_ `--pre-extracted`.

The following tags are required in the input file:

- `CB` (Cell Barcode)
- `JX` (Adjusted UMI)
- `eq` (Equivalence class assignment)
- `XG` (Gene assignment)
- `rq` (Read Quality: [-1.0, 1.0])
- `JB` (Back / UMI trailing segment Smith-Waterman alignment score)
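
The tag requirements above can be checked before running the tool. The following is a minimal sketch, not part of Longbow: the helper names `missing_tags` and `rq_in_range` are hypothetical, and it assumes each read's tags are available as a plain mapping of tag name to value.

```python
# Tags listed as required in the section above.
REQUIRED_TAGS = ("CB", "JX", "eq", "XG", "rq", "JB")


def missing_tags(read_tags):
    """Return the required tags that are absent from a tag-name -> value mapping."""
    return [tag for tag in REQUIRED_TAGS if tag not in read_tags]


def rq_in_range(read_tags):
    """Check that `rq` is present and lies in the documented [-1.0, 1.0] range."""
    rq = read_tags.get("rq")
    return rq is not None and -1.0 <= rq <= 1.0


# Hypothetical tag values, for illustration only.
example = {"CB": "ACGTACGTACGTACGT", "JX": "AACCGGTTAA", "eq": "42",
           "XG": "GENE1", "rq": 0.99, "JB": 20}
print(missing_tags(example), rq_in_range(example))
```

With `pysam`, the mapping for a read could be built as `dict(read.get_tags())`; that step is left out here so the sketch has no third-party dependencies.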

## Command help

```shell
$ longbow correct_umi --help
Usage: longbow correct_umi [OPTIONS] INPUT_BAM

  Correct UMIs with Set Cover algorithm.

Options:
  -v, --verbosity LVL       Either CRITICAL, ERROR, WARNING, INFO or DEBUG
  -l, --umi-length INTEGER  Length of the UMI for this sample. [default: 10]
  -o, --output-bam PATH     Corrected UMI bam output [default: stdout].
  -x, --reject-bam PATH     Filtered bam output (failing reads only).
  -f, --force               Force overwrite of the output files if they exist.
                            [default: False]
  --pre-extracted           Whether the input file has been processed with
                            `longbow extract` [default: False]
  --help                    Show this message and exit.
```
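
Because `--pre-extracted` must match how the input was produced, it can help to assemble the command line programmatically rather than by hand. The sketch below uses only the options shown in the help output above; the function name `build_correct_umi_cmd` and the file names are hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_correct_umi_cmd(input_bam, output_bam, umi_length=10,
                          reject_bam=None, pre_extracted=False, force=False):
    """Assemble an argv list for `longbow correct_umi` from the documented options."""
    cmd = ["longbow", "correct_umi",
           "--umi-length", str(umi_length),
           "--output-bam", output_bam]
    if reject_bam is not None:
        cmd += ["--reject-bam", reject_bam]
    if force:
        cmd.append("--force")
    # Pass this flag only if the input went through `longbow extract`.
    if pre_extracted:
        cmd.append("--pre-extracted")
    cmd.append(input_bam)
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_correct_umi_cmd("extracted.bam", "corrected.bam", pre_extracted=True)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the flags have been reviewed.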

@@ -0,0 +1,85 @@
---
layout: default
title: models
description: "Get info on built-in models."
nav_order: 9
parent: Commands
---

# Models

## Description

Get information about built-in Longbow models.

This command can either list all built-in models with their versions and descriptions, or dump the details of a single model to a set of files describing that model.

## Command help

```shell
$ longbow models --help
Usage: longbow models [OPTIONS]

  Get information about built-in Longbow models.

Options:
  -v, --verbosity LVL  Either CRITICAL, ERROR, WARNING, INFO or DEBUG
  -l, --list-models    List the names of all models supported natively by this
                       version of Longbow. NOTE: This argument is mutually
                       exclusive with arguments: [dump].
  -d, --dump TEXT      Dump the details of a given model. This command
                       creates a set of files corresponding to the given model
                       including: json representation, dot file
                       representations, transmission matrix, state emission
                       json file. NOTE: This argument is mutually exclusive
                       with arguments: [list_models].
  --help               Show this message and exit.
```

## Examples

```shell
$ longbow models --list-models
Longbow includes the following models:

Array models
============
Name    Version  Description
mas_15  3.0.0    15-element MAS-ISO-seq array
mas_10  3.0.0    10-element MAS-ISO-seq array
isoseq  3.0.0    PacBio IsoSeq model

cDNA models
===========
Name              Version  Description
sc_10x3p          3.0.0    single-cell 10x 3' kit
sc_10x5p          3.0.0    single-cell 10x 5' kit
bulk_10x5p        3.0.0    bulk 10x 5' kit
bulk_teloprimeV2  3.0.0    Lexogen TeloPrime V2 kit
spatial_slideseq  3.0.0    Slide-seq protocol
Specify a fully combined model via '<array model>+<cDNA model>' syntax, e.g. 'mas_15+sc_10x5p'.
```
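
The `<array model>+<cDNA model>` syntax can be validated with a few lines of code. This is a minimal sketch, not Longbow's own parser: the function name `split_model_name` is hypothetical, and the two name sets are copied from the listing above rather than queried from the tool.

```python
# Model names copied from the `--list-models` output above.
ARRAY_MODELS = {"mas_15", "mas_10", "isoseq"}
CDNA_MODELS = {"sc_10x3p", "sc_10x5p", "bulk_10x5p",
               "bulk_teloprimeV2", "spatial_slideseq"}


def split_model_name(name):
    """Split '<array model>+<cDNA model>' and check both halves against the listing."""
    parts = name.split("+")
    if len(parts) != 2:
        raise ValueError(f"expected '<array model>+<cDNA model>', got {name!r}")
    array_model, cdna_model = parts
    if array_model not in ARRAY_MODELS:
        raise ValueError(f"unknown array model: {array_model!r}")
    if cdna_model not in CDNA_MODELS:
        raise ValueError(f"unknown cDNA model: {cdna_model!r}")
    return array_model, cdna_model


print(split_model_name("mas_15+sc_10x5p"))
```

Hard-coding the names keeps the sketch self-contained; a real check would need to track whatever the installed Longbow version reports.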

```shell
$ longbow models --dump mas_15+sc_10x5p
[INFO 2022-11-02 20:33:28 models] Generating model: mas_15+sc_10x5p
[INFO 2022-11-02 20:33:29 models] Dumping mas_15+sc_10x5p: 15-element MAS-ISO-seq array, single-cell 10x 5' kit
[INFO 2022-11-02 20:33:29 models] Dumping dotfile: longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.dot
[INFO 2022-11-02 20:33:29 models] Dumping simple dotfile: longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.simple.dot
[INFO 2022-11-02 20:33:29 models] Dumping json model specification: longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.spec.json
[INFO 2022-11-02 20:33:30 models] Dumping dense transition matrix: longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.dense_transition_matrix.pickle
[INFO 2022-11-02 20:33:30 models] Dumping emission distributions: longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.emission_distributions.txt
[INFO 2022-11-02 20:33:30 models] Creating model graph from 1109 states...
[INFO 2022-11-02 20:33:47 models] Rendering model graph now...
[INFO 2022-11-02 20:33:57 models] Writing model graph now to longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.graph.png ...

$ ls
longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.dense_transition_matrix.pickle  longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.graph.png
longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.dot                             longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.simple.dot
longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.emission_distributions.txt      longbow_model-mas_15+sc_10x5p-Av3.0.0_Cv3.0.0.spec.json
```
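
The file names in the dump example follow a visible pattern: a `longbow_model-<model>-Av<array version>_Cv<cDNA version>` prefix plus one suffix per artifact. The sketch below reproduces that pattern so a script can check that a dump completed. The pattern is inferred from the log output above and is an assumption, not a documented guarantee; the function name `dump_file_names` is hypothetical.

```python
# Suffixes observed in the `--dump` log output above.
DUMP_SUFFIXES = (
    ".dot",
    ".simple.dot",
    ".spec.json",
    ".dense_transition_matrix.pickle",
    ".emission_distributions.txt",
    ".graph.png",
)


def dump_file_names(model_name, array_version, cdna_version):
    """Build the file names a dump is expected to produce (inferred naming pattern)."""
    prefix = f"longbow_model-{model_name}-Av{array_version}_Cv{cdna_version}"
    return [prefix + suffix for suffix in DUMP_SUFFIXES]


for name in dump_file_names("mas_15+sc_10x5p", "3.0.0", "3.0.0"):
    print(name)
```

Comparing this list against `os.listdir()` after a dump is one way to spot a missing artifact, for example when graph rendering fails.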