documentation on how to create new models #170

Open
krobison13 opened this issue Sep 28, 2022 · 2 comments

Comments

@krobison13

There will certainly be interest in creating additional models, e.g. with fewer segments to accommodate larger inserts, or with other schemes for the information embedded in each segment. A tutorial on how to train new models would be very useful.

@jonn-smith
Collaborator

@krobison13 This is an excellent point. We're in the process of refactoring how models are defined, at which point we'll need a new set of instructions anyway.

In the meantime, the easiest way to do this is to start with an existing model and modify it.

You can list the models available using:

$ longbow model -l
[INFO 2022-10-07 14:50:25    model] Invoked via: longbow model -l
Longbow includes the following models:
Name                                    Version  Description
10x_sc_10x5p_single_none                1.0.0    Model for a single cDNA sequence from the 10x 5' kit
mas_15_sc_10x5p_single_none             2.0.1    The standard MAS-seq 15 array element model.
mas_15_sc_10x3p_single_none             2.0.2    The 3' kit MAS-seq 15 array element model.
mas_15_bulk_10x5p_single_internal       1.0.1    A MAS-seq 15 array element model with a 10 base index just before the 3' adapter for bulk sequencing.
mas_10_sc_10x5p_single_none             2.0.1    The MAS-seq 10 array element model.
mas_15_spatial_slide-seq_single_none    2.0.2    The Slide-seq 15 array element model.
mas_15_bulk_teloprimeV2_single_none     2.0.1    The MAS15 Teloprime V2 indexed array element model.
isoseq_1_sc_10x5p_single_none           1.0.1    Single-cell RNA (without MAS-seq prep).
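
If you want to pick a starting model from a script rather than by eye, the listing can be parsed. This is only a sketch: it assumes the listing keeps the three-column layout shown above (name, version, description), and `parse_model_listing` is a hypothetical helper, not part of Longbow.

```python
import re

# Assumption: each model row is "<name> <x.y.z version> <description>",
# separated by runs of whitespace, as in the listing printed above.
LISTING_ROW = re.compile(r"^(\S+)\s+(\d+\.\d+\.\d+)\s+(.+)$")


def parse_model_listing(text):
    """Return one dict per model row; log lines and the header are skipped."""
    models = []
    for line in text.splitlines():
        match = LISTING_ROW.match(line.strip())
        if match:
            name, version, description = match.groups()
            models.append(
                {"name": name, "version": version, "description": description}
            )
    return models


# Two rows copied from the listing above, plus the header line.
sample = """\
Name                                    Version  Description
mas_15_sc_10x5p_single_none             2.0.1    The standard MAS-seq 15 array element model.
mas_10_sc_10x5p_single_none             2.0.1    The MAS-seq 10 array element model.
"""

models = parse_model_listing(sample)
for model in models:
    print(model["name"], model["version"])
```

In practice you would feed it the captured output of `longbow model -l` instead of the pasted sample.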

Then dump one of them to disk:

$ longbow model -d mas_15_sc_10x5p_single_none
[INFO 2022-10-07 14:51:07    model] Invoked via: longbow model -d mas_15_sc_10x5p_single_none
[INFO 2022-10-07 14:51:08    model] Dumping mas_15_sc_10x5p_single_none: The standard MAS-seq 15 array element model.
[INFO 2022-10-07 14:51:08    model] Dumping dotfile: longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.dot
[INFO 2022-10-07 14:51:08    model] Dumping simple dotfile: longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.simple.dot
[INFO 2022-10-07 14:51:08    model] Dumping json model specification: longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.spec.json
[INFO 2022-10-07 14:51:08    model] Dumping dense transition matrix: longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.dense_transition_matrix.pickle
[INFO 2022-10-07 14:51:08    model] Dumping emission distributions: longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.emission_distributions.txt

Then modify the resulting longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.spec.json file so that it has the number of elements (or other characteristics) that you want. Changing the number of elements/segments, for example, is as simple as removing and/or adding MAS adapters in the adapter definitions and updating the corresponding array structure lines in the model structure to match.
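
Before editing the spec by hand, it can help to look at how it is laid out and to work on a copy. The sketch below deliberately assumes very little: only that the dumped spec is a JSON object at the top level. It does not assume any particular key names, and `load_spec`, `summarize_spec` and `save_spec` are hypothetical helpers, not Longbow functions.

```python
import json


def load_spec(path):
    """Read a dumped model specification from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


def summarize_spec(spec):
    """Map each top-level key to the type of its value.

    Assumption: the spec is a JSON object (a dict once loaded).
    """
    return {key: type(value).__name__ for key, value in spec.items()}


def save_spec(spec, path):
    """Write the (possibly edited) spec to a new file, leaving the dump intact."""
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=4)
        handle.write("\n")


# Example use (the output file name is a placeholder):
#   spec = load_spec("longbow_model_mas_15_sc_10x5p_single_none.v2.0.1.spec.json")
#   print(summarize_spec(spec))   # find the adapter and structure sections
#   ... edit spec in place ...
#   save_spec(spec, "my_custom_model.spec.json")
```

Writing to a separate file keeps the original dump available for comparison, which is useful when checking that the adapter definitions and the array structure still agree after your edits.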

As for training, we currently use the same weights for all models (we haven't trained each one individually yet). Empirically, these weights work well for all of the default models, although some models would admittedly work better with customized weights.

@jamestwebber
Member

Is this page (from #195) a sufficient explanation? Any more detail needed?
