Piscem v0.7.0

@rob-p rob-p released this 19 Dec 01:46
· 96 commits to main since this release

This release of piscem adds the ability to index decoy sequences using the "distinguishing flanking k-mer" methodology described in Hjörleifsson and Sullivan et al. [1]. This way of handling decoy sequences is optimized to work with pseudoalignment and pseudoalignment-like approaches, where alignment scores are unavailable (unlike the approach of [2], which is designed to work with selective-alignment).

The implementation in piscem adopts the terminology of "poison" k-mers — that is, the decoy sequence is used to create a separate table of poison k-mers whose presence will cause a read to be discarded, rather than to map to some target in the index. Poison k-mers are simply distinguishing flanking k-mers that belong to some decoy sequence, and hence their presence in a mapping should "poison" the mapping (i.e. lead to it being discarded).
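
The idea can be illustrated with a toy sketch. This is not piscem's implementation: it assumes a simplified definition in which a poison k-mer is a decoy k-mer that is absent from every target but directly adjacent, within the decoy, to a k-mer that the decoy shares with some target. It also ignores reverse complements and canonical k-mers. The function names `kmers`, `build_poison_set`, and `is_poisoned` are hypothetical.

```python
def kmers(seq, k):
    """Return the k-mers of seq in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_poison_set(targets, decoys, k):
    """Collect decoy k-mers that are absent from the targets but sit
    directly next to a k-mer the decoy shares with some target.

    Simplifying assumption: forward strand only, exact string k-mers.
    """
    target_kmers = {km for t in targets for km in kmers(t, k)}
    poison = set()
    for decoy in decoys:
        kms = kmers(decoy, k)
        shared = [km in target_kmers for km in kms]
        for i, km in enumerate(kms):
            if shared[i]:
                continue  # k-mers present in a target are never poison
            left_shared = i > 0 and shared[i - 1]
            right_shared = i + 1 < len(kms) and shared[i + 1]
            if left_shared or right_shared:
                poison.add(km)
    return poison


def is_poisoned(read, poison, k):
    """A read is discarded if any of its k-mers is in the poison set."""
    return any(km in poison for km in kmers(read, k))


# Toy data (made up for illustration).
targets = ["ACGTACGGA"]
decoys = ["TTACGTACCTT"]
poison = build_poison_set(targets, decoys, k=4)

print(sorted(poison))
print(is_poisoned("CGTACCT", poison, k=4))   # read extends into the decoy flank
print(is_poisoned("ACGTACGG", poison, k=4))  # read lies entirely within the target
```

In this toy example the read that runs past the shared region into decoy-specific sequence is flagged, while the read that stays within the target is kept.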

To build a decoy-aware index, pass the --decoy-paths argument to piscem build. It accepts a comma-separated list of FASTA files that will be used to generate the poison k-mer set. The result is a separate data structure (the poison table) that is used to filter out fragments that may have mapped spuriously to the index.
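
As a small sketch of how the comma-separated value might be assembled when scripting a build, the helper below joins several decoy FASTA paths into a single --decoy-paths argument. The helper name `decoy_build_args` and the file names are hypothetical, and the remaining arguments that piscem build requires are deliberately not spelled out here: they are passed through unchanged via `other_args`.

```python
import shlex


def decoy_build_args(decoy_fastas, other_args=()):
    """Return an argv list for `piscem build` with --decoy-paths set.

    decoy_fastas: paths to decoy FASTA files (hypothetical names below).
    other_args:   any other build arguments, passed through untouched.
    """
    if not decoy_fastas:
        raise ValueError("at least one decoy FASTA path is required")
    for path in decoy_fastas:
        if "," in path:
            # A comma inside a path would be misread as a list separator.
            raise ValueError(f"path contains a comma: {path!r}")
    return ["piscem", "build", *other_args,
            "--decoy-paths", ",".join(decoy_fastas)]


cmd = decoy_build_args(["decoys_a.fa", "decoys_b.fa"])
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` once the other required build arguments have been added.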

Likewise, when performing mapping, if a poison table has been built, it will be used by default. However, you can pass the --no-poison flag to map-bulk and map-sc to avoid considering poison k-mers, even if the index was constructed with a poison table.

  1. Hjörleifsson, Kristján Eldjárn, et al. "Accurate quantification of single-nucleus and single-cell RNA-seq transcripts." bioRxiv (2022): 2022-12.

  2. Srivastava, Avi, et al. "Alignment and mapping methodology influence transcript abundance estimation." Genome biology 21.1 (2020): 1-29.