Piscem v0.7.0
This release of piscem adds the ability to index decoy sequences using the "distinguishing flanking k-mer" methodology described in Hjörleifsson and Sullivan et al. [1]. This way of handling decoy sequences is optimized for pseudoalignment and pseudoalignment-like approaches, where alignment scores are unavailable (unlike the approach of [2], which is designed to work with selective-alignment).
The implementation in piscem adopts the terminology of "poison" k-mers: the decoy sequence is used to create a separate table of poison k-mers whose presence will cause a read to be discarded rather than mapped to some target in the index. Poison k-mers are simply distinguishing flanking k-mers that belong to some decoy sequence, and hence their presence in a mapping should "poison" the mapping (i.e., lead to it being discarded).
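The poison k-mer idea described above can be illustrated with a small Python toy. This is a simplified sketch under stated assumptions, not piscem's implementation: it treats a decoy k-mer as "distinguishing flanking" when it is absent from the targets but sits directly next to a k-mer that the decoy shares with the targets, it ignores reverse complements and canonical k-mers, and it discards a read if any of its k-mers is in the poison table. The function names (`build_poison_table`, `is_poisoned`) are hypothetical.

```python
def kmers(seq, k):
    """Return all k-mers of seq, forward strand only (a simplification)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_poison_table(targets, decoys, k):
    """Collect decoy k-mers that flank a region shared with the targets.

    Assumption: a k-mer is "poison" if it is not in any target but an
    adjacent k-mer in the same decoy sequence is.
    """
    target_kmers = {km for t in targets for km in kmers(t, k)}
    poison = set()
    for decoy in decoys:
        dk = kmers(decoy, k)
        for i, km in enumerate(dk):
            if km in target_kmers:
                continue  # shared k-mers are never poison themselves
            left_shared = i > 0 and dk[i - 1] in target_kmers
            right_shared = i + 1 < len(dk) and dk[i + 1] in target_kmers
            if left_shared or right_shared:
                poison.add(km)
    return poison


def is_poisoned(read, poison_table, k):
    """A read is discarded if any of its k-mers is a poison k-mer."""
    return any(km in poison_table for km in kmers(read, k))


# Toy data: the decoy shares the stretch ACGTAC with the target.
k = 4
targets = ["ACGTACGGA"]
decoys = ["TTACGTACCC"]
poison_table = build_poison_table(targets, decoys, k)

reads = ["GGACGTACGG", "ACGTACCCAA"]
kept = [r for r in reads if not is_poisoned(r, poison_table, k)]
print(sorted(poison_table), kept)
```

In this toy, the second read extends from the shared region into decoy-specific sequence, so it contains a poison k-mer and is dropped, while the first read is kept.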
To build a decoy-aware index, pass the --decoy-paths argument to piscem build. This accepts a comma-separated list of FASTA files that will be used to generate the poison k-mer set. Doing so creates a separate data structure (the poison table) that is used to filter out fragments that may have mapped spuriously to the index.
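As a small illustration of the comma-separated format, the following Python sketch assembles the --decoy-paths portion of a piscem build command line. The helper name `decoy_paths_args` and the FASTA file names are hypothetical, and the other arguments that piscem build needs (reference input, output location, and so on) are deliberately left out rather than guessed.

```python
def decoy_paths_args(fasta_files):
    """Return the --decoy-paths argument pair for a list of FASTA paths.

    The paths are joined with commas, so a path that itself contains a
    comma cannot be represented and is rejected here.
    """
    if not fasta_files:
        raise ValueError("at least one decoy FASTA file is required")
    for path in fasta_files:
        if "," in path:
            raise ValueError(f"path contains a comma: {path!r}")
    return ["--decoy-paths", ",".join(fasta_files)]


# Hypothetical decoy files; remaining build arguments are omitted.
decoy_args = decoy_paths_args(["genome_chr1.fa", "genome_chr2.fa"])
command_fragment = ["piscem", "build", *decoy_args]
print(" ".join(command_fragment))
```

The resulting list can be extended with the remaining build arguments and handed to a process launcher such as `subprocess.run`.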
Likewise, when performing mapping, if a poison table has been built, it will be used by default. However, you can pass the --no-poison flag to map-bulk and map-sc to avoid considering poison k-mers, even if the index was constructed with a poison table.
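The default described above (use the poison table unless told otherwise) could be mirrored in a wrapper script along these lines. This is only a sketch: the helper `poison_flags` is hypothetical, and the index, read, and output arguments that the mapping subcommands require are omitted.

```python
def poison_flags(use_poison_table=True):
    """Return the extra flags controlling poison k-mer filtering.

    With no flag, a poison table present in the index is used by default;
    --no-poison turns the filtering off.
    """
    return [] if use_poison_table else ["--no-poison"]


# Fragments only; the remaining mapping arguments are not shown.
map_sc_default = ["piscem", "map-sc", *poison_flags()]
map_bulk_unfiltered = ["piscem", "map-bulk", *poison_flags(use_poison_table=False)]
print(map_sc_default, map_bulk_unfiltered)
```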