salmon 1.4.0
salmon 1.4.0 : Thanksgiving release 🦃
Bug fixes
- Fixed a very rare bug whereby, on certain operating systems, under certain types of system load, and with specific versions of the C++ standard library, the default random device would fail to produce a pseudorandom seed and would raise an exception. On these systems, "/dev/urandom" is explicitly substituted for the default random device. Unfortunately, it is not possible (or at least not easy) to make the appropriate change at runtime. So, if you are experiencing this issue (which, again, looks to be exceedingly rare), it may be best to compile from source on the machine causing the issue.
salmon-related changes
- salmon should now compile and run on ARM machines. It has been tested on an AWS aarch64 node (running Ubuntu 20.10) and should presumably work on many ARM machines. It is assumed that NEON intrinsics are available. This support for ARM was made immensely easier by SIMDe. Thanks to @mr-c and @BenLangmead for pointing out the SIMDe project, and to @mr-c, @lh3, and the lead developer of SIMDe, @nemequ, who all gave useful advice on the initial expansion to ARM support.
alevin-related changes
Support for RAD file creation and the alevin-fry pipeline
- --rad / --justAlign flag : Salmon/alevin 1.4.0 coincides with the initial release of alevin-fry, a flexible and efficient framework for single-cell quantification. Alevin-fry handles barcode detection and quantification, providing the methods developed as part of alevin as well as a number of other possibilities. Alevin-fry is computationally efficient, flexible, and very memory efficient, processing single-cell experiments in 2-3GB of memory (see more details in the poster introducing alevin-fry). Moving forward, we plan for alevin-fry to be the primary development platform for new single-cell quantification methods. Nonetheless, alevin-fry currently, and for the foreseeable future, will rely on alevin to perform the actual barcode / UMI extraction and the mapping of sequencing reads. alevin communicates with alevin-fry via an intermediate binary file called a RAD (Reduced Alignment Data) file. To process data with alevin-fry (see the alevin-fry documentation), you must first map the reads to the reference transcriptome to generate a RAD file. This is done by running alevin as you normally would and additionally passing the flag --rad or --justAlign. This flag tells alevin to just align the reads and to write the appropriate information to a RAD file in the output directory (with a pre-determined name).
- --sketch / --sketchMode flag : Alevin learned the --sketch / --sketchMode flag. This flag is currently relevant only in RAD mode. In fact, this flag currently implies RAD mode (that is, --sketch is currently the same as --rad --sketch). The --sketch flag is meant to prioritize mapping speed at the potential cost of reduced specificity. It turns off selective-alignment and instead maps the reads using a custom implementation of pseudoalignment [1] with structural constraints (PASC). This consists of executing the k-mer collecting part of a pseudoalignment [1] algorithm to collect potentially compatible targets for a fragment, represented by a series of "hits". The targets are then filtered to ensure that the collected hits are consistent in their orientation and co-linear in their placement on the fragment and reference (these are the enforced structural constraints). This algorithm is distinct from the seeding step of selective alignment or the quasi-mapping algorithm, and prioritizes speed. For an overview of how --sketch mode affects downstream results, please check out our poster "Accurate, efficient, and uncertainty-aware expression quantification of single-cell RNA-seq data".
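To make the structural constraints described for --sketch mode more concrete, here is a minimal Python sketch of one way such a filter could look. It is not salmon's implementation. The `Hit` record, the function names, and the exact co-linearity rule (reference positions increase along the fragment for forward hits and decrease for reverse-complement hits) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one k-mer hit (not a salmon data structure)."""
    target: str    # name of the reference target the k-mer matched
    read_pos: int  # offset of the k-mer on the fragment
    ref_pos: int   # offset of the k-mer on the reference
    forward: bool  # True if the k-mer matched in the forward orientation


def is_structurally_consistent(hits: List[Hit]) -> bool:
    """Check that hits share one orientation and are co-linear (assumed rule)."""
    if not hits:
        return False
    forward = hits[0].forward
    # Constraint 1: every hit must agree on orientation.
    if any(h.forward != forward for h in hits):
        return False
    # Constraint 2: walking along the fragment, reference positions must move
    # monotonically (up for forward hits, down for reverse-complement hits).
    ref_positions = [h.ref_pos for h in sorted(hits, key=lambda h: h.read_pos)]
    pairs = list(zip(ref_positions, ref_positions[1:]))
    if forward:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def filter_targets(hits: List[Hit]) -> List[str]:
    """Group hits by target and keep only targets whose hits pass both checks."""
    by_target: Dict[str, List[Hit]] = {}
    for h in hits:
        by_target.setdefault(h.target, []).append(h)
    return [t for t, hs in by_target.items() if is_structurally_consistent(hs)]
```

In this sketch a target is dropped as soon as any pair of its hits disagrees in orientation or order. A real mapper would likely be more tolerant and would also have to handle paired-end fragments, which are omitted here.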
Other alevin-related changes
- --noWhitelist flag : Alevin learned the --noWhitelist flag. Passing this flag to alevin (in classic mode; this flag has no effect in RAD mode) stops the pipeline after UMI deduplication and quantification. The second-round intelligent whitelisting operation will not be performed.
- generic barcode / UMI / read geometry syntax : Alevin learned to support a generic syntax to specify which parts of the reads should be used for the barcode, the UMI, and the biological read sequence. The syntax allows one to specify how the pattern corresponding to the barcode, UMI, and read sequence should be pieced together, and it is meant to be intuitive and general. For example, one can specify the 10Xv2 geometry in the following manner using the generic syntax:
  --read-geometry 2[1-end] --bc-geometry 1[1-16] --umi-geometry 1[17-26]
  This specifies that the "sequence" read (the biological sequence to be aligned) comes from read 2, and that it spans from index 1 (this syntax uses 1-based indexing) until the end of the read. Likewise, the barcode derives from read 1 and occupies positions 1-16, and the UMI comes from read 1 and occupies positions 17-26. The syntax can specify multiple ranges, and they will simply be concatenated together to produce the string. For example, one could specify --bc-geometry 1[1-8,16-23] to designate that the barcode should be taken from the substring in positions 1-8 of read 1 followed by the substring in positions 16-23 of read 1. It is even possible to have the string pieced together across both reads, but that functionality is only available if you are running with --rad or --sketch and preparing a RAD file for alevin-fry. If you are running classic alevin, the barcode must reside on a single read. The robust parsing of the flexible geometry syntax is made possible by the cpp-peglib project.
- Alevin learned the ability to annotate output SAM files with the CB and UR tags. If you write a SAM file by running alevin with --writeMappings, then the resulting SAM file will have CB and UR tags in the alignment records to record the cell barcode and UMI for the fragment.
- A new command-line flag, --noWhitelist, has been added to explicitly disable the 'intelligent whitelist' step in alevin. It helps with a still-unresolved issue on HPC systems running old versions of CentOS, where alevin fails to gain access to virtual memory.
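The geometry syntax described above can be illustrated with a small Python sketch that interprets a specification such as 1[1-8,16-23] and pulls the corresponding substring out of a read pair. This is only an illustration of the 1-based, inclusive indexing and of range concatenation. It is not alevin's parser (which is built on cpp-peglib), the function names are hypothetical, and it deliberately handles only a single read per specification, since the syntax for combining both reads is not shown here.

```python
import re
from typing import List, Optional, Tuple

# A whole specification for one read, e.g. "1[1-8,16-23]" or "2[1-end]".
_SPEC = re.compile(r"^([12])\[([^\]]+)\]$")
# A single range, e.g. "1-16" or "1-end" (1-based, inclusive).
_RANGE = re.compile(r"^(\d+)-(\d+|end)$")


def parse_geometry(spec: str) -> Tuple[int, List[Tuple[int, Optional[int]]]]:
    """Return (read number, [(start, end), ...]); end is None for 'end'."""
    m = _SPEC.match(spec.strip())
    if m is None:
        raise ValueError(f"unsupported geometry specification: {spec!r}")
    read_no = int(m.group(1))
    ranges: List[Tuple[int, Optional[int]]] = []
    for part in m.group(2).split(","):
        r = _RANGE.match(part.strip())
        if r is None:
            raise ValueError(f"bad range {part!r} in {spec!r}")
        start = int(r.group(1))
        end = None if r.group(2) == "end" else int(r.group(2))
        if start < 1 or (end is not None and end < start):
            raise ValueError(f"invalid bounds {part!r} in {spec!r}")
        ranges.append((start, end))
    return read_no, ranges


def extract(spec: str, read1: str, read2: str) -> str:
    """Concatenate the substrings selected by spec from the chosen read."""
    read_no, ranges = parse_geometry(spec)
    seq = read1 if read_no == 1 else read2
    # A 1-based inclusive range [start, end] is the Python slice [start-1:end];
    # end=None slices through to the end of the read.
    return "".join(seq[start - 1:end] for start, end in ranges)


if __name__ == "__main__":
    r1 = "ACGTACGTACGTACGT" + "TTTTGGGGCC"  # 16 nt barcode + 10 nt UMI
    r2 = "GATTACA"
    print(extract("1[1-16]", r1, r2))   # barcode
    print(extract("1[17-26]", r1, r2))  # UMI
    print(extract("2[1-end]", r1, r2))  # biological sequence
```

With the 10Xv2 example geometry, the three calls in the `__main__` block recover the barcode, the UMI, and the full second read from the toy read pair.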
References
[1] Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525-527.