Salmon v0.11.0

@rob-p rob-p released this 16 Jul 04:35

Version 0.11.0 introduces further enhancements and bug fixes, and makes some modifications to the default parameters.

Note: Though we provide a pre-compiled binary here, we strongly suggest installing the latest version of salmon through Bioconda (or building from source).

New / enhanced features

  • The sensitivity and accuracy of mapping validation (enabled via the --validateMappings flag) have been further enhanced. This flag allows salmon to score mappings and validate their quality, both reducing the instances of spurious mapping and improving assignment of reads to the correct transcript in complex mapping situations.

  • Variational Bayesian (VB) optimization is now the default algorithm for the offline phase of salmon. While salmon has always used stochastic collapsed VB inference during the online phase, the default optimization algorithm during the offline phase was previously Expectation Maximization (EM), and VB optimization had to be explicitly enabled with --useVBOpt. Now, VB optimization is the default. This decision was made as the result of testing that suggested that the VB algorithm is often more accurate and, specifically, is less likely to give a non-zero (even if small) TPM to a non-expressed transcript. One can make salmon use the EM algorithm (reverting to the old behavior) by passing the --useEM flag. Further, for backwards compatibility, salmon still accepts the --useVBOpt flag, but it may be removed in the future. It is worth noting that the VB optimization algorithm employs a prior (which can be set using the --vbPrior flag). The default prior has been tuned for typical use-cases, and should work well. However, if you have a mechanism for validation, it may be worth exploring this option to see if a sparser (smaller) or less-sparse (larger) prior is useful in your setting.

  • The new flag --sigDigits (default value = 3) tells salmon how many significant digits to use when writing out effective lengths and estimated counts (it will still use more digits for TPM). Since more significant digits are rarely necessary, this can considerably reduce output size if one is processing many samples. One can always request higher-precision output by passing a larger value to this flag.

  • The new flag --consensusSlack allows one to modify the behavior of the mapping consensus mechanism. Passing a larger value will allow more "liberal" mapping. This is an advanced flag, and it is unlikely that it needs to be set by the casual user. The consensus slack is set to 1 by default when mapping validation is enabled, and to 0 otherwise.
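
The flags described above can be combined on a single salmon quant command line. The following is a minimal Python sketch that only assembles the argument list and does not run salmon. The helper name build_salmon_quant_cmd, the file paths, and the example values are hypothetical and chosen for illustration; the flag names are those discussed in these notes, and -i, -l, -1, -2 and -o are the usual salmon quant options for the index, library type, paired-end reads and output directory.

```python
from typing import List, Optional


def build_salmon_quant_cmd(
    index: str,
    reads_1: str,
    reads_2: str,
    out_dir: str,
    validate_mappings: bool = False,
    use_em: bool = False,
    vb_prior: Optional[float] = None,
    sig_digits: Optional[int] = None,
    consensus_slack: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for `salmon quant` (hypothetical helper).

    Options left at None/False are omitted so that salmon applies its
    own defaults (VB optimization, 3 significant digits, and so on).
    """
    cmd = [
        "salmon", "quant",
        "-i", index,      # path to the salmon index
        "-l", "A",        # let salmon infer the library type
        "-1", reads_1,    # first mates of the paired-end reads
        "-2", reads_2,    # second mates of the paired-end reads
        "-o", out_dir,    # output directory
    ]
    if validate_mappings:
        cmd.append("--validateMappings")
    if use_em:
        # Revert the offline phase to EM instead of the new VB default.
        cmd.append("--useEM")
    if vb_prior is not None:
        # Smaller values give a sparser prior, larger values a less sparse one.
        cmd += ["--vbPrior", str(vb_prior)]
    if sig_digits is not None:
        cmd += ["--sigDigits", str(sig_digits)]
    if consensus_slack is not None:
        cmd += ["--consensusSlack", str(consensus_slack)]
    return cmd


# Example: mapping validation on, EM instead of VB, 5 significant digits.
example_cmd = build_salmon_quant_cmd(
    "salmon_index", "reads_1.fq.gz", "reads_2.fq.gz", "quant_out",
    validate_mappings=True, use_em=True, sig_digits=5,
)
print(" ".join(example_cmd))
```

The resulting list could be passed to subprocess.run on a machine where salmon is installed. Leaving options unset, rather than repeating the defaults, keeps the command in step with whatever defaults the installed salmon version uses.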

Bug fixes

  • This release fixes a bug in mapping validation (i.e. the --validateMappings flag) that could result in a segmentation fault in rare situations.

  • This release fixes a bug in the built-in gene-level aggregation. The gene length and effective length columns of quant.genes.sf are now computed correctly; they had generally been wrong for multi-isoform genes since 0.6.0. Before the fix, the isoforms of a gene (in an order that depended on salmon's internals) were weighted 1, 2, 3, ... times what they should have been in the denominator of the TPM-weighted average, resulting in lengths that were too short, possibly even shorter than the shortest isoform. This miscalculation applied only to the lengths and effective lengths; the TPM and NumReads columns were unaffected (and remain correct). We thank Shawn Cokus ([email protected]), UCLA MCDB/BSCRC for discovering this issue and bringing it to our attention. Note: While we are maintaining the built-in gene-level aggregation code, tximport is, and has been, the recommended way to aggregate transcript-level abundances to the gene level. It provides several benefits over the built-in methodology, including the ability to derive lengths looking across replicates (rather than processing each replicate individually).
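
To make the effect of the aggregation bug concrete, here is a small Python sketch of a TPM-weighted average of isoform lengths, alongside a version whose denominator is inflated in the way the note above describes. This is an illustration based on one reading of that description, not salmon's actual code: the function names are hypothetical, and the fallback to an unweighted mean when all TPMs are zero is an assumption made only to keep the example well defined.

```python
from typing import Sequence


def tpm_weighted_length(lengths: Sequence[float], tpms: Sequence[float]) -> float:
    """Gene-level length as the TPM-weighted mean of isoform lengths."""
    if not lengths or len(lengths) != len(tpms):
        raise ValueError("lengths and tpms must be non-empty and equally long")
    total_tpm = sum(tpms)
    if total_tpm == 0:
        # Assumption for this sketch: unexpressed gene -> plain mean.
        return sum(lengths) / len(lengths)
    return sum(l * t for l, t in zip(lengths, tpms)) / total_tpm


def inflated_denominator_length(lengths: Sequence[float], tpms: Sequence[float]) -> float:
    """Illustrative version of the bug: the k-th isoform's TPM is counted
    k times in the denominator, so the result comes out too small."""
    if not lengths or len(lengths) != len(tpms):
        raise ValueError("lengths and tpms must be non-empty and equally long")
    numerator = sum(l * t for l, t in zip(lengths, tpms))
    denominator = sum(k * t for k, t in enumerate(tpms, start=1))
    if denominator == 0:
        raise ValueError("at least one isoform must have non-zero TPM")
    return numerator / denominator


lengths = [1000, 1500, 2000]  # three isoforms of one hypothetical gene
tpms = [5.0, 5.0, 5.0]

print(tpm_weighted_length(lengths, tpms))          # 1500.0
print(inflated_denominator_length(lengths, tpms))  # 750.0
```

With three equally expressed isoforms, the correct weighted length is 1500, while the inflated denominator yields 750, which is shorter than the shortest isoform (1000), matching the symptom described above.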