Releases: MikkelSchubert/adapterremoval
AdapterRemoval v2.3.4
This release adds a couple of new command-line options for handling non-ACGTN
bases in FASTQ data and back-ports a few minor fixes from the development
branch.
Added
- Added support for converting Uracils (U) in input data to Thymine (T) via the
--convert-uracils flag.
- Added support for replacing IUPAC-encoded degenerate bases with Ns via the
--mask-degenerate-bases flag.
- Added DESTDIR support to make install.
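The sketch below shows what the two new flags do to the bases of a read. The function name normalize_sequence and its keyword arguments are hypothetical and are not part of AdapterRemoval. It assumes uppercase input and treats the standard IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V) as the degenerate set. It does not try to reproduce how AdapterRemoval handles lowercase bases.

```python
# Hypothetical sketch, not AdapterRemoval's implementation.
# Standard IUPAC ambiguity codes for degenerate bases (N itself is left as-is).
IUPAC_DEGENERATE = frozenset("RYSWKMBDHV")


def normalize_sequence(seq: str,
                       convert_uracils: bool = False,
                       mask_degenerate: bool = False) -> str:
    """Return seq with U -> T and/or degenerate bases -> N.

    Assumes uppercase input. Quality scores need no change because only
    bases are substituted and the read length stays the same.
    """
    out = []
    for base in seq:
        if convert_uracils and base == "U":
            base = "T"
        elif mask_degenerate and base in IUPAC_DEGENERATE:
            base = "N"
        out.append(base)
    return "".join(out)


print(normalize_sequence("ACGUN", convert_uracils=True))  # ACGTN
print(normalize_sequence("ARYGN", mask_degenerate=True))  # ANNGN
```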
Fixed
- Improved progress timer accuracy, so updates occur closer to every 1M reads.
Changed
- Minor improvements to --help text and documentation.
AdapterRemoval v3.0.0-alpha2
This is the second alpha release of AdapterRemoval v3. It is the intention that
a third alpha release, or the final 3.0 release, will follow within the next
couple of months.
As with alpha 1, changes that affect how AdapterRemoval is used (e.g. by
removing options) or that result in different output compared to AdapterRemoval
v2 are marked with the label "[BREAKING]".
In addition to the changes listed below, this release includes increased
throughput thanks to improved parallelization of various steps in the internal
pipeline, support for AVX512 and general improvements to the SIMD alignment
algorithms, loop unrolling of non-SIMD alignments to significantly increase
throughput when SIMD is not available, and a significant decrease in the number
of allocations to reduce overhead.
This release requires a compiler with support for C++17, and libdeflate is now a
mandatory dependency.
Draft documentation is available here and a pre-compiled binary for x86-64
Linux systems is attached below.
Added
- Added support for converting (U)racils in input data to T(hymine) via the
--convert-uracils flag.
- Added support for replacing IUPAC-encoded degenerate bases with Ns via the
--mask-degenerate-bases flag.
- Added support for writing output in SAM/BAM formats, with optional
user-supplied read-group information.
- Added support for alignments using AVX512 instructions. AVX512 support is only
available when AdapterRemoval is compiled with GCC v11+ or Clang v8+.
- Added support for selecting output file formats via the file extension and via
the --out-format option. A corresponding option, --stdout-format, was added to
select the format for data written to STDOUT.
- Added support for reading from STDIN or writing to STDOUT when '-' is used as
the filename, as an alternative to using /dev/stdin or /dev/stdout.
- Added dedicated threads solely for writing output data. This allows compute
threads to work at full capacity, as long as the destination can consume
written data fast enough. This may result in CPU utilization exceeding
--threads by a couple of percent.
- Added support for setting DESTDIR when running make install.
- Added a --licenses flag for displaying licenses of 3rd party code used by /
incorporated into AdapterRemoval.
- Added a --simd option allowing the user to select the specific SIMD
instruction set they wish to use.
- Added a Containerfile for building static binaries using alpine/musl.
Changed
- [BREAKING] Changed the default --mm/--mismatch-rate from 1/3 to 1/6, in order
to decrease the false positive rate, in particular for read merging.
- [BREAKING] Default to writing gzip-compressed FASTQ files; output written to
STDOUT is uncompressed by default.
- [BREAKING] Discarded reads are no longer saved by default.
- [BREAKING] Output files for discarded reads and singleton (orphan) paired-end
reads are only created if filtering is enabled.
- [BREAKING] The --basename/--out-prefix option no longer defaults to
your_output. Instead, the user is required to set at least one --out-* option.
- [BREAKING] Merged the --identify-adapters and --report-only commands. The
adapter sequence is presently only reported in the HTML report, but will be
added to the JSON report following some planned changes.
- [BREAKING] Reverted --min-complexity being enabled by default.
- Increased the default --threads value to 2.
- A number of command-line options were renamed for consistency; use of the old
names is still supported, but will trigger a warning message.
- Re-organized compression: level 1 is streamed using isa-l, while levels 2-13
correspond to libdeflate levels 1 to 12.
- Changed the default compression level to 5 on the new scale (libdeflate level
4); this results in a ~40% increase in throughput at the cost of roughly 3%
larger output files.
- Setting an --out-* option in demultiplexing mode overrides the basename /
prefix for that specific output type.
- Added smoothing to GC values calculated for the GC content curve, to account
for the fact that possible GC% values are unevenly distributed depending on
the read length.
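The new compression scale can be expressed as a small lookup. The helper compression_backend below is hypothetical. The release notes do not say which internal level isa-l uses for level 1, so the sketch returns None for it. The notes also do not say whether levels outside 1-13 (such as 0) are accepted, so this sketch simply rejects them.

```python
from typing import Optional, Tuple


def compression_backend(level: int) -> Tuple[str, Optional[int]]:
    """Map a user-facing compression level (1-13) to (backend, backend level).

    Hypothetical helper: level 1 -> streamed isa-l output (internal level not
    stated, hence None); levels 2-13 -> libdeflate levels 1-12.
    """
    if not 1 <= level <= 13:
        raise ValueError(f"compression level must be in 1..13, got {level}")
    if level == 1:
        return ("isa-l", None)
    return ("libdeflate", level - 1)


print(compression_backend(5))  # ('libdeflate', 4), the new default
print(compression_backend(1))  # ('isa-l', None)
```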
Removed
The following changes are all [BREAKING] as described above:
- Removed support for the original merging algorithm. The
--merge-strategy additive method produces very similar, but slightly more
conservative scores.
- Removed the ability to randomly sample a base if no best base could be
selected in case of mismatches. Such bases are now changed to N, while both
methods assign a Phred score of 0 (!).
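The per-position rules for conservative merging can be sketched as follows. Two rules come from these release notes: matching bases get Q = max(Q_a, Q_b), and same-quality mismatches become N with Phred 0. A third rule is an assumption modelled on fastq-join-style merging and is not confirmed for AdapterRemoval: for mismatches with unequal qualities, the sketch keeps the higher-quality base and uses the difference of the two scores as its quality. The function names are hypothetical. The mates are assumed to be already aligned, with mate 2 reverse-complemented.

```python
from typing import List, Tuple


def merge_base_conservative(base_a: str, qual_a: int,
                            base_b: str, qual_b: int) -> Tuple[str, int]:
    """Merge one overlapping position from two mates (Phred scores as ints)."""
    if base_a == base_b:
        # Matching bases: keep the higher score rather than summing them.
        return base_a, max(qual_a, qual_b)
    if qual_a == qual_b:
        # Same-quality mismatch: no best base can be chosen -> N with Phred 0.
        return "N", 0
    # Unequal-quality mismatch (assumed rule): keep the better-supported base,
    # with the difference between the two scores as its quality.
    if qual_a > qual_b:
        return base_a, qual_a - qual_b
    return base_b, qual_b - qual_a


def merge_overlap(seq_a: str, quals_a: List[int],
                  seq_b: str, quals_b: List[int]) -> Tuple[str, List[int]]:
    """Apply merge_base_conservative position by position over an overlap."""
    bases, quals = [], []
    for pos in zip(seq_a, quals_a, seq_b, quals_b):
        base, qual = merge_base_conservative(*pos)
        bases.append(base)
        quals.append(qual)
    return "".join(bases), quals


print(merge_overlap("ACGT", [30, 20, 25, 40], "ACTA", [10, 35, 25, 30]))
# ('ACNT', [30, 35, 0, 10])
```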
AdapterRemoval v3.0.0-alpha1
This is the first alpha release of AdapterRemoval v3. This is a major revision
of AdapterRemoval, with the goals of simplifying usage by picking a sensible set
of default settings, adding new features to handle a wider range of data,
providing human/machine readable reports, and improving overall throughput.
This release features a number of breaking changes compared to AdapterRemoval v2
and it is therefore recommended that you carefully read the list of changes
below. Changes that affect how AdapterRemoval is used (e.g. by removing options)
or that result in different output compared to AdapterRemoval v2 are marked with
the label "[BREAKING]".
This is an alpha release; not all planned features are complete (more QC reports
are planned among other things), additional optimizations will be attempted, and
the documentation still needs to be expanded further before the final release.
Feedback is very welcome in the meantime.
Draft documentation is available here and a pre-compiled binary for x86-64 Linux systems is attached below.
Added
- Reports are now available in JSON format for easy parsing and in HTML format
for human consumption. These replace the old --settings file.
- Added an AVX2-enabled alignment algorithm for a significant performance boost
(YMMV).
- Added support for detecting supported CPU extensions (SSE/AVX) at runtime.
- Added support for combining output simply by specifying the same filename for
multiple output types; e.g. --output1 file.fq --output2 file.fq will produce
interleaved output.
- Added handling for /dev/null as a "magic" output filename. Read types written
to this exact path will be discarded early in the pipeline, saving time
previously spent processing, compressing, and writing FASTQ reads.
- Added a read complexity filter inspired by fastp.
- Added the ability to only process the first N reads/read pairs via the newly
added --head N command-line option.
- Added estimation of duplication rates based on the FastQC algorithm.
- Automatic detection of mate separators based on the first chunk of reads
processed. The --mate-separator option is therefore only required in cases
where the results are ambiguous.
- Automatic gzip compression of output files with a .gz extension. This makes it
possible to compress only a subset of files and removes the need for the
--gzip option when manually specifying output files.
- Added options --prefix-read1, --prefix-read2, and --prefix-merged for adding
custom prefixes to the names of FASTQ reads.
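For reference, fastp's documentation defines read complexity as the fraction of bases that differ from the next base. The sketch below follows that definition. It is an assumption that AdapterRemoval's filter computes exactly the same value. The names read_complexity and passes_complexity_filter are hypothetical. The 0.3 threshold mirrors fastp's default of 30% and is not an AdapterRemoval default.

```python
def read_complexity(seq: str) -> float:
    """Fraction of positions whose base differs from the next base (fastp-style)."""
    if len(seq) < 2:
        # No adjacent pairs to compare; treated as zero complexity in this sketch.
        return 0.0
    diffs = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return diffs / (len(seq) - 1)


def passes_complexity_filter(seq: str, min_complexity: float = 0.3) -> bool:
    """Keep reads at or above the threshold (0.3 mirrors fastp's default)."""
    return read_complexity(seq) >= min_complexity


print(read_complexity("AAAAAAAA"))           # 0.0
print(read_complexity("ACGTACGT"))           # 1.0
print(passes_complexity_filter("AAAAAAAA"))  # False
```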
Changed
- [BREAKING] Default adapters have been changed to the recommended Illumina
sequences, equivalent to the first 33 bp of the adapter sequences used by
AdapterRemoval v2. This makes the default settings more generally applicable.
- [BREAKING] The trimming options --trimwindows, --trimns, --trimqualities, and
--minquality have been deprecated in favor of the modified Mott's algorithm,
which is enabled by default. The trimming algorithm used may be changed using
the new --trim-strategy option.
- [BREAKING] Merging now defaults to using the conservative algorithm, meaning
that matching quality scores are assigned Q_match = max(Q_a, Q_b) instead of
Q_match ~= Q_a + Q_b, and that same-quality mismatches are assigned 'N' instead
of one being picked at random. Motivated in part by
doi:10.1186/s12859-018-2579-2. This can be changed using --merge-strategy.
- The --merge option no longer has any effect when processing SE data;
previously this option would treat reads with at least --minalignmentlength
bases of adapter as pseudo-merged reads.
- [BREAKING] Merged reads are no longer given an M_ name prefix, and merged
reads that have been trimmed after merging are no longer given an MT_ name
prefix. Instead, see the new option --prefix-merged.
- [BREAKING] Default filenames have all been revised and now include proper
extensions to indicate the format.
- [BREAKING] The executable is now named adapterremoval3. This was done to allow
v3 to coexist with AdapterRemoval v2 and to prevent accidental use of the wrong
version.
- [BREAKING] Changed the default --maxns value from 1000 to "infinite".
- The --gzip option now defaults to compressing independent blocks of 64kb data
using libdeflate. This significantly improves throughput in both single- and
(especially) multi-threaded mode, but may be incompatible with a few programs.
Compression levels of 3 and below use isa-l for compression and provide a more
universally compatible output.
- The term "merging" is now used consistently instead of "collapsing", including
for default output filenames. Options have been renamed, but old option names
continue to work (except for --outputcollapsedtruncated).
- Improved the alignment algorithm so that it terminates early when possible.
- Logging is now done more consistently and exposes options to increase or
decrease the number of messages printed (debug, info, warning, errors).
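The sketch below shows one common formulation of the modified Mott trimming algorithm. Each base scores error_limit - P(error), where P(error) = 10^(-Q/10). A running sum that resets at zero then finds the highest-scoring contiguous segment, which is a maximum-subarray search. These notes do not describe AdapterRemoval's implementation, so details such as the default limit may differ. The value error_limit=0.05 and the name mott_trim are assumptions.

```python
from typing import List, Tuple


def mott_trim(quals: List[int], error_limit: float = 0.05) -> Tuple[int, int]:
    """Return [start, end) of the segment with the highest Mott score.

    Each base contributes error_limit - 10 ** (-Q / 10). The running sum is
    reset to zero whenever it turns negative, so only the best-scoring
    contiguous stretch of bases is kept. error_limit=0.05 is an assumed value.
    """
    best_start = best_end = 0
    best_score = 0.0
    running = 0.0
    start = 0
    for i, q in enumerate(quals):
        running += error_limit - 10 ** (-q / 10)
        if running < 0:
            # Low-quality stretch: restart the candidate segment after this base.
            running = 0.0
            start = i + 1
        elif running > best_score:
            best_score = running
            best_start, best_end = start, i + 1
    return best_start, best_end


print(mott_trim([2, 2, 40, 40, 40, 2, 2]))  # (2, 5): keep only the Q40 bases
```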
Removed
The following changes are all [BREAKING] as described above:
- The --outputcollapsedtruncated option has been removed, and all merged reads
(whether quality trimmed or not) are simply written to --outputmerged.
- The --qualitybase-output option has been removed. Output is now always
Phred+33.
- The --combined-output option has been removed in favor of allowing arbitrary
merging of output files (see above).
- The --settings option has been replaced by --out-json and --out-html for
machine and human readable reports, respectively.
- Removed support for guessing the intended command-line argument based on
prefixes, i.e. --th will no longer be accepted for --threads. Due to the
number of options added, removed, and renamed, this is no longer reliable.
- The deprecated --pcr1 and --pcr2 options have been removed.
- Dropped undocumented support for '.' as equivalent to 'N' in FASTQ reads.
- Support for reading and writing of bzip2 files has been removed.
AdapterRemoval v2.3.3
- Updated Catch2 to fix compilation with glibc 2.34, courtesy of loganrosen.
AdapterRemoval v2.3.2
- Improved error messages when AdapterRemoval failed to open or write FASTQ
files (issue #42).
- Fixed build on some architectures. Patch courtesy of Andreas Tille/the Debian
build team.
- Fixed display of max Phred scores in FASTQ validation error messages.
- Removed benchmarking scripts which were included in the repo for the sake of
making Schubert et al. 2016 reproducible. This is no longer relevant.
- Use 'install' in the Makefile; patch courtesy of Eric DEVEAUD.
- Added --collapse-deterministic to the .settings file.
- Fixed --minadapteroverlap being misapplied in PE mode.
- Added the --collapse-conservatively merge algorithm based on FASTQ-join. See
the man-page for more information.
AdapterRemoval v2.3.1
- Added the --preserve5p option. This option prevents AdapterRemoval from
trimming the 5p end of reads when the --trimqualities, --trimns, and
--trimwindows options are used. Neither end of collapsed reads is trimmed when
this option is used.
- Fixed Ns being miscounted as As when constructing consensus adapter sequences
using --identify-adapters.
AdapterRemoval v2.3.0
- Fixed --collapse producing slightly different results on 32 bit and 64 bit
architectures. Courtesy of Andreas Tille.
- Added support for output files without a basename; to create such output
files, use an empty basename (--basename "") or a basename ending with a
slash (--basename path/).
- Added support for managing file handles to allow AdapterRemoval to run when
the number of output files exceeds the number of file handles, e.g. when
demultiplexing large numbers of samples.
- Reworked demultiplexing to improve performance for many paired barcodes.
AdapterRemoval v2.2.4
- Fixed a bug in --trim5p N which would cause AdapterRemoval to abort if N was
greater than the pre-trimmed read length.
- Fixed --identify-adapters not respecting the --mate-separator option.
AdapterRemoval v2.2.3
- Added support for trimming reads by a fixed amount: --trim5p N --trim3p N.
Different values may be given for each mate: --trim5p N1 N2. Trimming is
carried out after adapters have been removed and reads have been collapsed,
if enabled, but before quality trimming (Ns and low qualities).
- Added an option for deterministic read merging (--collapse-deterministic). In
this mode AdapterRemoval will set a merged base to 'N' with quality 0 if the
corresponding bases on the two mates differ and both have the same quality
score. The default behavior is to select one of the two bases at random.
- Fixed reporting of line numbers in error messages.
- Added conda installation instructions, courtesy of Maxime Borry (maxibor).
- Fixed reading mate 2 adapters specified via --adapter-list. Adapters would
be used in the reverse orientation compared to --adapter2. Courtesy of
Karolis (KarolisM).
- Fixed various typos and improved help/error messages.
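Fixed-length trimming, and the per-mate expansion of --trim5p N1 N2, can be sketched as follows. Both per_mate and fixed_trim are hypothetical helpers and are not AdapterRemoval code. The sketch covers only the trimming step itself, not where it runs relative to adapter removal, collapsing, and quality trimming. In this sketch, trimming more bases than a read contains gives an empty read.

```python
from typing import List, Tuple


def per_mate(values: List[int]) -> Tuple[int, int]:
    """Expand '--trim5p N' or '--trim5p N1 N2' into (mate 1, mate 2) values."""
    if len(values) == 1:
        return values[0], values[0]
    if len(values) == 2:
        return values[0], values[1]
    raise ValueError("expected one or two values")


def fixed_trim(seq: str, quals: str,
               trim5p: int = 0, trim3p: int = 0) -> Tuple[str, str]:
    """Remove a fixed number of bases from each end of a read and its qualities."""
    # Clamp at zero so that trimming past the start never wraps around.
    end = max(0, len(seq) - trim3p)
    return seq[trim5p:end], quals[trim5p:end]


mate1_5p, mate2_5p = per_mate([2, 4])  # e.g. --trim5p 2 4
print(fixed_trim("ACGTACGT", "IIIIIIII", trim5p=mate1_5p, trim3p=3))  # ('GTA', 'III')
```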
AdapterRemoval v2.2.2
- Made gzip and bzip2 support mandatory.
- Added support for Intel compilers, courtesy of Kevin Murray (kdmurray91).