Sources of Error

Leaky Alignments

Alignment Leakage

Source of False Positives. When a known virus A is present at a high read count, sequencing errors, biological artifacts, and mis-mapping cause a small fraction of its reads to be assigned to related but non-ideal reference sequences (B and C). Measured in nucleotide or amino-acid substitutions, the virus in the sequencing library then appears within the "known" range of virus A but in the "unknown" range relative to viruses B and C.

This leaked signal usually falls well below the level of "noise", but in libraries with high viral read counts (tens of thousands of reads) it can produce an appreciable signal for neighboring viruses.
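As a rough illustration of the effect, the sketch below flags references whose read count is only a small fraction of the dominant related reference in the same library. The function name `suspected_leaks`, the 1% cutoff, and the read counts are assumptions made for this example; they are not Serratus parameters or real data.

```python
def suspected_leaks(counts, max_fraction=0.01):
    """Return {reference: fraction of top-hit reads} for likely leaked hits.

    counts: reads per related reference within ONE library (hypothetical input).
    max_fraction: assumed cutoff; hits at or below this fraction of the
    dominant reference are treated as possible leakage.
    """
    if not counts:
        return {}
    top_ref = max(counts, key=counts.get)
    top_reads = counts[top_ref]
    if top_reads == 0:
        return {}
    return {
        ref: reads / top_reads
        for ref, reads in counts.items()
        if ref != top_ref and reads / top_reads <= max_fraction
    }


# Illustrative counts: virus A dominates, B and C receive a trickle of reads.
counts = {"A": 50000, "B": 120, "C": 45}
leaks = suspected_leaks(counts)
```

Here B and C are both flagged because each receives well under 1% of the reads assigned to A. A suitable cutoff would have to be chosen per dataset.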

The best way to mitigate this issue is to search for novel viruses at a higher level of the taxonomic hierarchy. For instance, instead of asking "Find me a novel PCV2-related sequence", first ask "Find a novel Circovirus sequence", and then subset those results by asking "In which of those libraries is PCV2 the best available match?"
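The two-step query described above can be sketched as follows. This is a minimal illustration: the row layout (`library`, `group`, `ref`, `reads`, `identity`), the 90% identity cutoff, and all values are hypothetical and do not reflect the Serratus database schema.

```python
# Hypothetical per-library, per-reference summary rows (illustrative only).
ROWS = [
    {"library": "lib1", "group": "Circovirus", "ref": "PCV2", "reads": 20000, "identity": 99.0},
    {"library": "lib1", "group": "Circovirus", "ref": "PCV1", "reads": 150, "identity": 82.0},
    {"library": "lib2", "group": "Circovirus", "ref": "PCV2", "reads": 900, "identity": 84.0},
    {"library": "lib2", "group": "Circovirus", "ref": "PCV1", "reads": 40, "identity": 78.0},
    {"library": "lib3", "group": "Circovirus", "ref": "PCV1", "reads": 700, "identity": 83.0},
]


def best_hit_per_library(rows, group):
    """Keep, for each library, the reference in `group` with the most reads."""
    best = {}
    for row in rows:
        if row["group"] != group:
            continue
        current = best.get(row["library"])
        if current is None or row["reads"] > current["reads"]:
            best[row["library"]] = row
    return best


def novel_libraries(rows, group, max_identity=90.0):
    """Step 1: libraries whose BEST hit in the group is still divergent."""
    best = best_hit_per_library(rows, group)
    return {lib: hit for lib, hit in best.items() if hit["identity"] < max_identity}


# Step 1: "Find a novel Circovirus sequence."
candidates = novel_libraries(ROWS, "Circovirus")
# Step 2: "In which of those libraries is PCV2 the best available match?"
pcv2_like = sorted(lib for lib, hit in candidates.items() if hit["ref"] == "PCV2")
```

In this toy data, querying PCV1 directly would surface `lib1` as a divergent hit (82% identity), even though those reads are leakage from an abundant, well-matched PCV2. Judging novelty on the best hit within the whole group excludes `lib1` and leaves only `lib2` as a novel PCV2-related candidate.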

PCV1 and PCV2 Leak

Scattered Alignments

Alignment Scatter

Source of False Negatives. Alignment scatter occurs when a library sequence lies "between" the reference sequences of two operational taxonomic units (OTUs). When summary statistics are reported at the OTU or family level, the divergent reads are in effect "diluted" across categories, so a virus that is abundant enough to warrant further investigation may be reported as rare or incomplete. Chimeric sequences would be an interesting, though probably hard to detect, case.
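One way to look for scatter is to compare per-OTU read counts with their family-level total. The sketch below flags library/family pairs whose combined reads pass a threshold even though no single OTU does. The tuple layout, the name `scattered_families`, the 100-read threshold, and the data are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical (library, family, otu, reads) records; names are placeholders.
HITS = [
    ("lib1", "FamilyX", "OTU_A", 40),
    ("lib1", "FamilyX", "OTU_B", 35),
    ("lib1", "FamilyX", "OTU_C", 30),
    ("lib2", "FamilyX", "OTU_A", 500),
    ("lib3", "FamilyX", "OTU_A", 10),
    ("lib3", "FamilyX", "OTU_B", 5),
]


def scattered_families(hits, min_reads=100):
    """Return {(library, family): total reads} where the family total reaches
    min_reads but every individual OTU stays below it."""
    per_otu = defaultdict(lambda: defaultdict(int))
    for library, family, otu, reads in hits:
        per_otu[(library, family)][otu] += reads

    flagged = {}
    for key, otus in per_otu.items():
        total = sum(otus.values())
        if total >= min_reads and max(otus.values()) < min_reads:
            flagged[key] = total
    return flagged


flagged = scattered_families(HITS)
```

Only `lib1` is flagged: its 105 reads are spread over three OTUs, none of which reaches 100 on its own. `lib2` already passes at the OTU level, and `lib3` is rare even after pooling.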

