This tool summarizes BAM read alignments by piling up the reads at each position and reporting the result as a table. The output is more convenient to work with than the mpileup format and contains extra information.
Here is an example of the output you would get with SAPiN:
SEQID POS REF QUAL A T G C N INS DEL IUPAC COV COV_ATGC MUT_RAT APOBEC ADAR REGION CODON NUC DESC
HPV42REF 118 C 38.28 0 16 0 1356 0 0 0 0 1374 1372 1.17 1.2 . AATGTCAGGTA CAG 1 gene:ID=gene-1;Name=E6@@mRNA:ID=nbis-rna-1;Parent=gene-1;Name=E6@@exon:ID=nbis-exon-1;Parent=nbis-rna-1;Name=E6@@CDS:ID=cds-1;Parent=nbis-rna-1;Name=E6
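To use this output in a script, each row can be mapped onto the header names. The sketch below assumes the columns are tab-separated (the `-o` option describes a tsv file) and shortens the DESC value for readability; `parse_sapin_line` is a hypothetical helper, not part of SAPiN.

```python
# Column names as shown in the example header above.
HEADER = ("SEQID POS REF QUAL A T G C N INS DEL IUPAC COV COV_ATGC "
          "MUT_RAT APOBEC ADAR REGION CODON NUC DESC").split()

def parse_sapin_line(line, header=HEADER):
    """Map one tab-separated output row onto the column names (hypothetical helper)."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(header, values))

# Example row from above; assumption: fields are tab-delimited, DESC is shortened.
row = "\t".join([
    "HPV42REF", "118", "C", "38.28", "0", "16", "0", "1356", "0", "0", "0", "0",
    "1374", "1372", "1.17", "1.2", ".", "AATGTCAGGTA", "CAG", "1",
    "gene:ID=gene-1;Name=E6",
])
record = parse_sapin_line(row)
print(record["SEQID"], record["POS"], record["REF"], record["COV_ATGC"])
```

All values are kept as strings here; convert them with `int()` or `float()` as needed, keeping in mind that some columns may contain `.` when no value is reported.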
Here is a description of the different fields:
Field | Optional | Type | Description |
---|---|---|---|
SEQID | | String | The ID of the landmark used to establish the coordinate system for the current feature. |
POS | | Integer | The reference position, with the first base having position 1. |
REF | | Character | The reference base. |
QUAL | | Float | Mean Phred-scaled quality score for the sequenced position. |
A | | Integer | Number of adenine nucleotides at the position. |
T | | Integer | Number of thymine nucleotides at the position. |
G | | Integer | Number of guanine nucleotides at the position. |
C | | Integer | Number of cytosine nucleotides at the position. |
N | | Integer | Number of unknown nucleotides at the position. |
INS | | Integer | Number of insertions at the position. |
DEL | | Integer | Number of deletions at the position. |
IUPAC | | Integer | Number of IUPAC nucleotides (other than A, T, G, C, N) at the position. |
COV | | Integer | Coverage at the position (including INS, DEL, IUPAC). |
COV_ATGC | | Integer | Coverage at the position counting A, T, G, C nucleotides only. |
MUT_RAT | | Float | Mutation ratio, as a percentage (number of mutated nucleotides / COV_ATGC * 100). |
APOBEC | | Float | Mutation ratio of C-to-T or G-to-A. Useful when studying transcriptomes. |
ADAR | | Float | Mutation ratio of A-to-G or T-to-C. Useful when studying transcriptomes. |
REGION | | String | Substring of the reference covering 5 nucleotides on each side of the position. Useful for pattern searches. |
CODON | Only if GFF provided | String | Codon in phase/frame containing the position (/!\ spliced CDS are not taken into account). |
NUC | Only if GFF provided | Integer | 1, 2 or 3. Indicates which nucleotide of the CODON (previous column) is the one at the position. |
DESC | Only if GFF provided | String | Feature types and attributes extracted from the GFF at the position. |
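As a worked check of the MUT_RAT and APOBEC columns, the sketch below recomputes them from the counts of the example row. It assumes that a "mutated" nucleotide is any A, T, G or C call that differs from REF, and that the ratio is a percentage of COV_ATGC. Both assumptions are inferred from the example values (16 / 1372 * 100 ≈ 1.17), not taken from SAPiN's source, and the function names are hypothetical.

```python
def mutation_ratio(ref, counts):
    """Percentage of A/T/G/C calls that differ from the reference base (assumed definition)."""
    cov_atgc = sum(counts[base] for base in "ATGC")
    if cov_atgc == 0:
        return 0.0
    mutated = cov_atgc - counts.get(ref.upper(), 0)
    return mutated / cov_atgc * 100

def apobec_ratio(ref, counts):
    """Percentage of C-to-T or G-to-A calls; None when REF is neither C nor G."""
    target = {"C": "T", "G": "A"}.get(ref.upper())
    cov_atgc = sum(counts[base] for base in "ATGC")
    if target is None or cov_atgc == 0:
        return None
    return counts[target] / cov_atgc * 100

# Counts from the example row (REF = C at position 118).
counts = {"A": 0, "T": 16, "G": 0, "C": 1356}
print(round(mutation_ratio("C", counts), 2))  # 1.17, as in the MUT_RAT column
print(round(apobec_ratio("C", counts), 1))    # 1.2, as in the APOBEC column
```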
Prerequisites:
- python3
- pysam
- gffutils
- matplotlib

These dependencies should be installed automatically when SAPiN is installed.
Install with pip:
pip install git+https://github.com/Juke34/SAPiN.git
or, if you do not have administrative rights on your machine:
pip install --user git+https://github.com/Juke34/SAPiN.git
Clone the repository:
git clone https://github.com/Juke34/SAPiN.git
Move into the folder:
cd SAPiN/
Install:
python setup.py install
or if you do not have administrative rights on your machine:
python setup.py install --user
Executing:
sapin
or
sapin -h
will display some help.
Update with pip:
pip install git+https://github.com/Juke34/SAPiN.git --upgrade
or, if you do not have administrative rights on your machine:
pip install --user git+https://github.com/Juke34/SAPiN.git --upgrade
Move into the repository folder, pull the latest changes, and reinstall:
cd SAPiN/
git pull
python setup.py install
Uninstall:
pip uninstall sapin
Basic:
sapin -a t/reference.bam -f t/reference.fasta
Advanced:
sapin -a t/reference.bam -f t/reference.fasta -g t/reference_agat.gff3 -cf 1000 -bqf 20 -p
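When SAPiN is called from a pipeline, the same command line can be assembled in Python. The sketch below only builds the argument list, using options that appear in the commands above, and leaves the actual call commented out because it assumes `sapin` is installed and on the PATH. `build_sapin_command` is a hypothetical helper, not part of SAPiN.

```python
import subprocess

def build_sapin_command(bam, fasta, gff=None, cover_filter=None,
                        base_quality_filter=None, plot=False):
    """Build the sapin argument list (hypothetical helper)."""
    cmd = ["sapin", "-a", bam, "-f", fasta]
    if gff is not None:
        cmd += ["-g", gff]
    if cover_filter is not None:
        cmd += ["-cf", str(cover_filter)]
    if base_quality_filter is not None:
        cmd += ["-bqf", str(base_quality_filter)]
    if plot:
        cmd.append("-p")
    return cmd

cmd = build_sapin_command(
    "t/reference.bam", "t/reference.fasta",
    gff="t/reference_agat.gff3",
    cover_filter=1000, base_quality_filter=20, plot=True,
)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run; requires sapin on the PATH
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.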
Parameter | Type | Description |
---|---|---|
-a, --ali | String | Path to the BAM input file. |
-f, --fasta | String | Path to the reference fasta file the reads were aligned against. |
-g, --gff | String | Optional. Path to the reference GFF file. |
-o, --output | String | Path to the tsv output file. |
-p, --plot | Boolean | Plot the mutation ratio per position (sapin_plot.svg by default; if an output is provided, output.svg). |
-q, --quiet | Boolean | Decrease verbosity. |
-v, --verbose | Boolean | Increase verbosity. |
-z, --gzip | Boolean | Gzip the output file. |
-s, --shame | Boolean | Suppress the shameless plug. |
-cf, --cover_filter | Integer | Report only sites with coverage >= the given value. |
-bqf, --base_quality_filter | Integer | Report only sites with base quality >= the given value (default 0). |
-mqf, --base_quality_filter | Integer | Report only sites with mapping quality >= the given value (default 0). |
-mf, --mutation_filter | Integer | Report only sites where the mutation ratio is >= the given value (default 0). |
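The `>=` thresholds can also be applied after the fact to an existing output file. The sketch below illustrates the idea on already-parsed rows; it assumes the coverage threshold is compared with the COV column and the mutation threshold with MUT_RAT, which is a guess about how `-cf` and `-mf` behave rather than something confirmed by SAPiN's source. `passes_filters` is a hypothetical helper and the second row is made up for illustration.

```python
def passes_filters(record, min_cov=0, min_mut_ratio=0.0):
    """Keep a row when coverage and mutation ratio both reach the thresholds."""
    return (int(record["COV"]) >= min_cov
            and float(record["MUT_RAT"]) >= min_mut_ratio)

records = [
    {"POS": "118", "COV": "1374", "MUT_RAT": "1.17"},  # values from the example row
    {"POS": "119", "COV": "850", "MUT_RAT": "0.0"},    # hypothetical row
]
kept = [r["POS"] for r in records
        if passes_filters(r, min_cov=1000, min_mut_ratio=1.0)]
print(kept)  # ['118']
```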