pharokka
v1.4.0 implements pharokka_proteins.py
, allows for the fast annotation of Amino Acid FASTA formatted input proteins with PHROGs, CARD and VFDB.
You can run pharokka_proteins.py
in the following way (most basic command below). By default this will run MMSeqs2 on PHROGs, CARD and VFDB, along with PyHMMER on PHROGs
pharokka_proteins.py -i <inpu protein FASTA file> -d <database directory> -o <output folder> -t <threads>
To run only PHROGs with HMMs (PyHMMER), use --hmm_only
pharokka_proteins.py -i <input protein FASTA file> -d <database directory> -o <output folder> -t <threads> --hmm_only
To run only MMseqs2 on PHROGs, CARD and VFDB, use --mmseqs2_only
pharokka_proteins.py -i <input protein FASTA file> -d <database directory> -o <output folder> -t <threads> --mmseqs2_only
pharokka_proteins.py
generates 3 output files:
_proteins.faa
containing the original protein FASTA file with the header updated with the new PHROG annotation_proteins_full_merged_output.tsv
containing the full information for each protein, similar to_cds_functions.tsv
forpharokka.py
but without the gene prediction information. This will include VFDB and CARD information._proteins_summary_output.tsv
containing the following 5 columns (where ID is name of each inputprotein)
ID length (AA) phrog annot category
e.g.
ID | length | phrog | annot | category |
---|---|---|---|---|
protein_1 | 456 | 3717 | terminase large subunit | head and packaging |
protein_2 | 243 | 315 | endolysin | lysis |
usage: pharokka_proteins.py [-h] [-i INFILE] [-o OUTDIR] [-d DATABASE] [-t THREADS] [-f] [-p PREFIX] [-e EVALUE] [--hmm_only] [--mmseqs2_only] [-V] [--citation]
pharokka_proteins.py: fast phage protein annotation script
options:
-h, --help show this help message and exit
-i INFILE, --infile INFILE
Input proteins file in amino acid fasta format.
-o OUTDIR, --outdir OUTDIR
Directory to write the output to.
-d DATABASE, --database DATABASE
Database directory. If the databases have been installed in the default directory, this is not required. Otherwise specify the path.
-t THREADS, --threads THREADS
Number of threads. Defaults to 1.
-f, --force Overwrites the output directory.
-p PREFIX, --prefix PREFIX
Prefix for output files. This is not required.
-e EVALUE, --evalue EVALUE
E-value threshold for MMseqs2 database PHROGs, VFDB and CARD and pyhmmer PHROGs database search. Defaults to 1E-05.
--hmm_only Runs pyhmmer (HMMs) with PHROGs only, not MMseqs2 with PHROGs, CARD or VFDB.
--mmseqs2_only Runs MMseqs2 with PHROGs, CARD and VFDB only (same as Pharokka v1.3.2 and prior).
-V, --version Print pharokka Version
--citation Print pharokka Citation