🌲 Annotation of the plastid genome of Engelmann spruce (Picea engelmannii), genotype Se404-851 https://www.ncbi.nlm.nih.gov/nuccore/MK241981
The assembled FASTA file was submitted to GeSeq.
- see Documentation
- GeSeq Settings
- The settings used for this annotation are the exact same as those used for Picea glauca WS77111 chloroplast
- GeSeq generates an OGDRAW .png file and a GenBank .gb file.
The .gb file is converted to a .gff file using EMBOSS Seqret:
- EMBOSS Seqret settings
- The settings used for this conversion are the exact same as those used for Picea glauca WS77111 chloroplast
- Used finalize.sh to rename the annotation IDs
Duplicate annotations produced by GeSeq were removed. Most conflicts arose because multiple reference annotations were used, as the whole taxon was selected. Where duplicates existed, tRNAs with annotated anticodons were kept over those without. Third-party tRNA scanners such as ARAGORN and tRNAscan were also run, but they yielded no additional substantiated annotations.
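As an illustration of the duplicate-removal rule described above, here is a minimal Python sketch. It is not the procedure actually used for this annotation. It assumes GFF3 input, treats records with the same feature type, coordinates, and strand as duplicates, and assumes (hypothetically) that a tRNA with an annotated anticodon carries it as a suffix in its `Name` attribute, e.g. `trnL-UAA`. The function names are made up for this example.

```python
import re

# Hypothetical convention: the anticodon appears as a name suffix, e.g. "trnL-UAA".
ANTICODON = re.compile(r"-[ACGU]{3}$", re.IGNORECASE)


def parse_attributes(column9):
    """Turn a GFF3 attribute string such as 'ID=x;Name=y' into a dict."""
    pairs = (field.split("=", 1) for field in column9.split(";") if "=" in field)
    return {key.strip(): value.strip() for key, value in pairs}


def has_anticodon(record):
    """True if the record's Name attribute ends with a three-base anticodon."""
    return bool(ANTICODON.search(record["attributes"].get("Name", "")))


def deduplicate(gff_lines):
    """Keep one record per (type, start, end, strand), preferring named anticodons."""
    best = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        record = {
            "start": int(cols[3]),
            "end": int(cols[4]),
            "attributes": parse_attributes(cols[8]),
            "line": line.rstrip("\n"),
        }
        key = (cols[2], record["start"], record["end"], cols[6])
        current = best.get(key)
        # The first record wins unless a later duplicate adds an anticodon.
        if current is None or (has_anticodon(record) and not has_anticodon(current)):
            best[key] = record
    ordered = sorted(best.values(), key=lambda r: (r["start"], r["end"]))
    return [r["line"] for r in ordered]
```

Overlapping but non-identical duplicates, which also occur when several references are combined, would still need manual review.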
Four genes had to be annotated manually: rps12, petB, petD, and rpl16. rps12 is a trans-spliced gene, while the other three have initial exons so short that GeSeq could not annotate them. The mRNAs that GeSeq failed to annotate were also annotated manually. The final annotation shows that all 114 genes found in the other Picea chloroplasts are conserved, comprising 74 coding sequences (CDS), 4 rRNAs, and 36 tRNAs, along with 15 introns (9 in CDS, 6 in tRNAs).
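The totals quoted above can be turned into a small consistency check. The sketch below uses only the counts stated in this README. The function name `count_mismatches` and the idea of comparing against a tally taken from the GFF are assumptions for illustration, not part of the repository.

```python
# Counts stated in this README for the final annotation.
EXPECTED_GENES = {"CDS": 74, "rRNA": 4, "tRNA": 36}
EXPECTED_INTRONS = {"CDS": 9, "tRNA": 6}


def count_mismatches(observed, expected):
    """Return {category: (observed, expected)} for every category that differs."""
    return {
        category: (observed.get(category, 0), wanted)
        for category, wanted in expected.items()
        if observed.get(category, 0) != wanted
    }


# The per-category counts add up to the stated totals.
total_genes = sum(EXPECTED_GENES.values())      # 74 + 4 + 36 = 114
total_introns = sum(EXPECTED_INTRONS.values())  # 9 + 6 = 15
```

Given a tally of gene types from a GFF file, `count_mismatches(tally, EXPECTED_GENES)` returns an empty dict when the annotation matches the expected composition.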
MUMmer and minidot were used to search for inverted repeats, which were ultimately not included in the final annotation.
- See Makefile for more information
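For readers unfamiliar with the term, an inverted repeat is a segment whose reverse complement occurs elsewhere in the sequence. The Python sketch below finds exact inverted k-mer pairs with a simple index. It is a naive illustration of the concept only, not how MUMmer or minidot work, and the function names are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_kmer_pairs(seq, k):
    """Return (i, j) with i < j where seq[j:j+k] is the reverse complement of seq[i:i+k]."""
    seq = seq.upper()
    # Index every k-mer by its start positions (0-based).
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    pairs = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in positions.get(target, []):
            if j > i:  # report each pair once and skip self-matches
                pairs.append((i, j))
    return pairs
```

Only exact matches of length k are reported. Real repeat finders extend and merge such seeds and tolerate mismatches, so this sketch is not a substitute for them.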
BLASTn was used to align Se404-851cp genes to PG29 genes. Every subpar alignment was then examined in IGV alongside whole-chloroplast-genome alignments and read-to-assembly alignments. Most discrepancies between PG29cp and Se404-851cp were supported by the reads; those that were not were corrected as indicated by the consensus.
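The screening step for "subpar" alignments could be sketched as follows. This README does not define the cutoff, so the thresholds here (100% identity and full-length query coverage) are assumptions, as is the use of BLAST's default 12-column tabular output (`-outfmt 6`). Gene lengths must be supplied separately because that format does not include them, and the function name is hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def flag_subpar(blast_tsv, query_lengths, min_identity=100.0, min_coverage=1.0):
    """Return genes whose best hit falls below the (assumed) thresholds, or has no hit."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) != len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        hit = dict(zip(FIELDS, row))
        previous = best.get(hit["qseqid"])
        # Keep only the highest-scoring hit per query gene.
        if previous is None or float(hit["bitscore"]) > float(previous["bitscore"]):
            best[hit["qseqid"]] = hit
    flagged = []
    for gene, gene_length in query_lengths.items():
        hit = best.get(gene)
        if hit is None:
            flagged.append(gene)  # no alignment at all
            continue
        aligned = abs(int(hit["qend"]) - int(hit["qstart"])) + 1
        if float(hit["pident"]) < min_identity or aligned / gene_length < min_coverage:
            flagged.append(gene)
    return flagged
```

Flagged genes are candidates for manual inspection only; whether a difference is real still has to be judged against the read alignments, as described above.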
This final GFF annotation was validated using table2asn_GFF:
- Generated files: Sequin ASN.1 file, discrepancy report, error list, GenBank file
- See Makefile
The .gbf file generated by table2asn_GFF was then run through OGDraw independently:
- OGDraw settings
- The settings used for this drawing are the exact same as those used for Picea glauca WS77111 chloroplast
- OGDraw files: see ogdraw
Lin D, Coombe L, Jackman SD, Gagalova KK, Warren RL, Hammond SA, McDonald H, Kirk H, Pandoh P, Zhao Y, Moore RA, Mungall AJ, Ritland C, Doerksen T, Jaquish B, Bousquet J, Jones SJM, Bohlmann J, Birol I. 2019. Complete Chloroplast Genome Sequence of an Engelmann Spruce (Picea engelmannii, Genotype Se404-851) from Western Canada. Microbiol Resour Announc 8:e00382-19. doi: 10.1128/MRA.00382-19.