Get sequences command gives me an error #328
-
I'm trying to read a few FASTA files with various entries. A few of them contain symbols like J, X or N, and this gives me an error. How can I read this kind of file? The code I'm using is:
And the error I'm getting is:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Thanks in advance.
Replies: 1 comment 5 replies
-
'J' is currently not a symbol in the amino acid alphabet. Hence neither a NucleotideSequence nor a ProteinSequence can be created from the sequences in your FASTA file. There are two possible solutions to this issue, both using the low-level API of FastaFile that returns strings instead of Sequence objects:

1. Read the sequences as strings, replace the symbol with an appropriate replacement and create a ProteinSequence:

        sequences = {header : ProteinSequence(seq_str.replace("J", "L")) for header, seq_str in fasta_file.items()}

2. Read the sequences as strings and create GeneralSequence objects with a custom alphabet:

        alphabet = LetterAlphabet(list(ProteinSequence.alphabet.get_symbols()) + ["J"])
        sequences = {header : GeneralSequence(alphabet, seq_str) for header, seq_str in fasta_file.items()}
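
Before choosing between the two options, it can help to see which entries actually contain unsupported symbols. The following is a minimal sketch in plain Python that works on header-to-string dictionaries such as those the low-level FastaFile API yields. The helper names `find_unsupported` and `replace_symbols` are hypothetical, and the `ALLOWED` set is only an assumption about the protein alphabet; in practice it should be taken from the alphabet of the installed library version.

```python
# Assumed set of supported protein symbols (an illustration, not the
# authoritative alphabet of any particular library version).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBZX*")


def find_unsupported(sequences, allowed=ALLOWED):
    """Return {header: sorted unsupported symbols} for offending entries only."""
    report = {}
    for header, seq_str in sequences.items():
        bad = sorted(set(seq_str) - set(allowed))
        if bad:
            report[header] = bad
    return report


def replace_symbols(sequences, replacements):
    """Return a copy of `sequences` with single symbols swapped per `replacements`."""
    table = str.maketrans(replacements)
    return {header: seq_str.translate(table)
            for header, seq_str in sequences.items()}


# Hypothetical example data standing in for the contents of a FASTA file.
raw = {"entry1": "MKJLX", "entry2": "ACD"}
offending = find_unsupported(raw)
cleaned = replace_symbols(raw, {"J": "L"})
```

The cleaned strings could then be passed to ProteinSequence as in option 1. Whether replacing J (leucine or isoleucine) with L is acceptable depends on the downstream analysis, so the replacement mapping is left as a parameter.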