-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mapping hits using a pangenome #183
Comments
For a single reference, you probably want to use this method rather than the annotate script: There will be many unitigs from assemblies that are not in a roary pangenome - those in intergenic regions, those not annotated correctly in the input etc. This isn't something we directly support in pyseer however |
Thank you very much for your quick response. Just one last question to see if my understanding is correct. I have a dataset of ~1500 strains and I have performed 2 GWAS runs on two different subsets of isolates. If I use the annotate script with my pangenome on top as ref and then all the strains participating in the two GWAS runs below as draft sequences, would the mapping of the unitigs be consistent in the two cases so I can calculate the overlapping genes? Do you see any reason for the above approach not to work and if yes do you suggest another approach to tackle my problem? Thank you very much in advance |
Annotation is all done through mapping, so is with respect to whichever reference you use. If you use the same sequences to map to, they will have the same co-ordinate system so can be compared directly. |
Can I also ask if there is a way to map the results of a gene presence absence analysis to the same sequences as I map my unitigs GWAS? |
Hi,
I have executed pyseer and I would like to map my unitigs not to the list of my input files but back to just one fasta/gff file that represents my pangenome.
This file contains representative sequences of all the genes in my dataset (produced using roary). So if I understand things correctly using it as a ref in your annotate script should map all the hits in my results back to a gene. Nonetheless I have many unmapped unitigs in my output.
Am I missing something?
Any help would be much appreciated.
Best
Leonardos
The text was updated successfully, but these errors were encountered: