GPAD/GPI 2.0 Specifications - Request for comments #2864
Comments
Hi, I have just done a small test and I can't get the isoform GPA file from EBI to be included in a Cytoscape analysis, whereas the isoform GAF file works fine, with all GO annotations that are specific to isoform 4 associated with the P15692 node. For example, the GO term basophil chemotaxis has no child terms and is associated with P15692-4, but in the Cytoscape analysis it is associated with the P15692 node, not P15692-4. Please can you ensure that Cytoscape is able to use the GPAD file before you get rid of the GAF file, and also consider these isoform implications. Or provide very clear information for non-bioinformaticians on how to use the GPAD file in Cytoscape. Thanks, Ruth |
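To make the isoform point concrete, here is a minimal sketch of how a consumer could keep or collapse isoform-level identifiers. It assumes isoform accessions are written as a base accession followed by a hyphen and a number (as in P15692-4); the function names `split_isoform` and `node_id` are hypothetical and are not part of Cytoscape or any GO tool.

```python
import re
from typing import Optional, Tuple

# Assumption: an isoform accession is "<base accession>-<isoform number>", e.g. P15692-4.
ISOFORM_RE = re.compile(r"^(?P<base>[A-Za-z0-9]+)-(?P<isoform>\d+)$")


def split_isoform(accession: str) -> Tuple[str, Optional[str]]:
    """Return (base accession, isoform number or None)."""
    match = ISOFORM_RE.match(accession)
    if match:
        return match.group("base"), match.group("isoform")
    return accession, None


def node_id(accession: str, keep_isoform: bool = True) -> str:
    """Pick the identifier a network node would use (hypothetical helper).

    keep_isoform=False reproduces the collapsing behaviour described above,
    where an annotation to P15692-4 ends up on the P15692 node.
    """
    base, isoform = split_isoform(accession)
    if keep_isoform and isoform is not None:
        return f"{base}-{isoform}"
    return base
```

With `keep_isoform=False`, an annotation to P15692-4 lands on P15692, which is the behaviour reported in the comment; with `keep_isoform=True` the isoform-specific node is preserved.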
Is the plan to fully retire the GAF? It would be good if we could maintain GAFs (even if it is just computed from GPAD/GPIs we submit - it's just such a wonderfully simple format that most biologists can manipulate it to their satisfaction and joy). |
We will definitely be continuing to maintain and provide GAF for our users. We're well aware that many applications and tools use GAF and it will take some time for these tools to transition, if they transition at all. However, internally, we would like to move towards GPAD/GPI as our exchange file formats, as these new formats are more robust (i.e. IDs, not text) and provide us with a mechanism to exchange additional metadata that will be critical for importing annotations into Noctua. |
GPAD
GPI
|
Noting this exchange here: geneontology/helpdesk#252 @kltm - I want to confirm what we will need in GPAD. Currently the specs are not using underscores in term relation labels. |
@tberardini - thanks for taking a careful look. Answers in-line below.
We are leaving selection of the default BP up to each individual curation group, as curation practices may differ by group. Ontologically speaking, the default gp2BP relation would be 'acts upstream of or within'
Yes, DOIs are definitely an acceptable reference entry.
ORCIDs are only to be used in the Annotation Property field. The Assigned by field will use an entry from the groups.yaml
Yes, I'll add that clarification.
Yes, it will. |
Thanks for the clarification, @vanaukenk. |
@vanaukenk Re: #2864 (comment) |
@kltm - Yes, apologies, mixing two things here. I'll make a separate ticket for the proposal to add the full set of gp2term relations to the GAF and what we will use for that (CURIE vs string). Thx. |
@vanaukenk If you want to touch base, we can do that. We ended up getting pretty confused on Friday as we tried to track through the helpdesk issue. I think we're all sorted, but if you have any questions on the GAF or GPAD spec it couldn't hurt to talk real fast. |
Just noting, as I see that there is a comment under the SO table in the specs, that in our current GAF we have annotations to unmapped loci (these are super old and we don't make them now). We use the SO term 'gene' for these. We also output annotations for SO 'pseudogene'; we try to remove these annotations as we go, but sometimes there are a few present. I don't see the benefit of releasing these annotations to unmapped loci or pseudogenes. So perhaps the GPAD and future GAF specs should exclude these entities. (DBs could keep them, just not release them; they are sometimes useful for mapping genes.) |
On the 2020-04-02 software call, we discussed two issues:
For 1, we decided to represent the one:many relation by including all gene and/or protein names as synonyms, and including all genes as parents (according to the proposal for 2). For 2, we felt that the meaning of 'Parent Object ID' could potentially be confusing depending upon what the entry represents, so we decided to split this column out into two: 'Encoded By' to capture a gene ID, and 'Parent Protein' to capture the gene-centric reference proteome accession for protein isoforms or peptides derived from proteolytically processed proteins. |
@ukemi - can you provide examples of where MGI has used the other entity types for tomorrow's annotation conference call? Thx. |
@vanaukenk - can I check that pipe separating PMIDs and MOD ref IDs in GPAD is ok? I thought that moves were afoot to just use the PMID. |
Yes, it is okay to pipe-separate a PMID and MOD paper id as long as they're referring to the same publication. You're remembering correctly that at one point we talked about just using PMIDs, but since there is usually more information about the paper, especially wrt curation, at the MOD we decided to keep the MOD id, and link out from AmiGO. |
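As an illustration of the pipe-separated reference field described above, the following sketch splits the field into prefixed identifiers and groups them by prefix. It assumes each entry has the form `PREFIX:local_id`; the function name `parse_references` and the example identifiers in the usage note are made up for illustration and do not refer to real papers.

```python
from typing import Dict, List


def parse_references(field: str) -> Dict[str, List[str]]:
    """Group pipe-separated 'PREFIX:local_id' entries by prefix (hypothetical helper)."""
    refs: Dict[str, List[str]] = {}
    for entry in field.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so local IDs that contain colons stay intact.
        prefix, sep, local_id = entry.partition(":")
        if not sep or not local_id:
            raise ValueError(f"not a PREFIX:local_id entry: {entry!r}")
        refs.setdefault(prefix, []).append(local_id)
    return refs
```

For example, `parse_references("PMID:0000000|FB:FBrf0000000")` groups a placeholder PMID and a placeholder MOD paper ID that are meant to refer to the same publication.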
Sorry @vanaukenk another Q: |
I will double-check with @kltm https://github.com/geneontology/go-site/blob/master/metadata/groups.yaml One should be the definitive source of accepted db prefixes and names for the purposes of the annotation files, and we can add that information to the spec. We actually use WB in both places for our WB GAF. |
Thanks @vanaukenk I am guessing that we'd just stick with 'FlyBase' here. |
@vanaukenk @kltm is this resolved? It seems to me that we should be using underscores in all gp2term rels, e.g. 'acts_upstream_of_or_within_positive_effect' rather than 'acts upstream of or within, positive effect' or 'acts upstream of or within positive effect', and 'contributes_to' rather than 'contributes to'. |
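The label convention proposed above can be sketched as a small normalisation step. This assumes the only differences between the two forms are commas and spaces; the function name `to_underscore_label` is hypothetical, and nothing here is taken from the spec itself.

```python
def to_underscore_label(label: str) -> str:
    """Convert a spaced relation label to the underscore form (hypothetical helper).

    Assumption: commas are dropped and runs of whitespace become single underscores.
    """
    without_commas = label.replace(",", " ")
    return "_".join(without_commas.split())
```

Both spaced variants quoted in the comment map to the same underscore label, so a check like this could be run over a file before submission.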
@hattrill |
....for GAF2.2? |
From the 2020-07-28 annotation call, we are asking people who submit annotation files to the GOC to please sign off on the GPAD/GPI 2.0 specs on this ticket. https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md Signing off means that you've reviewed the specs and have raised and resolved any questions you might have. The deadline for signing off on the specs is Tuesday, September 1st. @mah11 |
At this point, I think all annotations are required to be traceable to an entity in users.yaml. I think a good first pass would be to populate users.yaml with all current and historical curators, using a GOC:xyz identifier for those that pre-date ORCID. For things like bots or annotations that really have an unknown history, I suppose a grouping entity could be created that still marks the annotation or alteration as automated or unknown from SGD. |
Why don't we use GOC:curators, which already exists? |
Hi, is there a list somewhere of the users with no ORCID accounts? Sorry if I missed it. I would like to check it in case someone in my group is listed and I can help add the ORCID. |
Pre-P2GO, we had no mechanism for individual attribution. In the GOA DB, the generic FlyBase curator "FlyBase GOcur" is used for these annotations. |
You can check the users.yaml file to see if any of the UCL curators are listed but don't have an orcid. |
The E. coli group is signing off on the new specs for GAF 2.2 and GPAD/GPI 2.0. |
RGD signs off on the specs. |
Thanks Kimberly, I just added a few ORCIDs to the file. Not sure that you want the MSc student names, as their annotations are all either checked and approved by me or Shirin, so I don't think their names will be listed with the annotations available via Protein2GO. |
Thanks @RLovering If the MSc students' names will never be associated with production annotations in Protein2GO, I don't see that it's necessary to have them in the users.yaml file. That said, if the students ever start making annotations that don't need to be checked in Protein2GO, or if they'd ever like to make GO-CAMs, we'll need to add them to the users.yaml file. |
Sorry for not commenting on this earlier. Because of community curation, we have annotations from 360 users. This number increases by 40-50 each year. Keeping users.yaml coordinated with our database is going to be a bit of a maintenance hassle. Would it be possible for us to provide a "users-pombase.yaml" alongside our GPAD/GPI files? That would allow us to automate the updating of the users file. |
@vanaukenk @kltm Our GAF, and now our new GPAD, contain annotations from upstream sources- UniProt, RNACentral, GOC, etc. obtained from EBI FTP. Will these annotations soon come to us with specific contributor-ids for GPAD column 12? Should we leave our "outside sources" col 12 blank until we have this info? Alternatively, I could see making and assigning a "UniProt Curators" "RNACurators", etc. id as discussed above, or would we default to GOC:curators? |
@suzialeksander |
@vanaukenk, sounds like leaving it blank will be the solution. We do attribute outside annotations to their sources on our site, but we can get this from column 10. We won't need more detail than that. Thanks! |
Has there been any progress on this? We'd like to test our GPAD/GPI files to make sure we're ready to switch away from GAF format. Is there a GitHub issue we could keep an eye on? Thanks. |
note: this might be a closable ticket as at least one source is putting out files labelled !gpi-version: 2.0; col 5 at least isn't in the right format. Header lacks dates, and col 9 isn't used when it could be. ftp://ftp.ebi.ac.uk/pub/contrib/goa/gp_information.559292_sgd.v2.gz
|
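A quick header check along the lines of the note above could look like the sketch below. It assumes header lines start with `!` and use a `key: value` layout, as in the quoted `!gpi-version: 2.0`; the date key name `date-generated` is an assumption passed in as a parameter and should be checked against the spec, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def read_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect '!key: value' lines until the first non-header line."""
    header: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("!"):
            break
        # Split on the first colon only, so timestamps in the value stay intact.
        key, sep, value = line[1:].partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


def header_problems(
    header: Dict[str, str],
    expected_version: str = "2.0",
    date_key: str = "date-generated",  # assumed key name, not confirmed here
) -> List[str]:
    """List simple problems: wrong or missing version, missing date."""
    problems: List[str] = []
    if header.get("gpi-version") != expected_version:
        problems.append("missing or unexpected gpi-version")
    if date_key not in header:
        problems.append(f"missing {date_key}")
    return problems
```

This only inspects the header; checking the format of individual columns would need the column definitions from the spec.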
I think this was more about making sure people knew what we output; looking at http://release.geneontology.org/2022-06-15/annotations/index.html it looks like we do export GAF2.2, looking for example at cgd.gaf. Thanks for checking! |
@pgaudet GAF2.2 work was finished but the GPAD/GPI updates have not been completed/finalized. @vanaukenk I think that this was never signed off; we are waiting for the final spec to produce the FB GPAD/GPI. |
Actually the specs are missing one point: Column 9 xrefs is missing recommendations for RNA and complexes: |
Since this is still open, there has been a request to have entities in the GPAD col 11 (annotation extension) match the database providing the GPAD, if applicable. See geneontology/helpdesk#440 |
Closing in favor of new round of comments on updated draft in #4684. |
Starting on Tuesday, March 10th, we will be requesting review and comments on the proposed GPAD/GPI 2.0 file format specifications.
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md
Please add any comments or questions you have about the specs to this ticket by Tuesday, March 31st.
Thank you.