GPAD/GPI export from chado #276
when we do this, we should also try to include PRO

Harold: Our GPI file here will have samples of how we associate them. Here are a couple of examples. They are treated as separate identifiers (hence the PR in column 1), but then map to the MGI gene (MGI:MGI:).

Here’s one for an isoform:
PR Q9ESQ8-2 mNPVF/iso:m2 pro-FMRFamide-related neuropeptide VF isoform m2 (mouse) protein taxon:10090 MGI:MGI:1926488 UniProtKB:Q9ESQ8-2

Here’s one for a modified form (not of the example above):
PR 000030074 mLNP/Phos:1 protein lunapark phosphorylated 1 (mouse) protein taxon:10090 MGI:MGI:1918115 |
upping to medium priority because GO has a timeline at last:
proposed spec: https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md |
Update: GO now has a ticket angling for comments on the GPAD/GPI spec: geneontology/go-annotation#2864 ... so now seems like a good time for @kimrutherford to take a look and make sure there won't be any problems producing those files. |
I think it won't be a problem. The only part that will require a bit of care (if I'm reading it correctly) is the mapping from GO evidence codes to ECO. |
Yeah. They've tried to include cross-references to GO codes in ECO, but IIRC there have been a few where mapping wasn't totally straightforward. Try looking at the "xref: GOECO:" lines in ECO and see how much that covers. |
It might be worth paying attention to this ticket - evidenceontology/evidenceontology#251 |
Chris also recommends this: https://github.com/evidenceontology/evidenceontology/wiki/Gene-Ontology-GafEcoMapping |
or the "derived" version might be better so we don't "have to walk up the ECO graph yourself when going from ECO -> GAF" ... although I think we're mostly concerned about going from GAF -> ECO https://github.com/evidenceontology/evidenceontology/blob/master/gaf-eco-mapping-derived.txt |
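For reference, the GAF -> ECO direction discussed above could be sketched roughly as follows. This is only an illustration, not the PomBase implementation: it assumes the mapping file is tab-delimited with three columns (GO evidence code, a GO_REF or "Default", ECO ID) and "#" comment lines, and the embedded rows are sample data that should be checked against the real file. The function names are made up for the example.

```python
import csv
import io

# Illustrative rows only, in the ASSUMED layout:
# GO evidence code <TAB> GO_REF or "Default" <TAB> ECO ID
SAMPLE_MAPPING = """\
# GoEvidenceCode\tGO_REF/Default\tECO_ID
EXP\tDefault\tECO:0000269
IDA\tDefault\tECO:0000314
IEA\tDefault\tECO:0000501
IEA\tGO_REF:0000002\tECO:0000256
"""


def load_gaf_eco_mapping(handle):
    """Read (evidence code, reference) -> ECO ID pairs from a tab-delimited file."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines and anything with too few columns
        if len(row) < 3 or row[0].startswith("#"):
            continue
        code, ref, eco_id = (field.strip() for field in row[:3])
        mapping[(code, ref)] = eco_id
    return mapping


def gaf_to_eco(mapping, code, reference=None):
    """Prefer a reference-specific ECO ID, falling back to the code's default."""
    if reference is not None and (code, reference) in mapping:
        return mapping[(code, reference)]
    return mapping[(code, "Default")]


mapping = load_gaf_eco_mapping(io.StringIO(SAMPLE_MAPPING))
print(gaf_to_eco(mapping, "IDA"))
print(gaf_to_eco(mapping, "IEA", "GO_REF:0000002"))
```

An unknown evidence code raises a KeyError here on purpose, so codes without a straightforward mapping are noticed rather than silently dropped.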
Returning to this ticket again now that GO is seeking sign-off on the spec (linked above) ... there are two main points for us -- one is the relations between terms/IDs and gene products that GO now wants, and the other is submitting extensions with the small set of relations that GO now allows (where we've retained some more specific ones locally). Term-gene product relations for GPAD column 3:
Extension relations (also noted for GAF in #744):
|
Thanks Midori. I'm very glad you have a handle on that.
It should be OK. I don't recognise "if_descendant_of" though. |
Don't worry, as long as it's clear how the file should come out. I cribbed "if_descendant_of" from website display config (it's not a relation that would end up in any output files). |
In the GPI file do we need to put anything in columns 7 to 11? They're all optional. |
As a first step I've added the code to write a GPI file during the nightly update. That's the easy part of GPI/GPAD. Here's a sample of the output:
I've left those columns empty for now. I'll chip away at implementing GPAD writing. |
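A minimal sketch of what "leaving columns 7 to 11 empty" might look like when writing a GPI row. It assumes an 11-column, tab-separated layout as implied by the column numbers in this thread; the helper name, the argument names and the example values (object type and taxon) are placeholders for illustration, not taken from the actual export code or confirmed against the spec.

```python
GPI_COLUMN_COUNT = 11  # assumed from "columns 7 to 11" in the discussion


def gpi_row(object_id, symbol, name, synonyms, object_type, taxon):
    """Build one tab-separated GPI line, padding the optional columns 7-11."""
    filled = [object_id, symbol, name, "|".join(synonyms), object_type, taxon]
    # Columns 7 to 11 are optional, so they are written as empty strings
    padding = [""] * (GPI_COLUMN_COUNT - len(filled))
    return "\t".join(filled + padding)


# Placeholder values, for illustration only
row = gpi_row(
    "PomBase:SPCC126.04c", "sgf73", "sgf73", [], "PR:000000001", "NCBITaxon:4896"
)
print(row.split("\t"))
```

Padding with empty strings keeps every row at the same width, which makes the file easier to validate column by column.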
Great start; thanks! It does need more tweaking (sorry to bear bad news). This is probably the most important point: We might as well just put the gene name in both columns 2 and 3. It's redundant for us (and SGD) but I think it's the best workaround we've got.
Unfortunately, it's not quite that simple.
|
I've made that config and code change. It looks like there are only one term and two annotations where it makes a difference. GO:0042788 polysomal ribosome: https://www.pombase.org/term/GO:0042788 |
Changed to interesting_isa_parents to be more accurate. Also add a new field "all_interesting_parents" which contains the interesting parent and the relation to get to that parent. Refs pombase/pombase-chado#276
Change to interesting_isa_parents to reflect changes in JSON generation code. Refs pombase/pombase-chado#276
Thanks for the suggestion. I've done that. Perhaps we're a bit ahead of the game? I hope the GPAD/GPI spec doesn't change too much. :-) |
I was thinking about closing this issue then I noticed this comment:
What does that involve? |
While writing this essay, #744 (comment), I realised that the GPAD output has "binds(...)" where it should have "with" entries due to: pombase/website#108 That needs fixing. |
I don't know, but I suspect it might be something we could hive off into its own ticket, to be got round to in a while rather than making the rest of the GPAD/GPI export wait for it. |
I think so too. I'll leave that for Val to summarise in a new issue when she's back from her world tour. |
I'm not sure that anything needs doing especially for PRO. If we use PRO as modified by forms that should be handled automatically? I would close. If anything is required we can open a new ticket once we know what it is. |
OK, thanks Val. I'll do this then close the issue: #276 (comment) Once GO start accepting or validating GPAD/GPI files I'll open new issues for any problems. |
Previously we were moving "with" values to become "binds" extensions for display in advance. We now move the values only in the term/gene etc. details when they are requested. This allows us to write out the GAF and GPAD files with the "with" value in the conventional column. See: pombase/pombase-chado#276 (comment) Refs pombase/pombase-chado#276
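The idea in this commit message can be illustrated with a small sketch: the stored annotation keeps its "with" values untouched, the export reads them directly, and the binds(...) form is derived only when a details view asks for it. The class, function names and the identifier "PomBase:GENE_B" are hypothetical and are not taken from the PomBase code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    term_id: str
    with_values: tuple = ()
    extensions: tuple = ()


def export_columns(annotation):
    """Values for a GAF/GPAD-style export: "with" stays in its own column."""
    return {
        "with": "|".join(annotation.with_values),
        "extension": ",".join(annotation.extensions),
    }


def display_extensions(annotation):
    """Extensions for a details page: binds(...) is computed only on request."""
    binds = tuple(f"binds({value})" for value in annotation.with_values)
    return annotation.extensions + binds


# "PomBase:GENE_B" is a made-up identifier for illustration
ann = Annotation("PomBase:SPCC126.04c", "GO:0005515", with_values=("PomBase:GENE_B",))
print(export_columns(ann))
print(display_extensions(ann))
```

Because the annotation is frozen and the display form is computed on demand, the export can never pick up a binds(...) value by accident.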
Fixed! (After tonight's load) So I'll close this issue and wait until GO are ready for GPAD/GPI files. For reference, the files are here for now: |
Change "sgf73 (PomBase:SPCC126.04c)" to "PomBase:SPCC126.04c" Refs pombase/pombase-chado#276 Refs pombase/pombase-chado#848
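As a rough illustration of the change described in this commit message, a value of the form "name (DB:ID)" can be reduced to the bare identifier with a regular expression. The pattern and function name below are assumptions made for the example; the real code may handle this differently.

```python
import re

# Matches "<label> (<prefix>:<local id>)" and captures the part in parentheses
LABELLED_ID = re.compile(r"^\s*\S+\s+\((?P<curie>[^\s()]+:[^\s()]+)\)\s*$")


def bare_identifier(value):
    """Return just the identifier from "name (DB:ID)"; other values pass through."""
    match = LABELLED_ID.match(value)
    return match.group("curie") if match else value


print(bare_identifier("sgf73 (PomBase:SPCC126.04c)"))
print(bare_identifier("PomBase:SPCC126.04c"))
```

Values that are already bare identifiers are returned unchanged, so the function is safe to apply to every entry in the column.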
They're now in alphabetical order for consistency between nightly loads. Refs pombase/pombase-chado#276
Export GO annotations and supporting data in GPAD and GPI formats instead of GAF.
File format spec here:
http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format
raise priority when GOC set a deadline - that's when we'll actually have to do it
Original comment by: mah11