SPCC548.03c from "GO annotation dataset" (plus question, how to query the "with" field?) #51

Closed
ValWood opened this issue May 23, 2022 · 13 comments
Assignees
Labels
low priority question Further information is requested

Comments

@ValWood
Collaborator

ValWood commented May 23, 2022

There are 2 genes represented in pombemine for
SPCC548.03c.01
SPCC548.03c.02

Neither @kimrutherford nor I can figure out where these originate (they are isoform IDs, not separate genes).
We can't see where we export these. Are they coming from another source?

thanks
v

@ValWood
Collaborator Author

ValWood commented May 23, 2022

I should mention that Kim's digging says this comes from:
"GO Annotation data set"

but this is not a gene name in our annotation; it's an isoform identifier, and I can't see that we have used it in the PomBase GO annotations.

@ValWood

This comment was marked as resolved.

@ValWood ValWood changed the title SPCC548.03c SPCC548.03c and Q9P3V0 from "GO annotation dataset" May 23, 2022
@ValWood ValWood added the bug Something isn't working label May 23, 2022
@ValWood
Collaborator Author

ValWood commented May 23, 2022

@kimrutherford

@kimrutherford
Collaborator

> There are 2 genes represented in pombemine for
> SPCC548.03c.01
> SPCC548.03c.02

That should be:
SPCC548.03c.1
SPCC548.03c.2

One case where we use these IDs in a place that is mostly gene IDs is the "with" column of the GAF file. For example:

PomBase SPCC1906.03     wtf19           GO:0005737      PMID:32032353   ISS     PomBase:SPCC548.03c.1   C       wtf meiotic drive antidote Wtf19                protein taxon:4896      20200914      PomBase part_of(CL:0000607)
PomBase SPCC1906.03     wtf19           GO:0005737      PMID:32032353   ISS     PomBase:SPCC548.03c.2   C       wtf meiotic drive antidote Wtf19                protein taxon:4896      20200914      PomBase part_of(CL:0000607)

Maybe they are being misunderstood as gene identifiers in that context?
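
To make the point above concrete, here is a minimal Python sketch of how the "with" values could be pulled out of a GAF row and checked for a transcript-style suffix. It assumes the real file is tab-separated (GAF 2.x, with With/From as column 8) and that a transcript ID is a systematic ID followed by ".<number>"; the regular expression and the function names are illustrative assumptions, not part of any PomBase or InterMine code.

```python
import re

# GAF 2.x is tab-separated; With/From is column 8 (zero-based index 7).
WITH_COLUMN = 7

# Assumption: a transcript ID looks like a systematic ID (e.g. SPCC548.03c)
# followed by ".<number>". A plain gene ID such as SPCC1906.03 has only one
# dotted part and therefore does not match.
TRANSCRIPT_SUFFIX = re.compile(r"\.\d+c?\.\d+$")


def with_values(gaf_line):
    """Return the individual identifiers in the With/From column of one GAF row."""
    columns = gaf_line.rstrip("\n").split("\t")
    if len(columns) <= WITH_COLUMN or not columns[WITH_COLUMN]:
        return []
    # GAF 2.1 uses "|" and "," to separate multiple With/From identifiers.
    return [value for value in re.split(r"[|,]", columns[WITH_COLUMN]) if value]


def looks_like_transcript(identifier):
    """Heuristic only: True if the identifier ends in a transcript-style suffix."""
    return bool(TRANSCRIPT_SUFFIX.search(identifier))


# One of the rows quoted above, rebuilt with explicit tabs and empty columns.
row = "\t".join([
    "PomBase", "SPCC1906.03", "wtf19", "", "GO:0005737", "PMID:32032353",
    "ISS", "PomBase:SPCC548.03c.1", "C", "wtf meiotic drive antidote Wtf19",
    "", "protein", "taxon:4896", "20200914", "PomBase",
    "part_of(CL:0000607)", "",
])

for value in with_values(row):
    print(value, looks_like_transcript(value))
```

A check like this would only flag candidates for review; it cannot tell a transcript from any other non-gene identifier in that column.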

@ValWood
Collaborator Author

ValWood commented May 23, 2022

Right, that makes sense. Hmm, this is a real edge case. We can infer the location of the different specific versions of this protein (poison and antidote), and in this case we have specified the isoform (alternative transcript) ID in the with column.

I checked the docs http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/#with-or-from-column-8 to see if this field is restricted to "gene". It isn't, but isoform is not documented:

Screenshot 2022-05-23 at 13 24 05

I suspect that, if we discussed this, the format would be the same as for an allele, so it would be:
DB:gene_symbol[isoform_symbol]

I will check this with GO

@kimrutherford

This comment was marked as resolved.

@ValWood

This comment was marked as resolved.

@ValWood ValWood changed the title SPCC548.03c and Q9P3V0 from "GO annotation dataset" SPCC548.03c from "GO annotation dataset" May 23, 2022
@ValWood ValWood self-assigned this May 23, 2022
@ValWood
Collaborator Author

ValWood commented May 23, 2022

GO ticket
geneontology/helpdesk#394

@danielabutano I have taken this ticket and I'll report back.
It might be possible to improve how InterMine handles this field if IDs can be typed.
Note that "protein complex" identifiers can also be used in this field (I am not sure how).

@ValWood
Collaborator Author

ValWood commented May 23, 2022

@danielabutano one thing I did wonder about was the value of adding the genes from the "with" field.
The genes of interest will be loaded already from other routes.

@ValWood
Collaborator Author

ValWood commented May 23, 2022

OK I have a response from GO.
geneontology/helpdesk#394
Basically, it isn't safe to assume that the IDs in the "with" field refer to genes.

But I think that is OK; we don't need to use these "with" field entries in any queries. They are really arbitrary sources of support for an annotation, but they aren't useful for querying and therefore probably shouldn't be loaded as independent genes. As long as the string (prefix plus ID) is visible, people can look up the sources if they want to validate a specific annotation.
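
As a rough illustration of that idea, the sketch below keeps every "with" entry as its original string and only treats it as a gene when its local ID is already in a set of known gene IDs. This is not how InterMine's loader works; `partition_with_entries` and `known_gene_ids` are hypothetical names, and the inputs are just the identifiers mentioned in this thread.

```python
def partition_with_entries(with_entries, known_gene_ids):
    """Split With/From entries into known genes and opaque references.

    Entries are never rewritten: anything not recognised as a gene is kept
    as its original "Prefix:ID" string so it can still be displayed.
    """
    genes, opaque = [], []
    for entry in with_entries:
        prefix, separator, local_id = entry.partition(":")
        if not separator:
            # No prefix present; treat the whole string as the local ID.
            local_id = entry
        if local_id in known_gene_ids:
            genes.append(entry)
        else:
            opaque.append(entry)
    return genes, opaque


# Hypothetical set of gene IDs already loaded from another source.
known_gene_ids = {"SPCC1906.03", "SPCC548.03c"}

entries = [
    "PomBase:SPCC548.03c.1",
    "PomBase:SPCC548.03c.2",
    "PomBase:SPCC1906.03",
]

genes, opaque = partition_with_entries(entries, known_gene_ids)
print("genes:", genes)
print("kept as text only:", opaque)
```

With this approach the isoform IDs never become gene records, but the full string stays available for anyone who wants to look up the source.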

I wanted to see what the "with" field output looks like, but I can't get a query to output this column. Where are the instructions for this?

@ValWood ValWood added question Further information is requested and removed bug Something isn't working labels May 23, 2022
@ValWood ValWood changed the title SPCC548.03c from "GO annotation dataset" SPCC548.03c from "GO annotation dataset" (plus question, how to query the "with" field?) May 23, 2022
@ValWood ValWood assigned danielabutano and unassigned ValWood May 23, 2022
@danielabutano
Member

Hi @ValWood, this query shows the genes created from the with column
image

@danielabutano
Member

danielabutano commented May 27, 2022

this query is more precise
image

Below is the XML if you want to import it:
<query model="genomic" view="GOEvidence.withText GOEvidence.with.name GOEvidence.with.primaryIdentifier GOEvidence.with.dataSets.name" constraintLogic="(A)" sortOrder="">
  <constraint path="GOEvidence.with.dataSets.name" op="NONE OF" code="A">
    <value>Pombe-gene data set</value>
    <value>cerevisiae-orthologs data set</value>
    <value>human-orthologs data set</value>
    <value>BioGRID interaction data set</value>
    <value>PomBase disease data set</value>
    <value>PomBase phenotypes data set</value>
  </constraint>
</query>
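
For anyone who wants to see what this query selects before importing it, here is a small Python sketch that reads the XML with the standard library and lists the output columns and the excluded data sets. It does not contact PomBeMine or use the InterMine client; `summarize_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The query XML from the comment above, copied unchanged apart from line breaks.
QUERY_XML = """
<query model="genomic" view="GOEvidence.withText GOEvidence.with.name GOEvidence.with.primaryIdentifier GOEvidence.with.dataSets.name" constraintLogic="(A)" sortOrder="">
  <constraint path="GOEvidence.with.dataSets.name" op="NONE OF" code="A">
    <value>Pombe-gene data set</value>
    <value>cerevisiae-orthologs data set</value>
    <value>human-orthologs data set</value>
    <value>BioGRID interaction data set</value>
    <value>PomBase disease data set</value>
    <value>PomBase phenotypes data set</value>
  </constraint>
</query>
"""


def summarize_query(xml_text):
    """Return (output columns, constraints) for a query XML string."""
    root = ET.fromstring(xml_text.strip())
    # The "view" attribute is a space-separated list of output paths.
    columns = root.get("view", "").split()
    constraints = [
        {
            "path": constraint.get("path"),
            "op": constraint.get("op"),
            "values": [value.text for value in constraint.findall("value")],
        }
        for constraint in root.findall("constraint")
    ]
    return columns, constraints


columns, constraints = summarize_query(QUERY_XML)
print("Output columns:")
for column in columns:
    print("  ", column)
for constraint in constraints:
    print(constraint["path"], constraint["op"])
    for value in constraint["values"]:
        print("  ", value)
```

Read this way, the query starts from GOEvidence, outputs the "with" text plus the linked object's name, identifier and data set, and leaves out objects that belong to any of the six listed data sets.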

@ValWood
Collaborator Author

ValWood commented May 27, 2022

Got it. I forgot I needed to switch to "GO evidence".

@ValWood ValWood closed this as completed May 27, 2022
3 participants