"contributes to" in sgd.gaf #252

Closed
murphyte opened this issue Mar 27, 2020 · 7 comments

@murphyte

Hi GOA/SGD,

The sgd.gaf download file:
http://current.geneontology.org/annotations/sgd.gaf.gz

This file recently started using the phrase "contributes to" in column 4, while some other rows still use "contributes_to". Should this be standardized, and if so, which way?

Similar for "colocalizes with" vs "colocalizes_with".

Thanks for checking!

-Terence

@srengel

srengel commented Mar 27, 2020

The rows from SGD are 'contributes to'. The gaf provided by SGD is consistent on all qualifiers (no underscores). The rows added by GOC that are source=GO_Central seem to be the offenders (using the underscores).

Yes, it should be standardized. The rows added by GOC that are source=GO_Central need to be corrected to use the correct format (no underscore).

@murphyte
Author

I'll note that historically the file only contained "contributes_to", including rows from SGD. I didn't dig through revisions, but I think it's a recent change given that our code only just now failed because of the change.

I think all the other files still use "contributes_to", but I didn't check if those are also all sourced from GO_Central, or if there are other sources also using the underscore.

@suzialeksander
Collaborator

commenting to keep an eye on this

@kltm
Member

kltm commented Mar 27, 2020

@murphyte @srengel @pgaudet @dougli1sqrd
I've been doing a little digging and I wanted to clarify a few things. From the most recent release:

zgrep -c "contributes to" sgd-src.gaf.gz
1278
zgrep -c "contributes_to" sgd-src.gaf.gz
0
zgrep -c "contributes to" sgd.gaf.gz
1037
zgrep -c "contributes_to" sgd.gaf.gz
90
zgrep "contributes_to" sgd.gaf.gz | grep -c IBA 
90

What this means is that every "contributes to" in this release comes from the SGD source, while the 90 "contributes_to" rows in the post-GO pipeline product are all IBA annotations added from PANTHER.
Apparently owltools is upgrading "contributes to" to "contributes_to", so this issue is not seen in AmiGO.
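
For anyone who wants to repeat this kind of check without one zgrep per spelling, here is a minimal sketch that tallies the qualifier together with the evidence code and the assigning group. It assumes the standard GAF 2.1 column layout (qualifier in column 4, evidence code in column 7, Assigned_By in column 15); the function name `tally_qualifiers` is made up for illustration and is not part of any GO tooling.

```shell
# Sketch: tally qualifier (col 4), evidence code (col 7) and Assigned_By (col 15)
# for every annotation row of an uncompressed GAF stream read from stdin.
# Usage (file name from this thread): gzip -dc sgd.gaf.gz | tally_qualifiers
tally_qualifiers() {
  tab=$(printf '\t')
  awk -F '\t' '
    /^!/ { next }                          # skip "!" header/comment lines
    $4 != "" { count[$4 FS $7 FS $15]++ }  # only rows that carry a qualifier
    END { for (k in count) printf "%d\t%s\n", count[k], k }
  ' | sort -t "$tab" -k 2
}
```

Each output line is a count followed by the qualifier, evidence code and source, so mixed spellings and where they come from show up side by side.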

In December:

zgrep -c "contributes to" sgd-src.gaf.gz
0
zgrep -c "contributes_to" sgd-src.gaf.gz
1961

So it seems that SGD used to use "contributes_to", then switched to "contributes to" within the last few months. According to the GAF spec (http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/#qualifier-column-4), the qualifier field is: "one (or more) of NOT, contributes_to, colocalizes_with".
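
As a rough illustration of what checking against that list could look like, the sketch below flags rows whose column 4 contains anything other than the three values quoted from the spec, treating "|" as the separator for multiple qualifiers. This is only an assumption about how such a check might be written, not the actual gorule-0000001 implementation, and `find_bad_qualifiers` is a hypothetical name.

```shell
# Sketch: print "<line number><TAB><qualifier>" for rows whose column 4 holds
# a token outside the set quoted from the GAF 2.1 spec.
# Usage (file name from this thread): gzip -dc sgd.gaf.gz | find_bad_qualifiers
find_bad_qualifiers() {
  awk -F '\t' '
    BEGIN { ok["NOT"]; ok["contributes_to"]; ok["colocalizes_with"] }
    /^!/ || $4 == "" { next }              # skip comments and rows with no qualifier
    {
      n = split($4, tok, "[|]")            # several qualifiers are pipe-separated
      for (i = 1; i <= n; i++)
        if (!(tok[i] in ok)) { printf "%d\t%s\n", NR, $4; break }
    }
  '
}
```

Run against a file that mixes both spellings, this would list every "contributes to" and "colocalizes with" row and leave the underscore forms alone.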

@pgaudet
Contributor

pgaudet commented Mar 28, 2020

Yes! This should have been picked up in the rules.
I think I checked 'contributes' :(
gorule-0000001 is supposed to be checking this, see
http://release.geneontology.org/2020-03-25/reports/ecocyc-report.html#gorule-0000001

I think some common errors (case, underscore) should be fixed automatically.
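
To make the idea concrete, here is a minimal sketch of such an automatic repair, assuming the only fixes wanted are case and space-versus-underscore in column 4. It is not how the GO pipeline or owltools does it; `normalize_qualifiers` is a hypothetical name, and any token that still does not match a known qualifier after the fix is left untouched so that a later check can report it.

```shell
# Sketch: repair case and space/underscore mistakes in the qualifier column (col 4)
# of an uncompressed GAF stream on stdin; everything else passes through unchanged.
# Usage (file name from this thread): gzip -dc sgd.gaf.gz | normalize_qualifiers
normalize_qualifiers() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^!/ || $4 == "" { print; next }       # comments and unqualified rows: as-is
    {
      n = split($4, tok, "[|]")
      fixed = ""
      for (i = 1; i <= n; i++) {
        t = tolower(tok[i])
        gsub(/ /, "_", t)
        if (t == "not") t = "NOT"
        # accept the repaired form only if it is a known qualifier
        if (t == "NOT" || t == "contributes_to" || t == "colocalizes_with") tok[i] = t
        fixed = fixed (i > 1 ? "|" : "") tok[i]
      }
      $4 = fixed
      print
    }'
}
```

Leaving unknown values alone is a deliberate choice in this sketch: silently rewriting something unrecognized would hide exactly the kind of change that broke downstream code here.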

@kltm can we make the qualifier its own separate rule?

Thanks, Pascale

@kltm
Member

kltm commented Mar 28, 2020

@pgaudet If it would be useful as a rare and separate catch--I think these recent developments are quite exceptional--adding a separate rule to be worked out with @dougli1sqrd would be fine. Generally, it would be simply an obviously illegal value that should fail the most basic parsing (0000001). This was nicely demonstrated here #252 (comment)

@suzialeksander
Collaborator

Methinks the new releases have fixed this specific issue, thanks @murphyte for catching it. Closing, as this ticket seems to have spawned some new tickets and the specific issue is fixed.


5 participants