MGI xrefs failing GO checks #408

Open
kltm opened this issue Nov 22, 2024 · 6 comments

@kltm
Member

kltm commented Nov 22, 2024

From @ValWood at pombase/pombase-chado#1224

Our MGI ISO xrefs are failing checks.

WARNING - Invalid identifier:GORULE:0000027: 1298204 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC530.12c pdf1 enables GO:0008474 PMID:15075260 ISO MGI:MGI:1298204 F palmitoyl protein thioesterase/ dolichol pyrophosphate phosphatase fusion protein Pdf1 protein taxon:4896 20040414 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1316717 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC20F10.03 SPBC20F10.03 is_active_in GO:0005634 GO_REF:0000024 ISS MGI:MGI:1316717 C armadillo-type fold protein, human IFRD1 ortholog, implicated in transcription or signaling protein taxon:4896 20170830 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1346084 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC6C3.09 rpp40 part_of GO:0005655 GO_REF:0000024 ISS MGI:MGI:1346084 C RNase P and RNase MRP subunit Rpp40 protein taxon:4896 20061017 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 involved_in GO:0042843 GO_REF:0000024 ISS MGI:MGI:1919005 P D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 enables GO:0047837 GO_REF:0000024 ISS MGI:MGI:1919005 F D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase 

But it's a bit weird, because the display and the URL are MGI:1298204 but on the pop-up it says MGI:1298204.
Could you have a dig and see if the syntax has been resolved to remove the first MGI:, or something?

@kltm's response:

Looking at https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000027.md . Okay, "soft" warning, so no data filtering.

The moment of failure is likely here:
https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L835
Special casing for MGI leading into it is:
https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L802-L806

So, it looks like MGI:MGI:1919005 would be clipped to MGI and 1919005, the latter of which would fail when checking against the regexp. The options here would be:

  • change the dbxrefs regexp to reflect our behind-the-scenes fix of MGI (I'm not sure what the knock-on effect would be)
  • remove the ontobio "fix" (I'm not sure what the knock-on effect would be)
  • change the MGI full id to MGI:MGI:MGI:1919005 (I know what the knock-on effect would be: hilarity)
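A minimal Python sketch of the failure mode described above. It assumes that the MGI special-casing splits the xref at the first colon and then drops the doubled "MGI:", and that the regexp check is a full-string match. The function clip_mgi_curie is a hypothetical stand-in for illustration, not the ontobio code at the linked lines.

```python
import re

# Gene id_syntax for MGI, copied from the db-xrefs.yaml entry quoted later in this issue.
MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")


def clip_mgi_curie(curie: str) -> tuple[str, str]:
    """Hypothetical stand-in for the MGI special-casing: split at the first
    colon, then drop the doubled "MGI:" from the local part."""
    prefix, _, local = curie.partition(":")
    if prefix == "MGI" and local.startswith("MGI:"):
        local = local[len("MGI:"):]
    return prefix, local


prefix, local = clip_mgi_curie("MGI:MGI:1919005")
print(prefix, local)                                   # MGI 1919005
# The clipped local id no longer carries "MGI:", so the regexp rejects it.
print(bool(MGI_ID_SYNTAX.fullmatch(local)))            # False
print(bool(MGI_ID_SYNTAX.fullmatch("MGI:" + local)))   # True
```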

Either way, @pgaudet, this is probably best approached as a GO QC bug for the moment (although a "light" one, as no fix or filtering is done) and added to the QC worklist.

@kltm
Member Author

kltm commented Nov 22, 2024

@pgaudet I've temporarily put this in the "low-hanging fruit" project in the spec and prioritize section.

@mugitty
Contributor

mugitty commented Nov 25, 2024

From db.xrefs file:

- database: MGI
  name: Mouse Genome Informatics
  rdf_uri_prefix: http://identifiers.org/MGI/
  generic_urls:
    - http://www.informatics.jax.org/
  entity_types:
    - type_name: gene
      type_id: SO:0000704
      id_syntax: MGI:[0-9]{5,}
      url_syntax: http://www.informatics.jax.org/accession/[example_id]
      example_id: MGI:MGI:1345277
      example_url: http://www.informatics.jax.org/accession/MGI:1345277
    - type_name: variation
      type_id: VariO:0001
      id_syntax: MGI:[0-9]{5,}
      url_syntax: http://www.informatics.jax.org/accession/[example_id]
      example_id: MGI:MGI:3590672
      example_url: http://www.informatics.jax.org/accession/MGI:3590672

The code validates against the id_syntax field. A new entry has to be added with
id_syntax: MGI:MGI:[0-9]{5,}
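To see what the extra entry would and would not accept, here is a small Python sketch that checks a string against several id_syntax patterns and passes if any one of them matches. It assumes full-string matching; matches_any is a hypothetical helper, not the validator's actual function.

```python
import re

# Existing MGI id_syntax from db-xrefs.yaml, plus the additional entry proposed above.
EXISTING = r"MGI:[0-9]{5,}"
PROPOSED = r"MGI:MGI:[0-9]{5,}"


def matches_any(local_id: str, patterns: list[str]) -> bool:
    """Return True if local_id fully matches at least one id_syntax pattern.
    Assumes full-string matching; the real validator may differ."""
    return any(re.fullmatch(p, local_id) for p in patterns)


for candidate in ("1919005", "MGI:1919005", "MGI:MGI:1919005"):
    print(candidate,
          matches_any(candidate, [EXISTING]),
          matches_any(candidate, [EXISTING, PROPOSED]))
# 1919005 False False
# MGI:1919005 True True
# MGI:MGI:1919005 False True
```

Under these assumptions, the clipped string "1919005" is rejected either way, so the extra entry only helps if the string handed to the check still carries both prefixes.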

@kltm
Member Author

kltm commented Nov 27, 2024

@mugitty Hm. I think that entry is actually correct. If you look at something like SGD or WB, you can see that they match in intention, even though MGI is a special case. (The field names id_syntax and example_id are very poor; they should be: internal_id_syntax and example_curie or something...)

@mugitty
Contributor

mugitty commented Nov 27, 2024

@kltm, the code is using id_syntax, which is specified in https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.schema.yaml. If internal_id_syntax is to be used, it has to be added to db-xrefs.schema.yaml. Currently, id_syntax does not match example_id. The code handles multiple id_syntax entries. It does not use example_id.

@kltm
Member Author

kltm commented Nov 27, 2024

@mugitty Yes, that is correct: id_syntax never matches example_id, as they are different concepts.
Take a look at another example, like SGD: https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml#L2386-L2399 . id_syntax never matches example_id, as they are kinda misnamed for historical reasons.

id_syntax: the regexp for the database's internal id
example_id: an example CURIE (namespace plus internal id) of the resource
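Read this way, the MGI gene entry quoted earlier is self-consistent if the CURIE is split at its first colon. The Python sketch below assumes a first-colon split, and it infers from the example_url value that url_syntax's [example_id] placeholder receives the internal id rather than the whole CURIE. split_curie is a hypothetical helper.

```python
import re

# Values copied from the MGI gene entry in db-xrefs.yaml quoted earlier in this issue.
entry = {
    "database": "MGI",
    "id_syntax": r"MGI:[0-9]{5,}",
    "url_syntax": "http://www.informatics.jax.org/accession/[example_id]",
    "example_id": "MGI:MGI:1345277",
    "example_url": "http://www.informatics.jax.org/accession/MGI:1345277",
}


def split_curie(curie: str) -> tuple[str, str]:
    """Hypothetical helper: split a CURIE at the FIRST colon into (namespace, internal id)."""
    namespace, sep, internal_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return namespace, internal_id


namespace, internal_id = split_curie(entry["example_id"])
assert namespace == entry["database"]                 # "MGI"
assert re.fullmatch(entry["id_syntax"], internal_id)  # "MGI:1345277" matches id_syntax
# Substituting the internal id (not the whole CURIE) into url_syntax reproduces example_url.
url = entry["url_syntax"].replace("[example_id]", internal_id)
assert url == entry["example_url"]
print(namespace, internal_id, url)
```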

If the metadata is correct, we need to look at effecting the change we want (with the metadata we have) in the code. The MGI:MGI doubling has always caused problems...

mugitty self-assigned this Dec 5, 2024
@mugitty
Contributor

mugitty commented Dec 5, 2024

@pgaudet, the internal representation of MGI was updated due to geneontology/go-site#91

I will update the code to handle both what is in db-xrefs.yaml and the internal representation.
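One way "handle both" could look, as a minimal Python sketch under stated assumptions: the xref is split at the first colon, and the MGI local id is normalized back to the db-xrefs.yaml form before the full-string id_syntax check. normalize_mgi_local_id and is_valid_mgi_xref are hypothetical names for illustration, not ontobio's API and not the eventual fix.

```python
import re

# Gene id_syntax for MGI, as in db-xrefs.yaml.
MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")


def normalize_mgi_local_id(local_id: str) -> str:
    """Hypothetical helper: accept both the db-xrefs.yaml form of the local id
    ("MGI:1919005") and the clipped internal form ("1919005"), and return the
    db-xrefs.yaml form so it can be checked against id_syntax."""
    return local_id if local_id.startswith("MGI:") else "MGI:" + local_id


def is_valid_mgi_xref(curie: str) -> bool:
    """Split at the first colon, normalize the MGI local id, then full-match id_syntax."""
    prefix, sep, local_id = curie.partition(":")
    if prefix != "MGI" or not sep:
        return False
    return MGI_ID_SYNTAX.fullmatch(normalize_mgi_local_id(local_id)) is not None


for xref in ("MGI:MGI:1919005", "MGI:1919005", "MGI:MGI:MGI:1919005"):
    print(xref, is_valid_mgi_xref(xref))
```

Under these assumptions, the doubled form from the GAF and the single-prefix internal form both pass, while the tripled MGI:MGI:MGI: form is still rejected.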

Projects
Status: To spec out & prioritize