
Specify and BGBM: Question on identifications and organism #61

Closed
MortenHofft opened this issue Feb 14, 2023 · 13 comments

@MortenHofft
Member

MortenHofft commented Feb 14, 2023

@tucotuco this might be a question for you and not @acbentley

Context: I'm trying to reconstruct a single specimen, https://www.gbif.org/occurrence/657029235, which I believe is the same as dc96c974-1ed3-11e3-bfac-90b11c41863e in the data we have in this repo. I think this is the Specify version: https://ichthyology.specify.ku.edu/specify/view/collectionobject/34592/

For BGBM and Koldingensis, I used the identification_evidence table to get a list of identifications for an entity.

As you know, that table is empty in the Specify case. What should I do then?

@acbentley
Collaborator

That record should have a single identification in the identification table, related to the collection object through an organism_id, which is in turn related to a taxonomic concept in the taxon table through the taxon_identification table. I do have multiple determinations for some records but have not included them yet.

@tucotuco
Collaborator

In Andy's data there are, so far, no MaterialEntities that are not collection objects (Organisms). This will change with tissue samples. Even so, as I understand it, the Identifications in Andy's data are not explicitly based on the GeneticSequences anyway. Thus, there are not and will not be IdentificationEvidence records.
The normal Identification path is as Andy described. The collection object (Organism) has, at a minimum, an accepted_identification_id that connects to one Identification record (for multiple identifications of an Organism, the identification.organism_id foreign key can also be used). The Identification links to the TaxonIdentification table, where the individual taxa in the taxonFormula are unraveled and linked to real taxa in the Taxon table.
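A minimal sketch of that path in SQL (run through Python's sqlite3), assuming a simplified schema: the table names and the columns accepted_identification_id and organism_id come from this thread, while taxon_id, identification_id, taxon_order, scientific_name and taxon_formula, plus all IDs and taxon names, are made up for illustration.

```python
import sqlite3

# Simplified stand-in for the tables discussed above; column names other
# than accepted_identification_id and organism_id are assumptions.
SCHEMA = """
CREATE TABLE taxon (taxon_id TEXT PRIMARY KEY, scientific_name TEXT);
CREATE TABLE identification (
    identification_id TEXT PRIMARY KEY,
    organism_id TEXT,
    taxon_formula TEXT
);
CREATE TABLE taxon_identification (
    taxon_id TEXT,
    identification_id TEXT,
    taxon_order INTEGER
);
CREATE TABLE organism (
    organism_id TEXT PRIMARY KEY,
    accepted_identification_id TEXT
);
"""

def accepted_taxa(conn, organism_id):
    """organism -> accepted identification -> taxon_identification -> taxon."""
    rows = conn.execute(
        """
        SELECT t.scientific_name
        FROM organism o
        JOIN identification i ON i.identification_id = o.accepted_identification_id
        JOIN taxon_identification ti ON ti.identification_id = i.identification_id
        JOIN taxon t ON t.taxon_id = ti.taxon_id
        WHERE o.organism_id = ?
        ORDER BY ti.taxon_order
        """,
        (organism_id,),
    )
    return [r[0] for r in rows]

def all_identifications(conn, organism_id):
    """Every identification of an organism, via identification.organism_id."""
    rows = conn.execute(
        "SELECT identification_id FROM identification "
        "WHERE organism_id = ? ORDER BY identification_id",
        (organism_id,),
    )
    return [r[0] for r in rows]

# Hypothetical data: one organism, two identifications, the accepted one
# being a two-taxon formula.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO taxon VALUES (?, ?)",
                 [("tx-1", "Taxon A"), ("tx-2", "Taxon B")])
conn.executemany("INSERT INTO identification VALUES (?, ?, ?)",
                 [("id-1", "org-1", "A x B"), ("id-2", "org-1", "A")])
conn.executemany("INSERT INTO taxon_identification VALUES (?, ?, ?)",
                 [("tx-1", "id-1", 1), ("tx-2", "id-1", 2), ("tx-1", "id-2", 1)])
conn.execute("INSERT INTO organism VALUES ('org-1', 'id-1')")

print(accepted_taxa(conn, "org-1"))        # ['Taxon A', 'Taxon B']
print(all_identifications(conn, "org-1"))  # ['id-1', 'id-2']
```

The accepted identification is reached through organism.accepted_identification_id, while the full determination history comes from the identification.organism_id foreign key; no identification_evidence rows are needed for either.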

@MortenHofft
Member Author

MortenHofft commented Feb 14, 2023

Thanks @tucotuco

What, then, is the proper way to get the organism from a material entity? Through an entity_relationship with a (yet to be agreed) type?

@tucotuco
Collaborator

@MortenHofft I'm not sure what you mean by "get the organism then from a material entity". An Organism IS a MaterialEntity. The Organism might have parts (skulls, seeds, tissue samples, etc.) that are MaterialEntities derived from the Organism. In the other direction, an Organism might be part of a fossil, which is a MaterialEntity with potentially many Organisms represented in it. The Organism type hierarchy means that every Organism record has to have a MaterialEntity record and an Entity record. The relationships between an Organism and any other Entity need to be recorded in the EntityRelationships table.
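Because an Organism is a MaterialEntity, which is in turn an Entity, the three records share one identifier. A rough consistency check for that rule could look like this; the table and column names (entity_id, material_entity_id, organism_id) and the sample IDs are assumptions for the sketch, not the repository's actual schema.

```python
import sqlite3

# Assumed minimal tables for the Entity -> MaterialEntity -> Organism hierarchy.
SCHEMA = """
CREATE TABLE entity (entity_id TEXT PRIMARY KEY, entity_type TEXT);
CREATE TABLE material_entity (material_entity_id TEXT PRIMARY KEY, material_entity_type TEXT);
CREATE TABLE organism (organism_id TEXT PRIMARY KEY);
"""

def organisms_missing_supertypes(conn):
    """Organism IDs with no matching material_entity row, entity row, or both."""
    rows = conn.execute(
        """
        SELECT o.organism_id
        FROM organism o
        LEFT JOIN material_entity me ON me.material_entity_id = o.organism_id
        LEFT JOIN entity e ON e.entity_id = o.organism_id
        WHERE me.material_entity_id IS NULL OR e.entity_id IS NULL
        ORDER BY o.organism_id
        """
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# org-1 has its full supertype chain; org-2 (hypothetical) does not.
conn.execute("INSERT INTO entity VALUES ('org-1', 'MATERIAL_ENTITY')")
conn.execute("INSERT INTO material_entity VALUES ('org-1', 'ORGANISM')")
conn.executemany("INSERT INTO organism VALUES (?)", [("org-1",), ("org-2",)])

print(organisms_missing_supertypes(conn))  # ['org-2']
```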

@MortenHofft
Member Author

MortenHofft commented Feb 15, 2023

Thanks John. As my starting point has been the canonical Koldingensis example, this was lost on me. Let's take some of the example records that I have to work with, then:

Koldingensis
https://github.com/gbif/model-material/blob/koldingensis/koldingensis/koldingensis_db.txt
This is the fruit body of the full organism, so I suppose it makes sense that in that case you would have to go through the relationship table?

// material entity table
"materialEntityType": "PRESERVEDSPECIMEN",
"materialEntityId": "5c488c08-8cab-444a-9598-806dd0abec85",

// No corresponding organism for that ID

//entity relationships
"entityRelationshipType": "MATERIAL SAMPLE OF",
object > entity > materialEntity with "materialEntityType": "ORGANISM",

BGBM
Example record ID: B 10 1171483 http://herbarium.bgbm.org/object/B101171483

// material entity table
"materialEntityType": "PreservedSpecimen", <== should it be "ORGANISM" then?
"materialEntityId": "B 10 1171483",

// organism table
"organismId": "B 10 1171483", <== Or should it not have an organism entry since it is a branch and not the whole tree? // corrected as I had mistyped as 1167678

The specimen is a branch from a tree.

Specify
Since Andy typically deals with whole specimens (partial ones should then be modelled differently, I suppose), the organism and the specimen share an ID.

// material entity table
"materialEntityType": "Organism", <== should be ORGANISM then?
"materialEntityId": "db46a51c-1ed3-11e3-bfac-90b11c41863e",

// organism table
"organismId": "db46a51c-1ed3-11e3-bfac-90b11c41863e",
"organismScope": "lot", <== that it is a lot doesn't matter since organisms can be e.g. a school of fish?

So am I then right in assuming that if I have a specimen ID, the way for me to get at the organism is to:

  • check the organism table for an entry with the same ID
  • If that fails, then go to the entity_relationship table and look for some (hopefully in the future a fixed set of possible) relationship type (e.g. MATERIAL SAMPLE OF as in the Koldingensis example), and from that get the organism?
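The two-step lookup described in the bullets above can be sketched as follows. The entity_relationship column names (subject_entity_id, entity_relationship_type, object_entity_id) are assumptions, "MATERIAL SAMPLE OF" is the only relationship type taken from this thread (the vocabulary is not yet agreed), and "organism-k1" is a hypothetical Koldingensis organism ID, since that record does not exist yet.

```python
import sqlite3
from typing import Optional

# Assumed minimal tables for the lookup.
SCHEMA = """
CREATE TABLE organism (organism_id TEXT PRIMARY KEY);
CREATE TABLE entity_relationship (
    subject_entity_id TEXT,
    entity_relationship_type TEXT,
    object_entity_id TEXT
);
"""

# Relationship types treated as "this material came from that organism".
SAMPLE_OF_TYPES = ("MATERIAL SAMPLE OF",)

def organism_for_specimen(conn, specimen_id) -> Optional[str]:
    # Step 1: the specimen ID may itself be an organism ID (BGBM, Specify).
    row = conn.execute(
        "SELECT organism_id FROM organism WHERE organism_id = ?", (specimen_id,)
    ).fetchone()
    if row:
        return row[0]
    # Step 2: follow a sample-of relationship to an organism (Koldingensis).
    placeholders = ",".join("?" for _ in SAMPLE_OF_TYPES)
    row = conn.execute(
        f"""
        SELECT o.organism_id
        FROM entity_relationship er
        JOIN organism o ON o.organism_id = er.object_entity_id
        WHERE er.subject_entity_id = ?
          AND er.entity_relationship_type IN ({placeholders})
        """,
        (specimen_id, *SAMPLE_OF_TYPES),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO organism VALUES (?)",
                 [("db46a51c-1ed3-11e3-bfac-90b11c41863e",), ("organism-k1",)])
conn.execute("INSERT INTO entity_relationship VALUES (?, ?, ?)",
             ("5c488c08-8cab-444a-9598-806dd0abec85", "MATERIAL SAMPLE OF", "organism-k1"))

print(organism_for_specimen(conn, "5c488c08-8cab-444a-9598-806dd0abec85"))  # organism-k1
```

Keeping the accepted relationship types in one tuple means the lookup only needs a one-line change once the type vocabulary is settled.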

@tucotuco
Collaborator

A lot to unravel here....

Thanks John. As my starting point has been the canonical Koldingensis example, this was lost on me. Let's take some of the example records that I have to work with, then:

Koldingensis
https://github.com/gbif/model-material/blob/koldingensis/koldingensis/koldingensis_db.txt
The fruit body of the full organism. So I suppose that it makes sense that in that case you would have to go through the relation table?

// material entity table
"materialEntityType": "PRESERVEDSPECIMEN",
"materialEntityId": "5c488c08-8cab-444a-9598-806dd0abec85",

// No corresponding organism for that ID

//entity relationships
"entityRelationshipType": "MATERIAL SAMPLE OF",
object > entity >materialEntity with "materialEntityType": "ORGANISM",

There SHOULD be an Organism record and an entity relationship "5c488c08-8cab-444a-9598-806dd0abec85" "MATERIAL SAMPLE OF" [that Organism record's organism_id].

BGBM
Example record ID: B 10 1171483 http://herbarium.bgbm.org/object/B101171483

// material entity table
"materialEntityType": "PreservedSpecimen", <== should it be "ORGANISM" then?
"materialEntityId": "B 10 1171483",

// organism table
"organismId": "B 10 1167678", <== Or should it not have an organism entry since it is a branch and not the whole tree?

The specimen is a branch from a tree.

There SHOULD be an Organism record and an entity relationship "B 10 1171483" "MATERIAL SAMPLE OF" "B 10 1167678".

Specify
Since Andy typically deals with whole specimens (the partial ones should be modelled differently then I suppose), then organism and specimen share an ID.

// material entity table
"materialEntityType": "Organism", <== should be ORGANISM then?
"materialEntityId": "db46a51c-1ed3-11e3-bfac-90b11c41863e",

// organism table
"organismId": "db46a51c-1ed3-11e3-bfac-90b11c41863e",
"organismScope": "lot", <== that it is a lot doesn't matter since organisms can be e.g. a school of fish?

In Andy's case we agreed that the collection object ids correspond most closely with an organism_id, so there are Organism records without any other MaterialEntities. When the tissues get incorporated that will change. Andy will still have these collection_object_ids from the specimen collection for the organism_ids, but there will be additional MaterialEntities for the tissue samples and the EntityRelationships to show that they are "MATERIAL SAMPLE OF" the Organisms.

So am I then right in assuming that if I have a specimen ID, the way for me to get at the organism is to:

  • check the organism table for an entry with the same ID
  • If that fails, then go to the entity_relationship table and look for some (hopefully in the future a fixed set of possible) relationship type (e.g. MATERIAL SAMPLE OF as in the Koldingensis example), and from that get the organism?

I think there may be two issues here. In the CMSs case, the participants need to provide the organism.accepted_identification_id and the corresponding records in the identification table.

In the Koldingensis case, the same thing has to happen but hasn't. Records for the Organisms will have to be built.

@MortenHofft
Member Author

MortenHofft commented Feb 15, 2023

Sorry, just a small correction to the BGBM example above; I seem to have mistyped. There already is an organismId with ID B 10 1171483, so they are like Andy's in that respect. I've updated the example above.

@tucotuco
Collaborator

OK, from what is showing in the original example now...

BGBM
Example record ID: B 10 1171483 http://herbarium.bgbm.org/object/B101171483

// material entity table
"materialEntityType": "PreservedSpecimen", <== should it be "ORGANISM" then?
"materialEntityId": "B 10 1171483",

// organism table
"organismId": "B 10 1171483", <== Or should it not have an organism entry since it is a branch and not the whole tree? // corrected as I had mistyped as 1167678

...the material_entity table entry is the supertype MaterialEntity record for the Organism (because they have the same identifier). In that case, yes, the materialEntityType MUST be "ORGANISM".
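That rule (a material_entity row sharing its identifier with an Organism must be typed "ORGANISM") can be checked mechanically. A sketch against assumed table and column names, loaded with the three example records from this thread:

```python
import sqlite3

# Assumed minimal tables; only the ID and type columns matter here.
SCHEMA = """
CREATE TABLE material_entity (material_entity_id TEXT PRIMARY KEY, material_entity_type TEXT);
CREATE TABLE organism (organism_id TEXT PRIMARY KEY);
"""

def mistyped_organism_supertypes(conn):
    """material_entity rows that share an ID with an organism but are not typed ORGANISM.

    SQLite's default string comparison is case-sensitive, so "Organism" is flagged too.
    """
    return conn.execute(
        """
        SELECT me.material_entity_id, me.material_entity_type
        FROM material_entity me
        JOIN organism o ON o.organism_id = me.material_entity_id
        WHERE me.material_entity_type <> 'ORGANISM'
        ORDER BY me.material_entity_id
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO material_entity VALUES (?, ?)", [
    ("B 10 1171483", "PreservedSpecimen"),                        # BGBM
    ("db46a51c-1ed3-11e3-bfac-90b11c41863e", "Organism"),         # Specify
    ("5c488c08-8cab-444a-9598-806dd0abec85", "PRESERVEDSPECIMEN"),  # Koldingensis, no organism
])
conn.executemany("INSERT INTO organism VALUES (?)", [
    ("B 10 1171483",), ("db46a51c-1ed3-11e3-bfac-90b11c41863e",),
])

print(mistyped_organism_supertypes(conn))
# [('B 10 1171483', 'PreservedSpecimen'), ('db46a51c-1ed3-11e3-bfac-90b11c41863e', 'Organism')]
```

The Koldingensis row is not flagged because it has no Organism with the same ID; it is a sample that should instead be linked to a separate Organism through an entity relationship.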

@MortenHofft
Member Author

So despite being a branch from a tree, it should still be of type ORGANISM?

@timrobertson100
Member

I think @tucotuco is trying to say there should be two entities that represent the tree and the branch:

Entity

| entityID | entityType |
| --- | --- |
| B 10 1171483 | MATERIAL_ENTITY |
| Tree-1 | MATERIAL_ENTITY |

Material Entity

| materialEntityID | materialEntityType |
| --- | --- |
| B 10 1171483 | HERBARIUM_SHEET |
| Tree-1 | ORGANISM |

Organism

| organismID | organismScope |
| --- | --- |
| Tree-1 | ORGANISM |

There would additionally be a relationship declared capturing that B 10 1171483 is a sample of Tree-1.
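The records in these tables, plus the relationship, can be generated by two small helpers. This is an in-memory sketch rather than a database schema: the Store class and helper names are hypothetical, the row values are the ones in the tables above, and the relationship type "MATERIAL SAMPLE OF" is borrowed from earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Store:
    entity: Dict[str, str] = field(default_factory=dict)           # entityID -> entityType
    material_entity: Dict[str, str] = field(default_factory=dict)  # materialEntityID -> materialEntityType
    organism: Dict[str, str] = field(default_factory=dict)         # organismID -> organismScope
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, type, object)

def add_material_entity(store: Store, me_id: str, me_type: str) -> None:
    # Every MaterialEntity also needs its Entity supertype record.
    store.entity[me_id] = "MATERIAL_ENTITY"
    store.material_entity[me_id] = me_type

def add_organism(store: Store, organism_id: str, scope: str) -> None:
    # An Organism is a MaterialEntity of type ORGANISM with the same ID.
    add_material_entity(store, organism_id, "ORGANISM")
    store.organism[organism_id] = scope

def add_sample(store: Store, sample_id: str, sample_type: str, organism_id: str) -> None:
    # The sample is a separate MaterialEntity linked to its Organism.
    add_material_entity(store, sample_id, sample_type)
    store.relationships.append((sample_id, "MATERIAL SAMPLE OF", organism_id))

store = Store()
add_organism(store, "Tree-1", "ORGANISM")
add_sample(store, "B 10 1171483", "HERBARIUM_SHEET", "Tree-1")
```

Routing all inserts through add_material_entity keeps the supertype chain complete by construction, which is the property the type hierarchy requires.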

timrobertson100 changed the title from "specify: question on identifications" to "Specify and BGBM: Question on identifications and organism" on Feb 17, 2023
@tucotuco
Collaborator

@timrobertson100 has the ideal model right.

Data publishers have two options. One option is to have each collection object record represent a record for an Organism. This is what BGBM and Specify have done. In this scenario they describe what parts they have, but do not create separate instances of MaterialEntities for them.

The other option is to have the collection object records represent material parts of Organisms. This is what Arctos does. In this scenario there would have to be distinct Organism records, with their own identifiers, related to the collection objects. Arctos has these. BGBM and Specify would have to create them. They shouldn't have to.

@acbentley
Collaborator

Not that I want to complicate this any further for myself, but could this also be used to tease out preparations for the same object that share a catalog number? This is common in fishes, where you have a lot (KUI xxxxx) that may have some ethanol specimens, some skeletal specimens, some cleared and stained specimens, and (potentially) some tissues. Currently, I am reporting these as a formatted string in the preparation field.

@tucotuco
Collaborator

@acbentley Yes, that is one example of the kinds of thing the MaterialEntity relationships are meant to capture. It would entail parsing and minting identifiers for the "parts" and then making relationships between those parts and the lot they are from. Except for the tissues, I'm not sure you have much to gain by teasing those out.
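A sketch of the parse-mint-relate steps for a preparation string. The "prep (count); prep (count)" format, the "-part-N" identifier scheme, the lot ID "lot-123" and the choice of "MATERIAL SAMPLE OF" as the relationship type are all hypothetical; the real preparation field format is not shown in this thread.

```python
import re

# Hypothetical format: "EtOH (12); skeleton (1); C&S (3); tissue (1)".
PART_PATTERN = re.compile(r"\s*(?P<prep>[^;()]+?)\s*\((?P<count>\d+)\)\s*")

def split_preparations(lot_id, preparations):
    """Mint one MaterialEntity per preparation and relate each to the lot."""
    parts, relationships = [], []
    for n, chunk in enumerate(preparations.split(";"), start=1):
        m = PART_PATTERN.fullmatch(chunk)
        if m is None:
            raise ValueError(f"unparseable preparation: {chunk!r}")
        part_id = f"{lot_id}-part-{n}"  # hypothetical identifier scheme
        parts.append({
            "materialEntityId": part_id,
            "preparation": m.group("prep"),
            "count": int(m.group("count")),
        })
        relationships.append((part_id, "MATERIAL SAMPLE OF", lot_id))
    return parts, relationships

parts, rels = split_preparations("lot-123", "EtOH (12); skeleton (1); C&S (3); tissue (1)")
for p in parts:
    print(p["materialEntityId"], p["preparation"], p["count"])
```

In practice, minted identifiers would need to be stable across re-exports, so a scheme based on position in a free-text string is only suitable for illustration.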

4 participants