PubMed preferred prefix should be PMID #323

Open
1 task done
matentzn opened this issue Feb 24, 2022 · 14 comments

@matentzn
Collaborator

matentzn commented Feb 24, 2022

As per conventions and pubmed itself, pubmed IDs should be prefixed with PMID:

Example page with a PMID: https://pubmed.ncbi.nlm.nih.gov/35189623/

Blocked By

@cthoyt
Member

cthoyt commented Mar 2, 2022

There are several reservations about this request:

  1. The preferred prefix can only be a capitalization variant of the canonical prefix, so we'd have to change both. This is technically possible, so this is more of a note about the ramifications of the request.
  2. The Bioregistry's "good prefix guidelines" encourage clear, transparent prefixes when possible, and explicitly discourage using a redundant "ID" as part of a prefix. https://github.com/biopragmatics/bioregistry/blob/main/docs/CONTRIBUTING.md#choosing-a-good-prefix
  3. It doesn't appear that PubMed has a very principled approach to prefixes or writing CURIEs. I would not consider what's shown on a given article page as their recommendation. Two major issues:
    • They include a space in their "CURIE"
    • They write the "prefix" for PubMed Central Identifiers as PMCID, which does not make much sense either.
  4. The NCBI has a track record of making poor choices for its identifier recommendations, cf. the NCBITaxon debacle: they explicitly recommend writing the CURIE for Homo sapiens as NCBI:txid9606. I think everyone agrees that there are issues with this, and that we don't always have to follow the recommendations.

Assessing the Community

I think it's very difficult to say if there actually is a consensus on what prefix should be used for PubMed, and if so, what that is. There are many different camps for PMID vs. pmid vs. pubmed (and even some who use MEDLINE).

We can take a look at the Bioregistry's page for PubMed to see which registries use which prefix. GO is actually the only external registry Bioregistry aligns on that uses PMID and not pubmed as the prefix. Identifiers.org, Prefix Commons, N2T, and others use pubmed. The Bioregistry primarily inherits from Identifiers.org, so this is why pubmed is the existing Bioregistry prefix.

Anyone who standardized on Identifiers.org will therefore be using pubmed as their prefix for PubMed. A few specific communities come to mind (I will update this list):

  • The systems biology community, that originally pushed MIRIAM then Identifiers.org as a way to standardize
  • Anyone in LinkML community that uses the "merged" context from the prefixmaps package

Further, anyone who has already started standardizing based on the Bioregistry will be using pubmed.

Aside
PubChem's RDF uses reference as the prefix for PubMed (ref: https://pubchem.ncbi.nlm.nih.gov/docs/rdf-uri). I haven't been able to find any competing NIH RDF resources with PubMed in them that aren't PubChem.

Impact of Change

A big concern remains: if we change something so widely used in the Bioregistry, then all of these people would have to update their data too.

Blockers

NCBI Involvement

The Bioregistry does not list a contact person for PubMed. I think it would be valuable to identify an individual from the NCBI who can participate in this discussion and authoritatively speak on the issue.

Alternate Solutions

For people who want to immediately use PMID as the default prefix (or in any other case where you have a disagreement with the Bioregistry's defaults), there are several different ways to generate a custom extended prefix map from the Bioregistry:
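Independently of any particular library, the general idea can be sketched as rekeying a prefix map. This is a minimal illustration that assumes the prefix map is a plain dictionary from prefix to URI prefix; the `rekey_prefix_map` helper and the two-entry sample map are hypothetical and are not part of the Bioregistry API.

```python
def rekey_prefix_map(prefix_map, overrides):
    """Return a copy of prefix_map with selected prefixes renamed.

    overrides maps an existing prefix to the prefix the user wants instead.
    The input map is left unchanged.
    """
    remapped = {}
    for prefix, uri_prefix in prefix_map.items():
        new_prefix = overrides.get(prefix, prefix)
        if new_prefix in remapped:
            raise ValueError(f"duplicate prefix after override: {new_prefix}")
        remapped[new_prefix] = uri_prefix
    return remapped


# Hypothetical excerpt of a default prefix map (not the real registry content).
default_map = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/",
    "orcid": "https://orcid.org/",
}

# A downstream user who prefers PMID keeps the same URI prefix under a new key.
custom_map = rekey_prefix_map(default_map, {"pubmed": "PMID"})
```

Keeping the override as a separate, explicit mapping means the disagreement with the registry default is recorded in one place rather than scattered through downstream code.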

@cthoyt changed the title from "Pubmed preferred prefix should be PMID" to "PubMed preferred prefix should be PMID" on Mar 2, 2022
@matentzn
Collaborator Author

matentzn commented Mar 2, 2022

Hmm... Usually your arguments convince me more. You are trying to pitch personal aesthetic preference against current practice and even turn against PubMed as a whole and their own choice of prefix. I am not swayed (yet). orcid also has the ID in it. I still think PMID should be canonical, but I am happy to change my mind if new arguments arise.

@sierra-moxon
Contributor

I also would strongly favor adding PMID as the preferred prefix. It’s just the prefix that has been used in many applications forever and publicized on PubMed itself.

@caufieldjh
Contributor

People in Zhiyong Lu's group at NCBI should be qualified to comment: https://www.ncbi.nlm.nih.gov/research/bionlp/Team
Rob Leaman in particular.

@sierra-moxon
Contributor

sierra-moxon commented Oct 18, 2023

@cthoyt - I just want to be clear that while I think we could advocate for PMID as the canonical prefix for Pubmed, I just really want at least the preferred prefix to be PMID (accepting that bioregistry will have strict naming conventions for prefix itself, but allowing the historical prefix to be used computationally in a generic way, without having another source of truth -- e.g. many bits of code everywhere to convert PMID -> pubmed or vice versa). Does adding this as a preferred prefix make sense?
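The kind of conversion code described here tends to look like the following sketch. It assumes a small set of known spellings for the PubMed prefix, taken from the ones mentioned in this thread; the `PUBMED_SPELLINGS` set and the `normalize_curie` helper are hypothetical names, and the target prefix is a parameter because the thread has not settled on one.

```python
# Spellings of the PubMed prefix mentioned in this thread, lowercased.
PUBMED_SPELLINGS = {"pmid", "pubmed", "medline"}


def normalize_curie(curie, preferred):
    """Rewrite a PubMed CURIE to use the preferred prefix.

    CURIEs with any other prefix are returned unchanged.
    """
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if prefix.strip().lower() in PUBMED_SPELLINGS:
        # strip() also tolerates the "PMID: 12345" style with a space.
        return f"{preferred}:{identifier.strip()}"
    return curie
```

For example, `normalize_curie("pubmed:35189623", "PMID")` and `normalize_curie("PMID: 35189623", "pubmed")` convert in either direction, which is the duplication a single shared source of truth would avoid.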

@cthoyt
Member

cthoyt commented Oct 20, 2023

I chatted in PM with @sierra-moxon and she agreed to take the lead in writing up a more structured set of arguments to support changing from pubmed to pmid. After that appears and someone can get the appropriate NCBI people actively involved in this discussion on GitHub (#966), we can give a few weeks for follow-up discussion, then the Bioregistry Review team (including @megbalk, @callahantiff, and @lubianat) can make a decision.

@cthoyt
Member

cthoyt commented Oct 24, 2023

@rleaman can you help us identify a responsible individual for PubMed that can join our public discussions on GitHub about how to best reference PubMed identifiers?

@rleaman

rleaman commented Oct 24, 2023

The person at NCBI who could most authoritatively comment on the preferred prefix / CURIE for PubMed would probably be in engineering. I'll figure out who that would be (I am in research) and follow up.

My opinion, for what it's worth: this seems like a case of a reasonable standard (e.g. "ID" shouldn't be part of the prefix) conflicting with a case ("PMID") that is probably both (1) better known than the standard and (2) predates the standard (e.g. https://pubmed.ncbi.nlm.nih.gov/15048644/). But I don't think that "pubmed" is unclear, and I don't have a good sense for how many people are using each one overall.

While the literature isn't the best use case for CURIEs, we can use it to try to get a sense of what's actually used: my best guess is that "pmid" is over 10x more popular than "pubmed." [Specifically: the number of times that "pmid" appears followed by a colon or a 7- or 8-digit number is 27,372. The number of times that "pubmed" appears followed by a colon or a 7- or 8-digit number is 2,037. Data is for case-insensitive bigram counts of PubMed and the PMC text mining subset, through early 2020.]
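A rough way to reproduce this kind of count on any text is sketched below. It approximates the bigram criterion with a regular expression (the prefix as a whole word, followed by a colon or by a 7- or 8-digit number, case-insensitive); it is not the pipeline used for the figures above, and `count_prefix_mentions` is a hypothetical helper name.

```python
import re


def count_prefix_mentions(text, prefix):
    """Count occurrences of prefix followed by a colon or a 7-8 digit number."""
    pattern = re.compile(
        # Whole-word prefix, optional whitespace, then ":" or exactly 7-8 digits.
        rf"\b{re.escape(prefix)}\b\s*(?::|\d{{7,8}}\b)",
        flags=re.IGNORECASE,
    )
    return len(pattern.findall(text))


sample = "See PMID: 35189623 and pmid 1504864, also pubmed:123 and PubMed 15048644."
pmid_count = count_prefix_mentions(sample, "pmid")
pubmed_count = count_prefix_mentions(sample, "pubmed")
```

A regex over raw text will not match tokenized bigram counts exactly, so the numbers from such a sketch are only indicative.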

@jeffbeckncbi

I've made some inquiries here at NCBI. For PubMed, we would prefer pubmed:123456 rather than using the more obscure "PMID". This is consistent with the pmc:PMC5678910 that we already discussed where the resource is the prefix.

The difference between pubmed and PMC (and most other ncbi databases) is that the PubMed ID is just an integer. They do not define the Accession ID structure like we have in PMC (PMC999999.9). So pubmed:45678910 would be the best option.
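The two identifier shapes described above can be sketched as a small validator. The patterns are assumptions inferred only from the examples in this comment (an integer for PubMed; `PMC`, digits, and an optional `.version` for PMC), not an official NCBI specification, and `is_valid_curie` is a hypothetical helper.

```python
import re

# Assumed local-identifier patterns, inferred from the examples in this thread.
PATTERNS = {
    "pubmed": re.compile(r"\d+"),
    "pmc": re.compile(r"PMC\d+(\.\d+)?"),
}


def is_valid_curie(curie):
    """Check a CURIE against the assumed pattern for its prefix."""
    prefix, sep, identifier = curie.partition(":")
    pattern = PATTERNS.get(prefix)
    # Unknown prefixes and strings without a colon are rejected.
    return bool(sep and pattern and pattern.fullmatch(identifier))
```

Under these assumptions, `pubmed:45678910`, `pmc:PMC5678910`, and `pmc:PMC999999.9` are accepted, while `pmc:5678910` is rejected because the accession lacks its `PMC` part.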

@matentzn
Collaborator Author

matentzn commented Nov 8, 2023

@jeffbeckncbi thank you for inquiring. As this is an extremely consequential and costly decision I would really like to know who is "we" in "we would prefer" and what steps NCBI is taking to replace their own usage of PMID in all their websites and resources with pubmed:123.

Is there a concrete plan to deprecate use of PMID across the organisation?

@jeffbeckncbi

@matentzn I answered this question about the prefix for a PubMed CURIE as a followup to my response about PMC CURIEs (#965).

There is no intention to change the label on the pubmed identifier on the pubmed site to use CURIEs, but if you are trying to write CURIEs for both the pubmed and pmc resources, identify the resource in the prefix and don't just use the abbreviation for pubmed id.

I am the Program Head for Literature at NCBI - the group that runs PubMed and PMC at the US National Library of Medicine. I consulted on the question of the CURIE prefix for these resources with NCBI leadership.

@matentzn
Collaborator Author

matentzn commented Nov 8, 2023

Thank you for the clarification, I didn't see that discussion - followed up now. I will come back to you soon!

@cthoyt
Member

cthoyt commented Nov 13, 2023

@jeffbeckncbi thank you, having an authoritative voice on this is incredibly valuable.

@sierra-moxon It's still the Bioregistry Review Team policy to weigh all arguments, even those contrary to the Identifier Space Owner (ISO). If you are still willing to write up a more detailed argument (I mentioned in #323 (comment) that you had already agreed to do this), then the Bioregistry Review Team can consider this. If you're still interested in doing that, do you think you could do it by the end of this week?

@sierra-moxon
Contributor

sierra-moxon commented Nov 14, 2023

I think @rleaman's simple search for the prefix in the corpus of publications before 2020 in this thread paints a good picture of the usage and I imagine others on this thread to be better than I at justifying. To clarify again, my ask on this ticket was to simply add a Bioregistry preferred annotation to PMID (or otherwise distinguish PMID from the other namespace/prefix synonyms).

Here are several more resources (besides the Gene Ontology) that use PMID as a namespace in pubmed identifiers:
