fix: cohort extraction on latest GWAS Catalog release files #521
Hi @RobinM-code, thanks for your PR! Yes, the GWAS Catalog recently changed its data export, which made changes necessary on our side. Based on communication with the GWAS Catalog team, this change is planned and permanent. (We have started following their announcement mailing list to keep up with planned changes for this reason.)
Although I have already addressed this issue on my branch (except for updating the test data), I'll approve and merge this PR, because the PR you referenced also tries to deal with other problems that I'm not fully ready with yet.
Thank you for your contribution. Please keep looking into the code and trying to run things, and feel free to contact us if you have questions or to raise an issue if you find something off. We highly value any input.
Unfortunately, a pre-commit update needs to be reverted.
.pre-commit-config.yaml (Outdated)
@@ -56,7 +56,7 @@ repos:
         exclude: "CHANGELOG.md"

   - repo: https://github.com/alessandrojcm/commitlint-pre-commit-hook
-    rev: v9.11.0
+    rev: v9.13.0
Following a discussion with @d0choa, this update needs to be reverted: as seen on the pre-commit forums and as we experienced ourselves, this update causes pre-commit to fail.
On another note, we were wondering whether the pre-commit checks were run on your machine. The local environment can diverge from the remote dev branch, which leads to complications, so running `make setup-dev` regularly is advised to keep the local environment up to date.
Hi @DSuveges, I was already afraid this would cause some trouble. I am happy to revert, but as mentioned, this file is automatically updated when running `make setup-dev`. If I revert the updates, pre-commit fails and I am unable to make any commits...
Trying to revert the specific commit:
Running `make setup-dev` (notice the updating of ruff and commitlint):
This PR, merged today, removes the problematic row from the Makefile. Updating your branch should hopefully sort out these issues.
gwas catalog v1.0.3.1 moved cohort from ancestry to study files. This breaks the `annotate_ancestries` method in the `StudyIndexGWASCatalog` class. Use the gwas catalog study files to get the cohorts instead.
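The change described in this commit message can be illustrated with a minimal, standalone sketch. It is not the gentropy implementation (which works on Spark DataFrames inside `StudyIndexGWASCatalog`); it only shows the idea of reading cohorts from the study file's `COHORT` column and falling back to the ancestry file's `COHORT(S)` column for older releases. The function name `cohorts_by_study`, the `STUDY ACCESSION` join key, and the `|` separator between cohort labels are assumptions made for illustration.

```python
import csv
import io

STUDY_ID = "STUDY ACCESSION"  # assumed key column shared by both files


def cohorts_by_study(study_tsv, ancestry_tsv=None):
    """Map each study accession to a sorted list of cohort labels.

    Prefers the `COHORT` column of the study file (newer releases) and
    falls back to `COHORT(S)` in the ancestry file (older releases).
    """
    cohorts = {}

    def collect(handle, column):
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            return False  # this file layout does not carry the column
        for row in reader:
            # Assumption: multiple cohorts are separated by "|".
            labels = {c.strip() for c in (row[column] or "").split("|") if c.strip()}
            cohorts.setdefault(row[STUDY_ID], set()).update(labels)
        return True

    found = collect(study_tsv, "COHORT")
    if not found and ancestry_tsv is not None:
        found = collect(ancestry_tsv, "COHORT(S)")
    if not found:
        raise KeyError("no cohort column found in the provided files")
    return {study: sorted(labels) for study, labels in cohorts.items()}


# Usage with an in-memory file in the newer layout (made-up rows):
new_study = io.StringIO("STUDY ACCESSION\tCOHORT\nGCST1\tUKB|FinnGen\n")
print(cohorts_by_study(new_study))
```

Keeping the fallback makes the sketch work on both file layouts, which may be useful when test data from older releases is still around.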
20a4022 to 3509f0a
Closing this PR, as the original problem caused by the column name changes in the GWAS Catalog datasets has already been addressed.
✨ Context
I tried to run `gwas_catalog_ingestion` from the gentropy package. It seems that this step does not work with the current GWAS Catalog files. (I downloaded the `catalog_study_files`, `catalog_ancestry_files` and `catalog_associations_file` from here: https://ftp.ebi.ac.uk/pub/databases/gwas/releases/2024/03/04/)
To be more specific, the `COHORT(S)` header was removed from the `catalog_ancestry_files`. However, it was placed back in the `catalog_study_files` as `COHORT`. See documentation from GWAS Catalog: https://www.ebi.ac.uk/gwas/docs/fileheaders#_file_headers_for_unpublished_ancestries
I believe @DSuveges has found this as well. See his 3rd task in this PR: #507
🛠 What does this PR implement
This PR makes sure the COHORT(S) are extracted from the `catalog_study_files`.
Sample files and pytests have been updated.
Note that I had to update pre-commit-config.yaml as well. (It updated while running `make setup-dev`. Not including the files made pre-commit run into errors)
🙈 Missing
I am not yet aware of the full gentropy package. Could there be any other steps that use the `COHORT(S)` header from the `catalog_ancestry_files`?
🚦 Before submitting
`dev` branch?
(`make test`)?
(`poetry run pre-commit run --all-files`)?
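One way to approach the open question under "Missing" (whether other steps still read the `COHORT(S)` header) is to scan the source tree for the literal string. The helper below is a hypothetical sketch, not part of gentropy; the function name `find_references` and the default of scanning only `.py` files are assumptions.

```python
from pathlib import Path


def find_references(root, needle="COHORT(S)", suffixes=(".py",)):
    """Return (relative path, line number, stripped line) for each line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory; adjust the root to the package source as needed.
    for rel_path, lineno, line in find_references("."):
        print(f"{rel_path}:{lineno}: {line}")
```

A plain substring match is used on purpose: the parentheses in `COHORT(S)` would need escaping in a regular expression.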