Small fixes to sample QC notebook #773

Merged
epiercehoffman merged 2 commits into main from eph_notebook_fixes
Jan 31, 2025

Conversation

epiercehoffman
Collaborator

Updates include:

  • Small documentation clarifications
  • Display sample_set_tbl even if not reloading or editing included batches
  • Save new sample_set_tbl in the same cell as editing batches
  • Don't comment cells to be run once, since they should be run the first time
  • Only put "(Filtered)" in plot title during filtering
  • Print number of differences between PED files
  • Collapsed a few more headings by default

Testing:
Tested in a clone of the public workspace on reference panel data, with and without a reference PED file. Cleared outputs and tested re-upload.

"sample_set_tbl = sample_set_tbl[sample_set_tbl['entity:sample_set_id'].str.contains('KJ_EvidenceQC_Updates')]"
"sample_set_tbl = sample_set_tbl[sample_set_tbl['entity:sample_set_id'].str.contains('KJ_EvidenceQC_Updates')]\n",
"file_path = generate_file_path(TLD_PATH, 'artifacts', 'sample_sets_qc.tsv')\n",
"save_df(WS_BUCKET, file_path, sample_set_tbl)"
Collaborator

@kjaisingh kjaisingh Jan 30, 2025

Can we comment out the reference to KJ_EvidenceQC_Updates? I see it's used as an example, but probably should be commented out. And maybe even change the substring being searched for to something more generalizable rather than 'KJ'.

This must've been missed in the first pass.

Collaborator

@kjaisingh kjaisingh Jan 30, 2025

Maybe you can add 1-2 new lines between the comment about optionally filtering to only include a subset of batches and the two lines that now follow it (generate_file_path and save_df)?

I worry that folks will simply ignore a single-line comment. Alternatively, we can add a green box for (optional) user input to make this more clear, but I believe we decided against this in previous iterations.

Collaborator Author

Could add a green box and a new variable SUBSTRING and only run the following lines if SUBSTRING is not None?
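
A rough sketch of what that could look like, not the code that was actually merged. SUBSTRING comes from the suggestion above; the helper name filter_sample_sets and the toy table are hypothetical and only there to make the snippet runnable on its own:

```python
import pandas as pd

# Hypothetical user input: set to a string to keep only matching batches,
# or leave as None to keep every sample set.
SUBSTRING = None


def filter_sample_sets(tbl, substring, id_col="entity:sample_set_id"):
    """Return rows whose sample set ID contains `substring`; no-op if None."""
    if substring is None:
        return tbl
    # regex=False matches the substring literally; na=False drops missing IDs.
    mask = tbl[id_col].str.contains(substring, regex=False, na=False)
    return tbl[mask]


# Toy table standing in for the notebook's sample_set_tbl.
sample_set_tbl = pd.DataFrame(
    {"entity:sample_set_id": ["batch_A_qc", "batch_B_qc", "other_set"]}
)
sample_set_tbl = filter_sample_sets(sample_set_tbl, SUBSTRING)
```

In the notebook, the existing generate_file_path and save_df lines would follow this as they do now. Whether the save should also sit inside the SUBSTRING guard is left open here.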

Collaborator

Sounds good.

Collaborator

@kjaisingh kjaisingh left a comment

Looks good to me, thanks for these cleanups - just one minor comment about the formatting relating to subsetting the batches to process.

@kjaisingh
Collaborator

Looks great now, thanks!

@epiercehoffman epiercehoffman merged commit d9a905e into main Jan 31, 2025
5 checks passed
@epiercehoffman epiercehoffman deleted the eph_notebook_fixes branch January 31, 2025 19:40