-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ploidy for Foxtrot VDS [VS-1418] #9082
Conversation
mcovarr
commented
Jan 21, 2025
•
edited
Loading
edited
# hg38 = hl.get_reference("GRCh38") | ||
# xy_contigs = set(hg38.x_contigs + hg38.y_contigs) | ||
# ploidy_table = { | ||
# contig: ploidy_table[key] | ||
# for contig, key in zip(hg38.contigs, sorted(ploidy_table)) | ||
# if contig in xy_contigs | ||
# } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the lines from the original PR that were giving me trouble, in particular the zip
when I was supplying Avro data with more than just the X and Y contigs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we understand why Chris did this? Should we check in with him about it's removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to George's comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just wondering if we should check with Chris V about that removed code snippet.
# hg38 = hl.get_reference("GRCh38") | ||
# xy_contigs = set(hg38.x_contigs + hg38.y_contigs) | ||
# ploidy_table = { | ||
# contig: ploidy_table[key] | ||
# for contig, key in zip(hg38.contigs, sorted(ploidy_table)) | ||
# if contig in xy_contigs | ||
# } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we understand why Chris did this? Should we check in with him about it's removal?
@@ -64,7 +65,9 @@ def run_in_cluster(cluster_name, account, worker_machine_type, master_machine_ty | |||
) | |||
|
|||
# prepare custom arguments | |||
secondary_script_path_arg = f'--py-files {" ".join(secondary_script_path_list)}' if secondary_script_path_list else '' | |||
# the following says `--py-files` is supposed to be a comma separated list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this work before? Did we never pass multiple py-files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooooh it probably didn't work before--we probably only ever gave it a single secondary script at a time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly, I was the lucky first person to supply more than one
@@ -1,6 +1,6 @@ | |||
version 1.0 | |||
|
|||
import "GvsUtils.wdl" as Utils | |||
import "../GvsUtils.wdl" as Utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops
FROM \`~{project_id}.~{dataset_name}.~{ploidy_table_name}\` p | ||
JOIN \`~{project_id}.~{dataset_name}.sample_info\` s ON p.sample_id = s.sample_id | ||
WHERE (p.chromosome / 1000000000000 = 23 or p.chromosome / 1000000000000 = 24) | ||
" --call_set_identifier ~{call_set_identifier} --dataset_name ~{dataset_name} --table_name ~{ploidy_table_name} --project_id=~{project_id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't we make a change where we only got the avro files for one BQ partition/one vet_x table/group of 4k samples at a time? We did that to make Hail faster when we passed in the vet and ref data. Do we not need to do this with ploidy data because it is added to the VDS at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That and ploidy data is tiny: two rows per sample.