-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DC-3729] Generate derived tables #1854
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Needs clean_cdr.py as well
data_steward/cdr_cleaner/cleaning_rules/generate_derived_tables.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more minor suggestions
data_steward/cdr_cleaner/cleaning_rules/generate_derived_tables.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update id
|
||
POPULATE_OBS_PRD = JINJA_ENV.from_string(""" | ||
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` | ||
SELECT person_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT person_id, | |
SELECT ROW_NUMBER() OVER (ORDER BY person_id) AS observation_period_id, | |
person_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when you add this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more suggestions
""" | ||
:return: a list of SQL strings to run | ||
""" | ||
create_sandbox_table_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename for clarity
create_sandbox_table_list = [] | |
create_derived_table_list = [] |
|
||
POPULATE_OBS_PRD = JINJA_ENV.from_string(""" | ||
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` | ||
SELECT person_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when you add this
""") | ||
|
||
POPULATE_OBS_PRD = JINJA_ENV.from_string(""" | ||
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add cols for all 3 queries
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` | |
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` {{cols}} |
dataset_id=self.dataset_id, | ||
sandbox_dataset_id=self.sandbox_dataset_id, | ||
ehr_cutoff_date=self.ehr_cutoff_date, | ||
storage_table_name=OBSERVATION_PERIOD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add cols, and fields can be retrieved from the schema json
storage_table_name=OBSERVATION_PERIOD) | |
storage_table_name=OBSERVATION_PERIOD, | |
cols=f"({','.join(fields_list)})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another suggestion
FROM `{{project_id}}.{{dataset_id}}.person` AS pt | ||
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence` AS vt ON pt.person_id = vt.person_id | ||
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence_ext` AS e ON vt.visit_occurrence_id = e.visit_occurrence_id | ||
WHERE src_id LIKE '%EHR%' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this EHR restrictions from all the tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added last change before merge
FROM `{{project_id}}.{{dataset_id}}.person` AS pt | ||
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence` AS vt ON pt.person_id = vt.person_id | ||
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence_ext` AS e ON vt.visit_occurrence_id = e.visit_occurrence_id | ||
AND EXTRACT(YEAR FROM vt.visit_start_date) >= 1985 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be updated to "WHERE"
AND EXTRACT(YEAR FROM vt.visit_start_date) >= 1985 | |
WHERE EXTRACT(YEAR FROM vt.visit_start_date) >= 1985 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
generate derived table on deid step cleaning rule.