Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DC-3729] Generate derived tables #1854

Merged
merged 15 commits into from
Mar 29, 2024
Merged

[DC-3729] Generate derived tables #1854

merged 15 commits into from
Mar 29, 2024

Conversation

ratuagga
Copy link
Contributor

generate derived table on deid step cleaning rule.

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Needs clean_cdr.py as well

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more minor suggestions

data_steward/cdr_cleaner/clean_cdr.py Outdated Show resolved Hide resolved
data_steward/cdr_cleaner/clean_cdr.py Outdated Show resolved Hide resolved
Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update id


POPULATE_OBS_PRD = JINJA_ENV.from_string("""
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}`
SELECT person_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SELECT person_id,
SELECT ROW_NUMBER() OVER (ORDER BY person_id) AS observation_period_id,
person_id,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when you add this

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more suggestions

"""
:return: a list of SQL strings to run
"""
create_sandbox_table_list = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename for clarity

Suggested change
create_sandbox_table_list = []
create_derived_table_list = []


POPULATE_OBS_PRD = JINJA_ENV.from_string("""
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}`
SELECT person_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when you add this

""")

POPULATE_OBS_PRD = JINJA_ENV.from_string("""
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cols for all 3 queries

Suggested change
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}`
INSERT INTO `{{project_id}}.{{dataset_id}}.{{storage_table_name}}` {{cols}}

dataset_id=self.dataset_id,
sandbox_dataset_id=self.sandbox_dataset_id,
ehr_cutoff_date=self.ehr_cutoff_date,
storage_table_name=OBSERVATION_PERIOD)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cols, and fields can be retrieved from the schema json

Suggested change
storage_table_name=OBSERVATION_PERIOD)
storage_table_name=OBSERVATION_PERIOD,
cols=f"({','.join(fields_list)})")

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another suggestion

FROM `{{project_id}}.{{dataset_id}}.person` AS pt
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence` AS vt ON pt.person_id = vt.person_id
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence_ext` AS e ON vt.visit_occurrence_id = e.visit_occurrence_id
WHERE src_id LIKE '%EHR%'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this EHR restrictions from all the tables.

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added last change before merge

FROM `{{project_id}}.{{dataset_id}}.person` AS pt
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence` AS vt ON pt.person_id = vt.person_id
JOIN `{{project_id}}.{{dataset_id}}.visit_occurrence_ext` AS e ON vt.visit_occurrence_id = e.visit_occurrence_id
AND EXTRACT(YEAR FROM vt.visit_start_date) >= 1985
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be updated to "WHERE"

Suggested change
AND EXTRACT(YEAR FROM vt.visit_start_date) >= 1985
WHERE EXTRACT(YEAR FROM vt.visit_start_date) >= 1985

Copy link
Contributor

@nishanthpp93 nishanthpp93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@ratuagga ratuagga merged commit fb25920 into develop Mar 29, 2024
4 checks passed
@ratuagga ratuagga deleted the ra/dc3729 branch March 29, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants