You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The intention here is to provide a more flexible, composable alternative to the existing count_episodes_for_patient aggregation.
The idea is that given a date event series like this:
date
2020-01-01
2020-01-04
2020-01-06
2020-01-10
2020-01-12
We could "cluster" it into episodes, defining the maximum gap allowed between episodes like so:
date.cluster_by_episode(days(3))
And this would produce a new event frame with two columns giving the start and end dates of each episode:
start_date
end_date
2020-01-01
2020-01-06
2020-01-10
2020-01-12
Calling count_for_patient() on this frame would give the same result as the existing count_episodes_for_patient() aggregation. But all sorts of other operation are now possible e.g. calculating total episode length and including only those over a certain length:
This is a new kind of operation in the sense that it takes a series and returns a frame. It's also unusual in that the frame class it returns is not one already defined in a schema module. In the query_language module we'll probably want to define a dedicated frame class for this with start_date and end_date series.
Note that this operation is not an aggregation (it doesn't result in a one-row-per-patient result) so we shouldn't use the _for_patient() suffix in naming it.
I don't have a clear sense of how fiddly or not this will be to implement in SQL. The existing episode counting code uses a LAG window function to do it's work. It may be that a simple adaption of this code will do what we need. Or there may turn out to be some significant blocker I haven't thought of.
The text was updated successfully, but these errors were encountered:
The intention here is to provide a more flexible, composable alternative to the existing
count_episodes_for_patient
aggregation.The idea is that given a date event series like this:
date
We could "cluster" it into episodes, defining the maximum gap allowed between episodes like so:
And this would produce a new event frame with two columns giving the start and end dates of each episode:
start_date
end_date
Calling
count_for_patient()
on this frame would give the same result as the existingcount_episodes_for_patient()
aggregation. But all sorts of other operation are now possible e.g. calculating total episode length and including only those over a certain length:Original Slack discussion with motivating use cases:
https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1729785637716699
Implementation notes
This is a new kind of operation in the sense that it takes a series and returns a frame. It's also unusual in that the frame class it returns is not one already defined in a schema module. In the
query_language
module we'll probably want to define a dedicated frame class for this withstart_date
andend_date
series.Note that this operation is not an aggregation (it doesn't result in a one-row-per-patient result) so we shouldn't use the
_for_patient()
suffix in naming it.I don't have a clear sense of how fiddly or not this will be to implement in SQL. The existing episode counting code uses a
LAG
window function to do it's work. It may be that a simple adaption of this code will do what we need. Or there may turn out to be some significant blocker I haven't thought of.The text was updated successfully, but these errors were encountered: