-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Funnel Unordered Steps #4868
Funnel Unordered Steps #4868
Conversation
…t for conversion window;
… cleanup via destroy() method
…ed funnel_persons and funnel_trends_persons into individual classes;
…com:PostHog/posthog into funnel-persons-pagination-conversion-window
* dedup + tests * deep equality. Tests to come * write test for entity equality * finish testing funnels * clean up comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested the query generated here on production data too? Just to make sure it's relatively performant
Good idea, haven't done that for a big funnel, will do! |
…to funnel-unordered-step
Will hold off on merging till I've done a speed test on prod. Will test the new optimization as well. Tests pass though, so I'm feeling confident about it |
So, there doesn't seem to be any issues when dealing with O(500k) events / query window. But I'm getting out of memory exceptions (using > 9.3 Gigs of memory) when dealing with O(500M) events / query window. The expanding if condition though seems to be resolved using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the tests, it seems this does work for now. The growing unions based on step number concerns me. We will want to think up a solution that doesn't grow like that. Even though our main funnel query also grows with step count, it adds outer query calculations whereas this would add more and more subqueries -> more and more data to be filtered.
However, if this is the best for now, we'll put it behind a feature flag anyway for preliminary testing
) WHERE step_0 = 1""" | ||
|
||
# rotate entities by 1 to get new first event | ||
entities_to_use.append(entities_to_use.pop(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mainly curious how this rotation covers cases where events are scrambled and out of order. Also, the subquery UNION here does pose a risk because if you have many events in an "unordered funnel" we could end up in trouble with performance because all the filters would be applied to the subqueries separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very true.
So, the basic intuition is that we choose the first event, then choose any event that comes after it which matches our filters. So, just by switching the first event, we cover all cases where events may occur in any order.
It's not the rotation that's important, it's "what is the first event" that's important. Rotation felt like a good way to cover those bases.
Can you give an example of "scrambled and out of order" where you feel this doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the basic intuition is that we choose the first event, then choose any event that comes after it which matches our filters. So, just by switching the first event, we cover all cases where events may occur in any order.
Yeah this makes sense. From the code itself it just seems like it's still looking for ordering after the first event, but if you're just rotating the first event then it should cover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it doesn't look for ordering, because I don't reuse the base funnel query (_get_steps_per_person_query
), but just the inner event query
Agree, this isn't optimal, there ought to be a better way, but I haven't managed to figure it out yet. |
Last thing that can be included in follow up is to add this class along the api path + a filter parameter that would account for the type of funnel ordering requested |
partition_select = self.get_partition_cols(1, max_steps) | ||
sorting_condition = self.get_sorting_condition(max_steps) | ||
|
||
for i in range(max_steps): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW what is our convention for numbering steps in queries? I'm not actually sure, but I saw that #4892 used 1-based indexing, while this is 0-based
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step_i is zero based everywhere, while level_index is one based everywhere :/ - idea is to get everything to zero based, soonish. I wrote a comment mentioning this on the first Funnel PR - #4851 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please feel free to merge if you're happy with it :) (comment if not lol)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see now, I just misread #4892 a bit 👍
Changes
Fixes #4848
Checklist