Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Funnel Unordered Steps #4868

Merged
merged 44 commits into from
Jun 30, 2021
Merged

Funnel Unordered Steps #4868

merged 44 commits into from
Jun 30, 2021

Conversation

neilkakkar
Copy link
Contributor

Changes

Fixes #4848

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests
  • Migrations are safe to run at scale (e.g. PostHog Cloud) – present proof if not obvious
  • Frontend/CSS is usable at 320px (iPhone SE) and decent at 360px (most phones)
  • Breaking changes are backwards-compatible. Ensure old/new frontend requests work with new/old backends, and vice versa.

buwilliams and others added 28 commits June 18, 2021 08:48
…ed funnel_persons and funnel_trends_persons into individual classes;
…com:PostHog/posthog into funnel-persons-pagination-conversion-window
@timgl timgl temporarily deployed to posthog-pr-4868 June 24, 2021 14:49 Inactive
@timgl timgl temporarily deployed to posthog-pr-4868 June 25, 2021 14:03 Inactive
* dedup + tests

* deep equality. Tests to come

* write test for entity equality

* finish testing funnels

* clean up comments
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested the query generated here on production data too? Just to make sure it's relatively performant

@neilkakkar
Copy link
Contributor Author

Good idea, haven't done that for a big funnel, will do!

@timgl timgl temporarily deployed to posthog-pr-4868 June 25, 2021 15:30 Inactive
@neilkakkar
Copy link
Contributor Author

Will hold off on merging till I've done a speed test on prod. Will test the new optimization as well. Tests pass though, so I'm feeling confident about it

@neilkakkar
Copy link
Contributor Author

So, there doesn't seem to be any issues when dealing with O(500k) events / query window. But I'm getting out of memory exceptions (using > 9.3 Gigs of memory) when dealing with O(500M) events / query window.

The expanding if condition though seems to be resolved using arraySum

@neilkakkar neilkakkar mentioned this pull request Jun 28, 2021
7 tasks
Base automatically changed from funnel-step-query-new to master June 28, 2021 14:20
@EDsCODE EDsCODE temporarily deployed to posthog-pr-4868 June 28, 2021 23:20 Inactive
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the tests, it seems this does work for now. The growing unions based on step number concerns me. We will want to think up a solution that doesn't grow like that. Even though our main funnel query also grows with step count, it adds outer query calculations whereas this would add more and more subqueries -> more and more data to be filtered.

However, if this is the best for now, we'll put it behind a feature flag anyway for preliminary testing

) WHERE step_0 = 1"""

#  rotate entities by 1 to get new first event
entities_to_use.append(entities_to_use.pop(0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly curious how this rotation covers cases where events are scrambled and out of order. Also, the subquery UNION here does pose a risk because if you have many events in an "unordered funnel" we could end up in trouble with performance because all the filters would be applied to the subqueries separately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true.

So, the basic intuition is that we choose the first event, then choose any event that comes after it which matches our filters. So, just by switching the first event, we cover all cases where events may occur in any order.

It's not the rotation that's important, it's "what is the first event" that's important. Rotation felt like a good way to cover those bases.

Can you give an example of "scrambled and out of order" where you feel this doesn't work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the basic intuition is that we choose the first event, then choose any event that comes after it which matches our filters. So, just by switching the first event, we cover all cases where events may occur in any order.

Yeah this makes sense. From the code itself it just seems like it's still looking for ordering after the first event, but if you're just rotating the first event then it should cover

Copy link
Contributor Author

@neilkakkar neilkakkar Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it doesn't look for ordering, because I don't reuse the base funnel query (_get_steps_per_person_query), but just the inner event query

@neilkakkar
Copy link
Contributor Author

Agree, this isn't optimal, there ought to be a better way, but I haven't managed to figure it out yet.

@timgl timgl temporarily deployed to posthog-pr-4868 June 29, 2021 12:11 Inactive
@timgl timgl temporarily deployed to posthog-pr-4868 June 29, 2021 12:14 Inactive
@EDsCODE
Copy link
Member

EDsCODE commented Jun 29, 2021

Last thing that can be included in follow up is to add this class along the api path + a filter parameter that would account for the type of funnel ordering requested

ee/clickhouse/queries/funnels/base.py Outdated Show resolved Hide resolved
ee/clickhouse/queries/funnels/base.py Outdated Show resolved Hide resolved
ee/clickhouse/queries/funnels/funnel_event_query.py Outdated Show resolved Hide resolved
ee/clickhouse/queries/funnels/funnel_event_query.py Outdated Show resolved Hide resolved
ee/clickhouse/queries/funnels/funnel_event_query.py Outdated Show resolved Hide resolved
@timgl timgl temporarily deployed to posthog-pr-4868 June 30, 2021 09:28 Inactive
partition_select = self.get_partition_cols(1, max_steps)
sorting_condition = self.get_sorting_condition(max_steps)

for i in range(max_steps):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW what is our convention for numbering steps in queries? I'm not actually sure, but I saw that #4892 used 1-based indexing, while this is 0-based

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

step_i is zero based everywhere, while level_index is one based everywhere :/ - idea is to get everything to zero based, soonish. I wrote a comment mentioning this on the first Funnel PR - #4851 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please feel free to merge if you're happy with it :) (comment if not lol)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now, I just misread #4892 a bit 👍

@Twixes Twixes merged commit 28d87ac into master Jun 30, 2021
@Twixes Twixes deleted the funnel-unordered-step branch June 30, 2021 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Funnel unordered steps
5 participants