Say we're in a funnel and want to find all the paths users take between two steps. So, we have a list of personIDs, and corresponding event timestamps and eventIDs for start and finish.
There are also some constraints:
(1) If we can help it, we shouldn't use window functions (non-distributed)
(2) If we can help it, keep things simple.
The problem: a personID plus an event UUID does uniquely identify a point on the user's path, but it isn't very helpful in figuring out which way to go on the path. That's where the timestamp comes in. Not only does it identify a start and end, it is also sufficient to figure out the direction to move in (from smaller timestamp to larger timestamp).
Thus, the only role (I think) the event UUID plays is to choose a start/end point when multiple events happened at the same timestamp.
So I want to make the event UUID optional (it's complex to get this out of funnels, and paths only use it for autocapture elements, nothing else).
This allows us to (1) write easy queries: you literally just constrain the timestamps to get the complete path, and (1a) once you have a path array, filter with the UUID to get the right start and end points, if needed.
So, for all Paths-related querying over persons, all we require are personIDs and timestamps, and optionally, for better granularity, an event UUID.
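To make the proposal concrete, here is a minimal in-memory sketch of the two steps. It is not the actual query: the `Event` shape, the `path_between` helper and the sample data are all hypothetical, and a real implementation would do step 1 in the database rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, for illustration only.
    person_id: str
    uuid: str
    name: str
    timestamp: datetime


def path_between(
    events: List[Event],
    person_id: str,
    start_ts: datetime,
    end_ts: datetime,
    start_uuid: Optional[str] = None,
    end_uuid: Optional[str] = None,
) -> List[Event]:
    # Step 1: the timestamps alone select the path and give its direction.
    path = sorted(
        (e for e in events
         if e.person_id == person_id and start_ts <= e.timestamp <= end_ts),
        key=lambda e: e.timestamp,
    )
    # Step 1a (optional): trim with the UUIDs, which only matters when
    # several events share the start or end timestamp.
    uuids = [e.uuid for e in path]
    if start_uuid in uuids:
        cut = uuids.index(start_uuid)
        path, uuids = path[cut:], uuids[cut:]
    if end_uuid in uuids:
        path = path[: uuids.index(end_uuid) + 1]
    return path


def t(minute: int) -> datetime:
    return datetime(2021, 8, 16, 10, minute)


# Made-up sample data: u1 and u2 happen at the same timestamp.
events = [
    Event("p1", "u0", "landing", t(0)),
    Event("p1", "u1", "autocapture A", t(1)),
    Event("p1", "u2", "autocapture B", t(1)),
    Event("p1", "u3", "pricing", t(2)),
    Event("p2", "u4", "pricing", t(2)),
    Event("p1", "u5", "signup", t(3)),
    Event("p1", "u6", "later", t(4)),
]

timestamps_only = path_between(events, "p1", t(1), t(3))
with_start_uuid = path_between(events, "p1", t(1), t(3), start_uuid="u2")
```

With timestamps only, both events at the tied start timestamp are part of the path; passing `start_uuid="u2"` drops the one that comes before the chosen start point.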
I agree and would go further: events at the same timestamp are an edge case we don't need to worry about at all. Hence I'd advocate for using timestamps only to KISS.
Another reason for that is we need to connect more than funnels to paths long-term. We need simple abstractions to make that possible.
Also note you need a timestamp anyway for figuring out what time range to look at in ClickHouse. You really want to avoid querying over all of time.
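As a rough illustration of the time-range point: assuming each person arrives with a (start, end) timestamp pair, the overall range a query has to cover is just the earliest start and the latest end. The `overall_time_range` helper and the sample windows below are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Tuple

Window = Tuple[datetime, datetime]


def overall_time_range(windows: Iterable[Window]) -> Window:
    # Smallest single range that covers every person's (start, end) window.
    windows = list(windows)
    if not windows:
        raise ValueError("need at least one (start, end) window")
    return min(s for s, _ in windows), max(e for _, e in windows)


# Made-up per-person windows.
sample_windows = [
    (datetime(2021, 8, 16, 10, 1), datetime(2021, 8, 16, 10, 3)),
    (datetime(2021, 8, 15, 9, 0), datetime(2021, 8, 15, 9, 30)),
]
range_start, range_end = overall_time_range(sample_windows)
```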
Thoughts?
Parent: #5545
More context: https://posthog.slack.com/archives/C02283301EK/p1629122867088400