Querying Paths: Uniquely identifying Path start and ends #5618

Closed
neilkakkar opened this issue Aug 17, 2021 · 2 comments
Labels
feature/paths Feature Tag: Paths

Comments

@neilkakkar
Contributor

neilkakkar commented Aug 17, 2021

Say we're in a funnel and want to find all the paths users take between two steps. So, we have a list of personIDs, and corresponding event timestamps and eventIDs for start and finish.

There are also some constraints:
(1) If we can help it, we shouldn't use window functions (non-distributed)
(2) If we can help it, keep things simple.

The problem: a personID plus an eventUUID does uniquely identify a point on the user's path, but it isn't very helpful in figuring out which way to go on the path. That's where the timestamp comes in. Not only does it identify a start and end (uniquely, unless several events share a timestamp), it is also sufficient to figure out the direction to move in (from smaller timestamp to larger timestamp).

Thus, the only role (I think) the event UUID plays is to choose a start/end point, assuming there were multiple events happening at the same timestamp.

Thus, I want to make event UUID optional (it's complex to get this out of funnels, and paths only use it for autocapture elements, nothing else).

This allows us to (1) write easy queries (you literally just constrain the timestamps to get the complete path), and (2) once we have a path array, filter it with the UUID to get the right start and end points, if needed.

So, for all Paths related querying over persons, all we require are personIDs and timestamps. And optionally, for better granularity, an event UUID.
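
To make the two steps concrete, here is a minimal in-memory sketch in Python (not the actual query, which would run in the database). The `Event` shape and the `path_between` helper are hypothetical names made up for illustration; the only assumption is that each event carries a person ID, a UUID, and a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, for illustration only.
    person_id: str
    uuid: str
    timestamp: datetime
    name: str


def path_between(
    events: Iterable[Event],
    person_id: str,
    start_ts: datetime,
    end_ts: datetime,
    start_uuid: Optional[str] = None,
    end_uuid: Optional[str] = None,
) -> List[Event]:
    # Step 1: constrain by person and timestamp, then order by time.
    # sorted() is stable, so events sharing a timestamp keep their input order.
    path = sorted(
        (
            e
            for e in events
            if e.person_id == person_id and start_ts <= e.timestamp <= end_ts
        ),
        key=lambda e: e.timestamp,
    )

    # Step 2 (optional): use the UUIDs only to trim the ends of the path,
    # which matters when several events share the start or end timestamp.
    if start_uuid is not None:
        idx = next((i for i, e in enumerate(path) if e.uuid == start_uuid), None)
        if idx is not None:
            path = path[idx:]
    if end_uuid is not None:
        idx = next((i for i, e in enumerate(path) if e.uuid == end_uuid), None)
        if idx is not None:
            path = path[: idx + 1]
    return path
```

If neither UUID is passed, the result is simply every event for that person inside the timestamp window, which is the "personIDs and timestamps only" case.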

Thoughts?

Parent: #5545
More context: https://posthog.slack.com/archives/C02283301EK/p1629122867088400

@macobo
Contributor

macobo commented Aug 17, 2021

I agree and would go further: events at the same timestamp are an edge case we don't need to worry about at all. Hence I'd advocate for using timestamps only to KISS.

Another reason for that is we need to connect more than funnels to paths long-term. We need simple abstractions to make that possible.

Also note you need a timestamp anyway to figure out what time range to look at in ClickHouse (CH). You really want to avoid querying over all of time.
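
As a rough illustration of that last point, the per-person start/end timestamps can be collapsed into one coarse time range to bound the query. This is only a sketch in Python; `overall_time_range` and the `windows` mapping are hypothetical names, not anything from the codebase.

```python
from datetime import datetime
from typing import Dict, Tuple


def overall_time_range(
    windows: Dict[str, Tuple[datetime, datetime]]
) -> Tuple[datetime, datetime]:
    # windows maps person_id -> (start_ts, end_ts) for that person's path.
    if not windows:
        raise ValueError("need at least one person window")
    # Earliest start and latest end across all persons give a coarse bound
    # that the query can filter on before any per-person logic runs.
    earliest = min(start for start, _ in windows.values())
    latest = max(end for _, end in windows.values())
    return earliest, latest
```

The returned pair would then be used as a timestamp filter on the events query, so it never scans outside the period the paths can fall in.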

@EDsCODE
Member

EDsCODE commented Aug 17, 2021

Yeah I was trying to get uuids for complete accuracy but if we can swing just the person id with timestamp, I'm all for it


3 participants