Sessions as a first-class object #6005
Comments
Thanks for driving this forward @rcmarron! I made a PR with my proposal for this, let's continue the conversation there. TL;DR: I'm very aligned with your approach to sessions, but I think it's a separate discussion, independent of Diagnosing Causes. For the use case we solve today, let's get rid of the sessions concept, as it isn't working well for our users.
related #4884
I was going to say - if you want help here... the original creator of rrweb said (last year) he'd be up for some contracting to improve the integration between PostHog and rrweb if we could use more help here. @paolodamico could you link your PR? I can't find it!
That could be really useful as we're trying to tackle some of the rrweb bugs this upcoming sprint. I'll definitely let you know if it makes sense. Here is @paolodamico's PR: PostHog/posthog.com#2028. I'm going to close this issue to move the conversation there.
Note: I'm a bit out of my depth here and don't have all of the context, but as we're working on the sessions page, this seemed like an important topic to consider and start a discussion around.
Sorry for the length, but TL;DR: I think having sessions as a 'first class object' in the product would help our user experience (particularly for customers without session recordings enabled), and I think updating client libraries to include a `session_id` with events is the best approach.
Problems with sessions:
In the product today, sessions (not 'session recordings') are calculated objects. This means each time we want to display sessions to a user, the sessions must be calculated based on the list of events. The query that does this looks for gaps of >30m in a user's events, and splits a session when a gap is found. This query is expensive, and it leads to several issues:
`posthog.identify()` is called because the query groups on `distinct_id`.
How important are sessions?
Having sessions as a first class object only makes sense if we're going to use sessions in the product. So what good are they?
Exploring specific users (seems useful)
For customers who don't have session recordings enabled, sessions are the best way to dive into the actions of a specific user. It allows users to take the exploratory route to understanding what their customers are doing.
For example, if I wanted to understand why users are dropping off a funnel, it would be great to
For this type of exploratory work, sessions are not strictly needed, but they simplify it. An alternative would be to identify the user that dropped off in the funnel, and let customers explore the list of events around the time of drop off. It's basically the same thing, but it lacks the ability to filter the sessions, and we lose the "clear vocabulary" to describe what's happening in the product.
Asking questions (less useful)
The other place that sessions can be useful is in asking questions about user behavior. But in this case, it seems pretty replaceable by the user/time alternative. For example, the question "how many users 'rage click' twice per session?" is very similar to "how many users 'rage click' twice per hour?". I'm struggling to think of example questions where it's really useful to differentiate by session.
Summary: Sessions seem useful for diving into user behavior. This could play a significant role in 'diagnosing causes', particularly for users without 'session recording' enabled. I think there is a decent argument to be made for why we can 'make do' with just the user/time combination (instead of sessions), but if we take this approach, we may be hamstringing ourselves down the road for features like filtering sessions.
What would it take to 'do sessions right'?
At a high level, the big issue with sessions today is that we calculate them each time. To avoid this, we would need a way of storing the session. From what I can tell, there are two high-level ways of doing it: client based and server based.
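For reference, the calculation we repeat today (split a user's events wherever there is a gap of more than 30 minutes) looks roughly like this. It's a minimal Python sketch of the idea, not the actual query; `split_into_sessions` and `SESSION_GAP` are names made up for illustration, and it assumes all timestamps belong to a single user.

```python
from datetime import datetime, timedelta

# Assumption: the 30 minute gap described above; the name is hypothetical.
SESSION_GAP = timedelta(minutes=30)


def split_into_sessions(timestamps, gap=SESSION_GAP):
    """Group one user's event timestamps into sessions.

    A new session starts whenever the time since the previous event
    is greater than `gap`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            # Close enough to the previous event: same session.
            sessions[-1].append(ts)
        else:
            # First event, or a gap larger than `gap`: new session.
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    start = datetime(2021, 1, 1, 12, 0)
    events = [start + timedelta(minutes=m) for m in (0, 10, 45, 50)]
    for session in split_into_sessions(events):
        print(session[0], "->", session[-1], f"({len(session)} events)")
```

Every time sessions are displayed, this work has to be redone over the user's events, which is why storing the session somewhere is attractive.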
Client based
In a client based approach, events would be tagged with a `session_id` by the client. The client is in charge of determining when a new session starts, generally based on time (if it's been 30 minutes since the last event, then a new session starts). By its nature, this approach splits sessions based on devices, browsers, etc. This seems like the approach taken by Amplitude and Heap.
Pros:
Cons:
Server based
In a server based approach, the server keeps track of sessions for users. Either it is maintaining a 'sessions table' with start/end times, or it's assigning sessions to events on ingestion. This seems like the approach taken by MixPanel (I'm guessing this is why our events don't have `session_id` already 🙂)
Pros:
Cons:
Summary:
Between these two approaches, I feel like the client based one is much simpler and most of its 'cons' might actually be 'pros'....
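To make the client based option concrete, here is a rough sketch of what a client library could do. It's written in Python only for illustration (the real change would live in each client library), and `SessionTracker`, `tag`, and the event dict shape are all hypothetical names, not existing PostHog APIs. It assumes the 30 minute inactivity rule described above.

```python
import uuid
from datetime import datetime, timedelta


class SessionTracker:
    """Hypothetical client-side helper that attaches a session_id to events."""

    def __init__(self, timeout=timedelta(minutes=30)):
        self.timeout = timeout
        self.session_id = None
        self.last_event_at = None

    def tag(self, event, now):
        """Return a copy of `event` with a `session_id` property added.

        A new id is generated for the first event, or when `timeout`
        or more has passed since the previous event.
        """
        expired = (
            self.last_event_at is not None
            and now - self.last_event_at >= self.timeout
        )
        if self.session_id is None or expired:
            self.session_id = str(uuid.uuid4())
        self.last_event_at = now
        return {**event, "session_id": self.session_id}


if __name__ == "__main__":
    tracker = SessionTracker()
    t0 = datetime(2021, 1, 1, 12, 0)
    print(tracker.tag({"event": "$pageview"}, t0))
    print(tracker.tag({"event": "rage click"}, t0 + timedelta(minutes=5)))
    # More than 30 minutes of inactivity: this event gets a new session_id.
    print(tracker.tag({"event": "$pageview"}, t0 + timedelta(minutes=40)))
```

Because the tracker state lives on the client, each device or browser ends up with its own sessions, and nothing has to be recalculated at query time.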
Other considerations
Backfilling: With either approach, backfilling sessions could be painful. With customers being self-hosted, the idea of 'running a migration' doesn't seem reasonable. @mariusandra had suggested a janitor that runs in the background. It seems like it's worth considering if a backfill is needed or if it's OK to just 'move forward'.
@paolodamico @macobo @mariusandra I'd love to hear your thoughts. I'm sure I'm missing things here. (also, not positive who else should be included).