Sessions as a first-class object #6005

Closed
rcmarron opened this issue Sep 17, 2021 · 4 comments
@rcmarron
Contributor

Note: I'm a bit out of my depth here and don't have all of the context, but as we're working on the sessions page, this seemed like an important topic to consider and start a discussion around.

Sorry for the length, but TL;DR: I think having sessions as a 'first-class object' in the product would help our user experience (particularly for customers without session recordings enabled), and I think updating client libraries to include a session_id with events is the best approach.

Problems with sessions:

In the product today, sessions (not 'session recordings') are calculated objects. This means each time we want to display sessions to a user, the sessions must be calculated based on the list of events. The query that does this looks for gaps of >30m in a user's events, and splits a session when a gap is found. This query is expensive, and it leads to several issues:

  • Performance on our sessions page is bad
  • There are bugs like:
    • Sessions are split when posthog.identify() is called because the query groups on distinct_id.
    • Our session query is based on a time range (usually a day), so a session that crosses the range boundary (e.g. midnight) is split in two.
  • The mapping between 'session recordings' and 'sessions' is messy. Not all sessions have a recording, and some sessions have multiple recordings. This is one reason users think session recording is not working (see "What is sessions page?" #4884).
  • We don't expose the concept of a session broadly in our product. It feels natural to allow users to 'see sessions where users did X' in funnels, paths etc. I think the reason we don't do this is that the session query is expensive (see "View recording(s) relevant to selected person that dropped off at/completed funnel step" #5664).
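To make the calculation described above concrete, here is a minimal Python sketch of gap-based session splitting. It is not the actual query (which runs in the database); the function name `split_into_sessions` and the `(distinct_id, timestamp)` event shape are assumptions for illustration. It also reproduces the identify() bug: because events are grouped on distinct_id, one continuous visit is split when the id changes.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)  # gap threshold described in this issue


def split_into_sessions(events):
    """Group (distinct_id, timestamp) pairs into sessions.

    A new session starts whenever the same distinct_id has been idle for
    more than SESSION_GAP. Returns a list of (distinct_id, [timestamps]).
    """
    sessions = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for distinct_id, group in groupby(ordered, key=lambda e: e[0]):
        current = []
        for _, ts in group:
            if current and ts - current[-1] > SESSION_GAP:
                sessions.append((distinct_id, current))
                current = []
            current.append(ts)
        sessions.append((distinct_id, current))
    return sessions


t0 = datetime(2021, 9, 17, 9, 0)
events = [
    ("anon-1", t0),
    ("anon-1", t0 + timedelta(minutes=5)),
    # After identify() the distinct_id changes, so grouping on it splits the visit
    ("user-42", t0 + timedelta(minutes=6)),
    ("user-42", t0 + timedelta(minutes=50)),  # 44-minute gap: new session
]
sessions = split_into_sessions(events)
```

Every page load that shows sessions has to redo this work over the raw events, which is why storing a session identifier up front is attractive.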

How important are sessions?

Having sessions as a first class object only makes sense if we're going to use sessions in the product. So what good are they?

Exploring specific users (seems useful)

For customers who don't have session recordings enabled, sessions are the best way to dive into the actions of a specific user. It allows users to take the exploratory route to understanding what their customers are doing.

For example, if I wanted to understand why users are dropping off a funnel, it would be great to

  • See a list of sessions where users dropped off
  • Filter that list of sessions by duration (>1 minute) or users who performed an additional event
  • Click into a specific session and see the list of events that occurred
  • Look through the list of events to find possible causes for drop-off
  • After identifying a potential cause, see how many other users dropped off for the same reason.

For this type of exploratory work, sessions are not strictly needed, but they simplify it. An alternative would be to identify the user that dropped off in the funnel, and let customers explore the list of events around the time of drop-off. It's basically the same thing, but it lacks the ability to filter the sessions, and we lose the "clear vocabulary" to describe what's happening in the product.

Asking questions (less useful)

The other place that sessions can be useful is in asking questions about user behavior. But in this case, it seems pretty replaceable by the user/time alternative. For example, the question "how many users 'rage click' twice per session?" is very similar to "how many users 'rage click' twice per hour?". I'm struggling to think of example questions where it's really useful to differentiate by session.

Summary: Sessions seem useful for diving into user behavior. This could play a significant role in 'diagnosing causes', particularly for users without 'session recording' enabled. I think there is a decent argument to be made for why we can 'make do' with just the user/time combination (instead of sessions), but if we take this approach, we may be hamstringing ourselves down the road for features like filtering sessions.

What would it take to 'do sessions right'?

At a high level, the big issue with sessions today is that we calculate them each time. To avoid this, we would need a way of storing the session. From what I can tell, there are two high-level ways of doing it: client based and server based.

Client based

In a client based approach, events would be tagged with a session_id by the client. The client is in charge of determining when a new session starts, generally based on time (if it's been 30 minutes since the last event, a new session starts). By its nature, this approach splits sessions based on devices, browsers, etc. This seems to be the approach taken by Amplitude and Heap.
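A rough sketch of what the client-side logic could look like, written in Python for illustration. `ClientSessionTracker` and its `tag` method are hypothetical names, not part of any existing PostHog library, and the 30-minute timeout is taken from the description above.

```python
import uuid
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


class ClientSessionTracker:
    """Hypothetical client-side helper; not part of any PostHog library."""

    def __init__(self):
        self.session_id = None
        self.last_event_at = None

    def tag(self, event, now):
        """Return a copy of the event with session_id and timestamp attached."""
        expired = (
            self.last_event_at is None
            or now - self.last_event_at > SESSION_TIMEOUT
        )
        if expired:
            # First event, or idle for too long: start a new session
            self.session_id = str(uuid.uuid4())
        self.last_event_at = now
        return {**event, "session_id": self.session_id, "timestamp": now}


tracker = ClientSessionTracker()
t0 = datetime(2021, 9, 17, 9, 0)
first = tracker.tag({"event": "pageview"}, t0)
second = tracker.tag({"event": "click"}, t0 + timedelta(minutes=10))
third = tracker.tag({"event": "pageview"}, t0 + timedelta(minutes=45))
```

Here `first` and `second` share a session_id, while `third` gets a new one because 35 minutes passed since the previous event. Since the state lives on the client, each device or browser naturally ends up with its own sessions.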

Pros:

  • This is how we create a session recording today, so we could have a 1:1 between session recordings and sessions.
  • By its nature, the approach defines a session as a relatively simple concept.

Cons:

  • Potentially have overlapping sessions (users on 2 devices at once are 2 sessions - might actually be a good thing for UX)
  • 'Out of app' events (e.g. server events) would not be tied to a session (also might be a good thing)
    • Other products solve this by showing all events within the time range of the session, but clearly indicate which events are 'part of the session' (similar solution could work for the overlapping sessions issue too)
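The "show everything in the time range, but mark what belongs to the session" idea from the last bullet could look roughly like this. It's a sketch under assumptions: `events_for_session_view` is a hypothetical helper, events are plain dicts with a `timestamp` and an optional `session_id`, and the list passed in is already limited to one person.

```python
from datetime import datetime, timedelta


def events_for_session_view(events, session_id):
    """Return every event inside the session's time window, flagged by membership.

    events: one person's events, each a dict with "timestamp" and an
    optional "session_id" (server events would have none).
    """
    in_session = [e for e in events if e.get("session_id") == session_id]
    if not in_session:
        return []
    start = min(e["timestamp"] for e in in_session)
    end = max(e["timestamp"] for e in in_session)
    window = [e for e in events if start <= e["timestamp"] <= end]
    window.sort(key=lambda e: e["timestamp"])
    return [
        {**e, "part_of_session": e.get("session_id") == session_id}
        for e in window
    ]


t0 = datetime(2021, 9, 17, 9, 0)
events = [
    {"event": "pageview", "timestamp": t0, "session_id": "s1"},
    {"event": "invoice created", "timestamp": t0 + timedelta(minutes=5)},  # server event
    {"event": "click", "timestamp": t0 + timedelta(minutes=10), "session_id": "s1"},
    {"event": "email sent", "timestamp": t0 + timedelta(minutes=60)},  # outside window
]
view = events_for_session_view(events, "s1")
```

The server event at minute 5 shows up in the view with `part_of_session` set to False, and the one at minute 60 is left out entirely. The same flag would distinguish events from an overlapping session on another device.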

Server based

In a server based approach, the server keeps track of sessions for users. Either it maintains a 'sessions table' with start/end times, or it assigns sessions to events on ingestion. This seems to be the approach taken by Mixpanel (I'm guessing this is why our events don't have session_id already 🙂)
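For comparison, a minimal sketch of the server-side variant, combining both options mentioned above (a sessions table plus assignment on ingestion). `ServerSessionAssigner` and `SessionRow` are hypothetical names, the in-memory dict stands in for real storage, and the sketch assumes events arrive roughly in timestamp order, which batch uploads would violate.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class SessionRow:
    session_id: str
    person_id: str
    start: datetime
    end: datetime


class ServerSessionAssigner:
    """Hypothetical ingestion-time assigner keyed by person, not by device."""

    def __init__(self):
        self.open_sessions = {}   # person_id -> most recent SessionRow
        self.sessions_table = []  # every session ever opened

    def ingest(self, person_id, timestamp):
        """Assign the event to a session and return its session_id."""
        row = self.open_sessions.get(person_id)
        if row is None or timestamp - row.end > SESSION_TIMEOUT:
            # No session yet, or the person was idle too long: open a new one
            row = SessionRow(str(uuid.uuid4()), person_id, timestamp, timestamp)
            self.open_sessions[person_id] = row
            self.sessions_table.append(row)
        else:
            # Extend the current session (max guards against late events)
            row.end = max(row.end, timestamp)
        return row.session_id


assigner = ServerSessionAssigner()
t0 = datetime(2021, 9, 17, 9, 0)
mobile = assigner.ingest("user-42", t0)
desktop = assigner.ingest("user-42", t0 + timedelta(minutes=10))
later = assigner.ingest("user-42", t0 + timedelta(minutes=50))
```

Because the key is the person rather than the device, the mobile and desktop events land in the same session, while the event 40 minutes later opens a second row in the table.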

Pros:

  • If a user is using your mobile website and then switches to your desktop version, it's tracked in a single session.

Cons:

  • Increases the complexity (Mixpanel has a bunch of settings to manage this):
    • Are certain event sources included in sessions but others not (e.g. server based events)?
    • Does a 'push notification sent' event continue a session if the user was otherwise inactive?
    • What do we do about sessions on batch uploaded events?

Summary:

Between these two approaches, I feel like the client based one is much simpler, and most of its 'cons' might actually be 'pros'.

Other considerations

Backfilling: With either approach, backfilling sessions could be painful. With customers being self-hosted, the idea of 'running a migration' doesn't seem reasonable. @mariusandra had suggested a janitor that runs in the background. It seems like it's worth considering if a backfill is needed or if it's OK to just 'move forward'.

@paolodamico @macobo @mariusandra I'd love to hear your thoughts. I'm sure I'm missing things here. (also, not positive who else should be included).

@paolodamico
Contributor

Thanks for driving this forward @rcmarron! I made a PR with my proposal for this, let's continue the conversation there.

TL;DR: I'm very aligned with your approach to what to do about sessions, but I think it's a separate discussion, independent of Diagnosing Causes. For the use case we solve today, let's get rid of the sessions concept, as it's not serving our users well.

@clarkus
Contributor

clarkus commented Sep 17, 2021

related #4884

@jamesefhawkins
Collaborator

jamesefhawkins commented Sep 17, 2021

I was going to say - if you want help here... the original creator of rrweb said (last year) he'd be up for some contracting to improve the integration between PostHog and rrweb if we could use more help here.

@paolodamico could you link your PR? I can't find it!

@rcmarron
Contributor Author

> I was going to say - if you want help here... the original creator of rrweb said (last year) he'd be up for some contracting to improve the integration between PostHog and rrweb if we could use more help here.

That could be really useful as we're trying to tackle some of the rrweb bugs this upcoming sprint. I'll definitely let you know if it makes sense.

Here is @paolodamico's PR: PostHog/posthog.com#2028

I'm going to close this issue to move the conversation there.
