
feat(extrapolation): add extrapolation develop docs #11780

Draft · wants to merge 11 commits into base: master

Conversation

shellmayr (Member)

(currently still in draft mode)

Closes getsentry/sentry#79815


Comment on lines +55 to +56
| max | no |
| percentiles | yes |
Member

min and max are percentiles, yet they say no. The subtle difference is that we do extrapolate values of counts and sums, but all of the other data is merely weighted. The sentence below alludes to this, but I think we might want to classify this into two categories for the future; technically there could be ways to extrapolate count_unique and maybe even maxima/minima.
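To make the two-category distinction in this comment concrete, here is a minimal sketch (the span shape and helper names are invented for illustration, not Sentry's actual implementation): counts and sums are scaled up by the inverse sample rate, while order statistics like percentiles only use the inverse sample rate as a weight, so the returned value is always one that was actually observed.

```python
# Hypothetical sketch of extrapolated vs. merely weighted aggregates.
# Each sampled span "stands in" for 1 / sample_rate original spans.

def extrapolated_count(spans):
    # Counts are extrapolated: sum the inverse sample rates.
    return sum(1.0 / s["sample_rate"] for s in spans)

def extrapolated_sum(spans, attr):
    # Attribute sums scale the same way as counts.
    return sum(s[attr] / s["sample_rate"] for s in spans)

def weighted_p50(spans, attr):
    # Percentiles are not scaled; the inverse sample rate acts only as a
    # weight when ranking observed values, so the result is a real value.
    pairs = sorted((s[attr], 1.0 / s["sample_rate"]) for s in spans)
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= total / 2:
            return value

spans = [
    {"sample_rate": 0.1, "duration_ms": 120.0},
    {"sample_rate": 0.1, "duration_ms": 80.0},
    {"sample_rate": 0.5, "duration_ms": 200.0},
]
print(extrapolated_count(spans))               # 22.0  (10 + 10 + 2)
print(extrapolated_sum(spans, "duration_ms"))  # 2400.0
print(weighted_p50(spans, "duration_ms"))      # 120.0
```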

@jan-auer (Member) left a comment

Thank you! A couple of suggestions and thoughts from reading this below.

sidebar_order: 5
---

Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
Member

Suggested change
Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, the accuracy of which will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The construction "the accuracy of which" is a bit cumbersome; I replaced the sentence with this version: "This bias especially impacts numerical attributes like latency, reducing their accuracy."
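To make the bias described in that intro paragraph concrete, here is a hypothetical worked example (the endpoint names and numbers are invented): two parts of an application with equal true traffic but different sample rates look skewed in raw counts, and dividing by the sample rate recovers the true volume.

```python
# Hypothetical illustration of sample-rate bias across parts of an app.
true_requests = {"checkout": 1000, "search": 1000}
sample_rates = {"checkout": 0.5, "search": 0.1}

# Raw observed spans skew 5x toward the higher-rate endpoint.
observed = {ep: int(n * sample_rates[ep]) for ep, n in true_requests.items()}
# Extrapolating by the inverse sample rate removes the skew.
extrapolated = {ep: n / sample_rates[ep] for ep, n in observed.items()}

print(observed)      # {'checkout': 500, 'search': 100}
print(extrapolated)  # {'checkout': 1000.0, 'search': 1000.0}
```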

- **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minor random decisions can influence the result in major ways.
- **Expressiveness** refers to data being able to express something about the state of the observed system; in other words, the usefulness of the data to the user in a specific use case.

Data can be any combination of accurate and expressive. To illustrate these properties, let's look at some examples: a single sample with specific tags and a full trace can be very expressive, while a large number of spans can have misleading characteristics that are not expressive at all. When traffic is low and 100% of data is sampled, the system is fully accurate, yet aggregates are still affected by inherent statistical uncertainty that reduces expressiveness.
Member

This section reads very well now, thanks for restructuring. The example given for expressiveness is good, but it discusses the opposite of what is important for extrapolation: the expressiveness of aggregates, as in, when is it valid to deal with the aggregate, and can you derive meaningful insights from it? This is not the case when the aggregate includes an insufficient number of data points (even with high sample rates). Is "expressiveness" maybe the wrong word for this?

Lower in the document, "Sample Mode" discusses this scenario, and I think it's worth lifting this concern up here.
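One way to quantify when an aggregate stops being expressive, assuming independent Bernoulli sampling at rate p (an assumption made here for illustration, not something the doc states): the extrapolated count behaves like a Horvitz-Thompson estimator, and its relative standard error grows as the number of observed samples shrinks, regardless of how high the sample rate is.

```python
import math

# Back-of-the-envelope sketch: with n spans observed at sample rate p,
# the extrapolated count n / p has relative standard error
# sqrt((1 - p) / (p * N)), where the true count N is estimated by n / p.
def relative_standard_error(n_observed, sample_rate):
    n_true_est = n_observed / sample_rate
    return math.sqrt((1 - sample_rate) / (sample_rate * n_true_est))

# 5 observed spans at a 1% rate: the extrapolated count of 500 carries
# roughly +/-45% standard error, which is too noisy to be useful.
print(relative_standard_error(5, 0.01))     # ~0.445
# 5000 observed spans at the same rate: ~1.4%, expressive again.
print(relative_standard_error(5000, 0.01))  # ~0.014
```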


Depending on the context and the use case, one mode may be more useful than the other.

Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
Member
I was thinking about this a bit more; the listed example might not make sense if the product always shows matching samples next to the aggregates. Still, it can be useful to look at the number of samples -- though what's a compelling use case to illustrate this? The usual example is "search by trace ID in trace explorer", but this is an extreme edge case.

Suggested change
Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
Generally, default mode is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There are scenarios where the user needs to temporarily switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation. Therefore, the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
