feat(profiling): Handle 2 new ItemType for profiling #1170

phacops · 2022-01-21T22:30:54Z

This PR seeks to add support for 2 new ItemType to help with the transition from the Specto to Sentry. These new types will be pushed to different Kafka topics and the rest of the processing will be handled by the Specto ingestion service for now.

The 2 new types are:

ProfilingSession: contains information about device on which ProfilingTrace are collected.
ProfilingTrace: contains the profiling data. The payload will be our own Protobuf structure sent as binary.

mitsuhiko

I didn't do a thorough review yet, but the marked code sections will panic if the headers are not in the incoming payload. This should instead produce an error.

mitsuhiko · 2022-01-24T08:56:56Z

relay-server/src/actors/store.rs

+    fn produce_profiling_session_message(&self, item: &Item) -> Result<(), StoreError> {
+        let message = ProfilingKafkaMessage {
+            organization_id: item
+                .get_header("organization_id")


This will panic if the header is missing

Generally I believe the SDK should not send any organization ID at all, and it should not be in the item header. It should be inferred from the actual DSN that the SDK uses to authenticate with, due to:

security reasons, SDKs should not determine org ID freely after authenticating with any DSN

ergonomics, SDK should not have to be configured with more than DSN to start sending data successfully

Check how other methods do it, they access envelope scoping to get information about org/project.

I agree with this. We should fill in the org ID in relay. I also wonder if we need this installation_id, is that not the project_id we get from the public key anyways?

if installation id is also a "customer identifier" (like org/project, not like DID/user) it needs to go away as well imo

installation_id was just a way for us to count the number of devices sending traces. It has no use besides that, we can remove it.

organization_id was meant as a temporary measure so we can match a Sentry ID with a Specto ID (this organization ID in the headers would represent the Specto ID, not Sentry), but since we're not going to release our SDK to existing Specto users, we can remove that and just maintain a mapping of IDs on our side.

No ID, no problem.

@phacops and I had a chat on that, and it depends on how much orgs and projects are integrated during Phase 1.

If Specto requires their proprietary IDs, then they can provide them through the header for now, and we'll back it out during Phase 2. We can consider all of this API experimental.

If Specto can deal can map our IDs on their end, we would add this directly to the Kafka message in Relay, just like we do for Metrics or Sessions. @phacops if you trace the callers of this, you'll see how to pull that off.

I don't see it in the Kafka message then.

+1. Do you intend to add this as headers to the Kafka message, or include it in its payload?

For the organization_id, I added Sentry's ID to the Kafka message (https://github.com/getsentry/relay/pull/1170/files#diff-b043940a406beb64d66fa208a834b04f23942ae398a6cbe8633514e9ddff0eabR432) and we'll map it on our end. In phase 1, only the Specto organization will send traces, so that's easy and starting phase 2, we'll only use Sentry's IDs.

We have an app_id inside the payload, which I think would app with your project_id. I don't know if changing this right now is a good idea, phase 1 is really about showing something in the Sentry's UI and it's simpler to not change it right now to get to that goal. I can add the project_id to the message but likely won't use it for now.

It turns out, we can also forget our app_id for now and just use the project_id, even for phase 1. So, I think we're all good now since we don't have custom/private IDs anymore.

We have an app_id inside the payload, which I think would app with your project_id. I don't know if changing this right now is a good idea, phase 1 is really about showing something in the Sentry's UI and it's simpler to not change it right now to get to that goal.

If I understand you correctly you moved on from this approach, but when this is deployed we have potentially this problem:

sdk authenticates as org X via sentry DSN, and sends data under app ID/installation ID belonging to org Y

specto stores data under org Y even though user only had credentials to X

if you think of installation_id as a credential then everything is fine again (maybe?), but I would argue that needing two credentials is not ideal and the added complexity adds risk. I would hope that Relay can correctly authenticate all Specto traffic before we deploy anything, and there's no unauthenticated data on the kafka topic.

I'm glad you found a way to get rid of the specto IDs!

The installation_id is not some sort of credential, it was purely to distinguish between 2 installations of Specto. That's why we can remove it now. The app_id inside the payload won't be used anymore, we'll use the project_id instead. I think this solves the potential problem you're talking about.

Also, keep in mind this is for an internal release and the Specto SDK will be scraped during phase 2 as we transition to the Sentry SDK.

relay-server/src/actors/store.rs

untitaker

preliminary review

untitaker · 2022-01-24T09:26:14Z

relay-server/src/actors/store.rs

+    fn produce_profiling_session_message(&self, item: &Item) -> Result<(), StoreError> {
+        let message = ProfilingKafkaMessage {
+            organization_id: item
+                .get_header("organization_id")


Generally I believe the SDK should not send any organization ID at all, and it should not be in the item header. It should be inferred from the actual DSN that the SDK uses to authenticate with, due to:

security reasons, SDKs should not determine org ID freely after authenticating with any DSN

ergonomics, SDK should not have to be configured with more than DSN to start sending data successfully

Check how other methods do it, they access envelope scoping to get information about org/project.

relay-server/src/envelope.rs

untitaker · 2022-01-25T18:01:25Z

relay-server/src/utils/sizes.rs

-            | ItemType::FormData => event_size += item.len(),
+            | ItemType::FormData
+            | ItemType::ProfilingSession
+            | ItemType::ProfilingTrace => event_size += item.len(),


I think this will have to be revisited, I am not sure if we can have the same expectations around profiling data's size.

relay-server/src/actors/store.rs

jan-auer · 2022-02-01T16:08:28Z

relay-server/src/actors/store.rs

+            payload: item.payload(),
+        };
+        relay_log::trace!("Sending profiling trace item to kafka");
+        self.produce(


This will produce your message as msgpack object. Is that intended?

Yes, it's intended. Since our payload field is a byte array and msgpack supports binary data, I think we're good with that.

Alright, g2g then.

jan-auer · 2022-02-01T17:32:17Z

Let's wait with merge & deploy until we have the cluster and topic provisioned, please. cc @getsentry/ops

Handle 2 new ItemType for profiling

223cd92

phacops requested review from mitsuhiko, jan-auer, fpacifici and a team January 21, 2022 22:30

Add an entry to the changelog

503e3bd

mitsuhiko requested changes Jan 24, 2022

View reviewed changes

untitaker reviewed Jan 24, 2022

View reviewed changes

phacops added 4 commits January 24, 2022 12:26

Use Sentry's organization_id and remove installation_id

7180f50

Lint code

717930f

Fix switch as these new item types don't create events

e4b6422

Add the project ID to the profiling kafka message

de1a980

phacops marked this pull request as ready for review January 25, 2022 16:11

untitaker approved these changes Jan 25, 2022

View reviewed changes

phacops requested a review from mitsuhiko January 28, 2022 15:45

Drop profiling items if feature flag is not enabled

1cd6975

phacops changed the title ~~[profiling] Handle 2 new ItemType for profiling~~ feat(profiling): Handle 2 new ItemType for profiling Jan 31, 2022

jan-auer approved these changes Feb 1, 2022

View reviewed changes

mitsuhiko approved these changes Feb 1, 2022

View reviewed changes

jan-auer merged commit b0ccb29 into master Feb 7, 2022

jan-auer deleted the pierre/profiling-add-kafka-message-types branch February 7, 2022 18:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(profiling): Handle 2 new ItemType for profiling #1170

feat(profiling): Handle 2 new ItemType for profiling #1170

phacops commented Jan 21, 2022

mitsuhiko left a comment

mitsuhiko Jan 24, 2022

untitaker Jan 24, 2022

mitsuhiko Jan 24, 2022

untitaker Jan 24, 2022 •

edited

Loading

phacops Jan 24, 2022

jan-auer Jan 24, 2022

phacops Jan 24, 2022

phacops Jan 24, 2022

untitaker Jan 25, 2022 •

edited

Loading

phacops Jan 25, 2022

untitaker left a comment

untitaker Jan 24, 2022

untitaker Jan 25, 2022

jan-auer Feb 1, 2022

phacops Feb 1, 2022

jan-auer Feb 1, 2022

jan-auer commented Feb 1, 2022

feat(profiling): Handle 2 new ItemType for profiling #1170

feat(profiling): Handle 2 new ItemType for profiling #1170

Conversation

phacops commented Jan 21, 2022

mitsuhiko left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

untitaker Jan 24, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

untitaker Jan 25, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

untitaker left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jan-auer commented Feb 1, 2022

untitaker Jan 24, 2022 •

edited

Loading

untitaker Jan 25, 2022 •

edited

Loading