-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(profiling): Handle 2 new ItemType for profiling #1170
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do a thorough review yet, but the marked code sections will panic if the headers are not in the incoming payload. This should instead produce an error.
relay-server/src/actors/store.rs
Outdated
fn produce_profiling_session_message(&self, item: &Item) -> Result<(), StoreError> { | ||
let message = ProfilingKafkaMessage { | ||
organization_id: item | ||
.get_header("organization_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will panic if the header is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I believe the SDK should not send any organization ID at all, and it should not be in the item header. It should be inferred from the actual DSN that the SDK uses to authenticate with, due to:
- security reasons, SDKs should not determine org ID freely after authenticating with any DSN
- ergonomics, SDK should not have to be configured with more than DSN to start sending data successfully
Check how other methods do it, they access envelope scoping to get information about org/project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. We should fill in the org ID in relay. I also wonder if we need this installation_id, is that not the project_id we get from the public key anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if installation id is also a "customer identifier" (like org/project, not like DID/user) it needs to go away as well imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
installation_id
was just a way for us to count the number of devices sending traces. It has no use besides that, we can remove it.
organization_id
was meant as a temporary measure so we can match a Sentry ID with a Specto ID (this organization ID in the headers would represent the Specto ID, not Sentry), but since we're not going to release our SDK to existing Specto users, we can remove that and just maintain a mapping of IDs on our side.
No ID, no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phacops and I had a chat on that, and it depends on how much orgs and projects are integrated during Phase 1.
- If Specto requires their proprietary IDs, then they can provide them through the header for now, and we'll back it out during Phase 2. We can consider all of this API experimental.
- If Specto can deal can map our IDs on their end, we would add this directly to the Kafka message in Relay, just like we do for Metrics or Sessions. @phacops if you trace the callers of this, you'll see how to pull that off.
I don't see it in the Kafka message then.
+1. Do you intend to add this as headers to the Kafka message, or include it in its payload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the organization_id
, I added Sentry's ID to the Kafka message (https://github.com/getsentry/relay/pull/1170/files#diff-b043940a406beb64d66fa208a834b04f23942ae398a6cbe8633514e9ddff0eabR432) and we'll map it on our end. In phase 1, only the Specto organization will send traces, so that's easy and starting phase 2, we'll only use Sentry's IDs.
We have an app_id
inside the payload
, which I think would app with your project_id
. I don't know if changing this right now is a good idea, phase 1 is really about showing something in the Sentry's UI and it's simpler to not change it right now to get to that goal. I can add the project_id
to the message but likely won't use it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out, we can also forget our app_id
for now and just use the project_id
, even for phase 1. So, I think we're all good now since we don't have custom/private IDs anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an app_id inside the payload, which I think would app with your project_id. I don't know if changing this right now is a good idea, phase 1 is really about showing something in the Sentry's UI and it's simpler to not change it right now to get to that goal.
If I understand you correctly you moved on from this approach, but when this is deployed we have potentially this problem:
- sdk authenticates as org X via sentry DSN, and sends data under app ID/installation ID belonging to org Y
- specto stores data under org Y even though user only had credentials to X
if you think of installation_id
as a credential then everything is fine again (maybe?), but I would argue that needing two credentials is not ideal and the added complexity adds risk. I would hope that Relay can correctly authenticate all Specto traffic before we deploy anything, and there's no unauthenticated data on the kafka topic.
I'm glad you found a way to get rid of the specto IDs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installation_id
is not some sort of credential, it was purely to distinguish between 2 installations of Specto. That's why we can remove it now. The app_id
inside the payload won't be used anymore, we'll use the project_id
instead. I think this solves the potential problem you're talking about.
Also, keep in mind this is for an internal release and the Specto SDK will be scraped during phase 2 as we transition to the Sentry SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preliminary review
relay-server/src/actors/store.rs
Outdated
fn produce_profiling_session_message(&self, item: &Item) -> Result<(), StoreError> { | ||
let message = ProfilingKafkaMessage { | ||
organization_id: item | ||
.get_header("organization_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I believe the SDK should not send any organization ID at all, and it should not be in the item header. It should be inferred from the actual DSN that the SDK uses to authenticate with, due to:
- security reasons, SDKs should not determine org ID freely after authenticating with any DSN
- ergonomics, SDK should not have to be configured with more than DSN to start sending data successfully
Check how other methods do it, they access envelope scoping to get information about org/project.
| ItemType::FormData => event_size += item.len(), | ||
| ItemType::FormData | ||
| ItemType::ProfilingSession | ||
| ItemType::ProfilingTrace => event_size += item.len(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will have to be revisited, I am not sure if we can have the same expectations around profiling data's size.
payload: item.payload(), | ||
}; | ||
relay_log::trace!("Sending profiling trace item to kafka"); | ||
self.produce( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will produce your message as msgpack object. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's intended. Since our payload
field is a byte array and msgpack
supports binary data, I think we're good with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, g2g then.
Let's wait with merge & deploy until we have the cluster and topic provisioned, please. cc @getsentry/ops |
This PR seeks to add support for 2 new
ItemType
to help with the transition from the Specto to Sentry. These new types will be pushed to different Kafka topics and the rest of the processing will be handled by the Specto ingestion service for now.The 2 new types are:
ProfilingSession
: contains information about device on whichProfilingTrace
are collected.ProfilingTrace
: contains the profiling data. The payload will be our own Protobuf structure sent as binary.