Add activities to Blazor Server for distributed tracing #29846
Comments
/cc @davidfowl @noahfalk |
@rynowak thanks for reporting for duty. I imagine this is not only true for Blazor but for Websockets/SignalR in general. Is there something we can do here to "disconnect" the distributed tracing? Is it really correct if we somehow disconnect the Blazor Server session from the original request that started it? |
Thanks for contacting us. |
I'm currently working around this by setting So it can be controlled ....
That's a great question 😁 I think the major downside of the current behavior is that a really really long-lived circuit could include thousands of descendant spans. Think about @danroth27 giving the counter demo for 10 hours as part of a charity livestream. In the worst case this will break downstream tooling, or in the best case just give you data that's hard to work with. I'm really not sure. I might also be the first person to notice this, so I figured it was worth a discussion 🤷 |
@rynowak you bring up good points. I'm not super familiar with the activity ID APIs, but in general I believe the fix here is to "detach" the circuit from the original request activity, then start a new activity per received message on the circuit as you suggest, and tag it with the circuit info. |
cc @tarekgh
Thanks for raising this @rynowak. I'll try to take a little time in the next day or two to repro this and understand where the Activities are currently being created so I have more context. I may also find alternative suggestions on how the issue can be resolved. I imagine as .NET developers use distributed tracing more frequently it will behoove us to have good guidance and understanding of common patterns that show up.
Links are a new concept added in the 5.0 release of DiagnosticSource.dll that may also be useful here: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans |
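As a rough illustration of how links could apply here: instead of parenting every message span to the connection request, each message could start its own trace and carry a link back to the connection's span. The sketch below is a plain Python model of that relationship, not the DiagnosticSource API. `SpanContext`, `Span` and `new_root` are hypothetical names used only for this example.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SpanContext:
    """The identifiers needed to refer to a span from somewhere else."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_id: Optional[str] = None
    links: List[SpanContext] = field(default_factory=list)


def new_root(name, links=()):
    """Start a new trace; related spans are recorded as links, not parents."""
    context = SpanContext(secrets.token_hex(16), secrets.token_hex(8))
    return Span(name, context, parent_id=None, links=list(links))


# The long-lived connection gets one span of its own.
connection = new_root("connection request")

# Each message is a separate trace that points back at the connection.
message = new_root("circuit message", links=[connection.context])
```

A link expresses "related to" without implying "child of", so the message traces stay small while a backend that understands links can still navigate from a message to the connection that carried it.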
Can we get a debugger dump in which we can see where the Activity objects get created? That may be faster, I guess, if possible. I am guessing there are some activities that are started and never stopped. |
It's an issue we need to address in SignalR, and then we can have a broader discussion about what behavior we want in general. |
I didn't have any background on how the Blazor app front-end and back-end communicate, but given @davidfowl's comment + a little debugging it appears to be the SignalR protocol over WebSocket, is that accurate? In the repro app there was a short series of paired Activity start/stops, but the 8th Activity start is when Kestrel receives the HTTP request that initiates the WebSocket, and that one is long-lived by design. Additional Activity start/stop pairs get recorded re-entrantly as the server sends messages to itself when the button is clicked. In terms of what to do about it...
|
Yup |
Related to #18711 |
I'm gonna take a stab at this. I think it's more than the activity though that's related here. I don't think we should be propagating the execution context from the request to the hub execution. It's especially broken for long polling scenarios where there are multiple requests throughout the execution of the hub. |
Ryan wrote:
I do remember someone else mentioning this in the context of SignalR a while back, but unfortunately I'm not having any luck finding the issue again. |
I'm not sure how useful it's going to be to have a separate activity ID for every message flowing across the websocket connection. The traces will then just say things like "Within circuit
So far it hasn't been a design goal to capture more information about each user activity within a Blazor Server circuit. If this was an area of growing customer demand, we could start pulling more information out of the rendertree and navigation manager so (for example) we could say things like "Within circuit
Another argument would be that tracing technologies could offer more value by moving beyond thinking in terms of short HTTP round trips and have first-class support for long-running connections (WebSockets in general, not just Blazor Server). I know that would only address the UX concern about how info is presented in the tracing UI, so it's still applicable to think about how Blazor Server can provide more human-readable context information as per the previous paragraph. |
I think it's fine to decide that this isn't a priority for you right now if you aren't seeing enough signal that customers care. My guess is that customers will care about this in proportion to how often apps using SignalR are sluggish and how hard it is to correlate actions on one machine with specific code that runs on another tier. For example, if there is always a nice 1:1 correspondence between pushing a button in a Blazor front-end and an OnButtonClicked() method running on a single back-end + all the expensive work happens inside OnButtonClicked(), then the value of instrumenting the SignalR protocol is fairly low. On the other hand, if those SignalR communications are bouncing across relay servers and lots of different code on different machines is being invoked in response, then there are numerous places where performance issues could creep in. It would be very hard to track them down without timestamps and causal links on the messages being sent and received at each tier. If we don't want to do the work of sending identifiers on each message, I do think there are smaller amounts of work that would still be quite helpful:
For a server<->server SignalR communication many of the supporting pieces are already in place and I could walk you through what remains. For client<->server we presumably also need one of the tracing uploaders like OpenTelemetry or ApplicationInsights to support running in the Blazor client environment. I'm happy to get the right people in touch, but I suspect it's more a question of priorities at this point.
It helps if there is instrumentation on the other end of the communication to tie it to. For server<->server usage there would probably be some of that instrumentation in place. For Blazor client<->server it would be there only if you added it (which isn't hard) and a telemetry uploader supported transmitting it (this part is likely a bit more work). Hope that helps? I'm glad to chat online if that is easier, I know there is a lot here : ) |
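To make "sending identifiers on each message" concrete, here is a minimal sketch that stamps each outgoing message with a W3C Trace Context `traceparent` value and parses it on the receiving tier. The `traceparent` layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags) comes from the W3C specification. The envelope shape and the helper names `format_traceparent`, `parse_traceparent` and `wrap_message` are assumptions made for this example, not how SignalR actually frames messages.

```python
import re

# Version 00 of the W3C traceparent format: 00-<trace-id>-<parent-id>-<flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id, span_id, sampled=True):
    """Build a version-00 traceparent value from the sender's ids."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value):
    """Return (trace_id, parent_span_id, sampled), or None if invalid."""
    match = _TRACEPARENT.match(value)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are explicitly invalid in the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def wrap_message(payload, trace_id, span_id):
    """Hypothetical envelope: the payload plus the sender's trace context."""
    return {
        "traceparent": format_traceparent(trace_id, span_id),
        "payload": payload,
    }


# Sender side: attach the current span's identifiers to the message.
envelope = wrap_message(
    {"method": "OnButtonClicked"},
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)

# Receiver side: recover the ids and use them as the parent of local work.
remote_parent = parse_traceparent(envelope["traceparent"])
```

With the sender's ids available on every message, each tier can record its own timestamps against a span that is causally tied to the message it received, which is the correlation described above.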
We're having the same problem. Setup:
Example:
If we are interested in debugging POST /Appointments in the Azure Application Insights end-to-end transaction view, then we see ALL distributed tracing for ALL Blazor requests to the public APIs, including all internal API calls, SQL commands, Redis requests etc., for each and every Blazor HTTP request since the user visited the web app. This becomes extremely verbose and makes debugging with the end-to-end transaction view impractical, even for short user sessions. I would expect to be able to explicitly start activities in Blazor to logically group related Blazor requests and to opt out of the Blazor circuit's root activity. A workaround we are experimenting with right now is setting |
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process. |
Thanks for contacting us. We're moving this issue to the |
+1 on guidance for working around this. Where to put |
SignalR is planning to add activities for SignalR invocations: #51557. We should understand the impact of this work on Blazor Server scenarios and what additional activities we need to add for Blazor Server specifically. |
That's awesome progress. I'm still a little disappointed to see the issue title change 😢. Some of my best work IMO. |
I subscribed to this issue just to see the title pop up occasionally in my notifications and make me smile lol 🤣 |
Describe the bug
The long-lived circuits of Blazor server make distributed tracing not work as expected.
Since each circuit is effectively a long-lived request ... a lot of activity (pun intended) ends up getting traced under the same activity. Basically all outgoing requests/traces from inside of a Blazor server circuit (browser tab) will have the same parent span, which causes them to all get grouped together as a single "root cause" in tracing UIs.
To Reproduce
Create a Blazor server application. Use a button to fire off HTTP requests to another ASP.NET Core endpoint. Hook all of this up to your favorite distributed tracing system and enjoy.
All of the operations you start from a Blazor circuit will end inside the same single logical operation from a tracing POV.