reqwest seems to hang around after instrumented function is done #3090

SorenHolstHansen · 2024-09-26T10:22:44Z

Bug Report

Version

reqwest_tracing_bug v0.1.0
│   │   └── tracing v0.1.40
│   │       ├── tracing-attributes v0.1.27 (proc-macro)
│   │       └── tracing-core v0.1.32
│   │   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
│   │       │   └── tracing v0.1.40 (*)
│   │       └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
├── tracing v0.1.40 (*)
├── tracing-opentelemetry v0.26.0
│   ├── tracing v0.1.40 (*)
│   ├── tracing-core v0.1.32 (*)
│   ├── tracing-log v0.2.0
│   │   └── tracing-core v0.1.32 (*)
│   └── tracing-subscriber v0.3.18
│       ├── tracing v0.1.40 (*)
│       ├── tracing-core v0.1.32 (*)
│       └── tracing-log v0.2.0 (*)
└── tracing-subscriber v0.3.18 (*)

Platform

Darwin Srens-MacBook-Air.local 23.6.0 Darwin Kernel Version 23.6.0: Wed Jul 31 20:53:05 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T8112 arm64

Description

We have noticed a weird situation in our production traces for our axum server where sometimes an endpoint, even if the endpoint makes a response very quickly, that the traces say that the endpoint took a very long time. This doesn't happen all the time, but when it happens, it is always related to reqwest calls (i.e. the reqwest crate).

A trace could look something like this (this is how it looks in our prod metrics platform in azure, you will see below that another platform shows the endpoint as shorter)

endpoint:     -------------
something:    --
reqwest_call:   -----------
stuff_after:       --
response:           --

that is, the response is done quite quickly, but because the reqwest call seems to hang, the whole endpoint shows as taking a long time. Now, the reqwest calls actually all finish very quickly. You can see that the "stuff_after" continues shortly after, and we have done other approaches to show that they actually finish fast. So something is hanging in the reqwest client that makes this show as taking a long time.

We even added a field to the root request span (using tower tracing) that shows that the endpoint is done quickly. So from the users point of view, nothing is taking a long time, but for us, we can't trust our tracing.

I made a reproducible case here: https://github.com/SorenHolstHansen/reqwest_tracing_bug

To reproduce, I started a local open telemetry platform (signoz or jaeger), started the server in the repo i linked, and then ran seq 1 5000 | xargs -P0 -I{} curl http://127.0.0.1:8008

In signoz, you should then be able to see cases like these:

You can see in the bottom right, that the endpoint took 0 seconds, but resolve took 35 seconds.

I am not sure if this is strictly the fault of tracing, or something weird happens internally in reqwest or somewhere else.

The text was updated successfully, but these errors were encountered:

mladedav · 2024-09-26T11:28:34Z

I believe stuff like this happens when something holds onto a span longer than expected. I know this happened to me with h2 and I think it could be a cause here since reqwest pools connections.

Basically the connection has a span that is a child of your endpoint span and thus keeps it alive.

You can try doing something like let _guard = info_span!(parent :none "dummy").entered(); just before doing the reqwest call (and maybe dropping the guard right after). That should fix the endpoint span but you'll no longer see the reqwest span in your trace.

Otherwise just filtering reqwest out should also fix your trace length.

I don't think this can really be fixed in tracing, tracing itself can just give advise on whether libraries should or should not hold onto spans like this and better convey the nuances and what will the user see.

SorenHolstHansen · 2024-09-26T11:46:03Z

Aha, that makes a lot of sense! Thanks for the quick answer, I will open an issue over there as well

SorenHolstHansen changed the title ~~reqwest seems to hand around after instrumented function is done~~ reqwest seems to hang around after instrumented function is done Sep 26, 2024

SorenHolstHansen mentioned this issue Sep 26, 2024

resolve tracing span sometimes lingers even after response received seanmonstar/reqwest#2435

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reqwest seems to hang around after instrumented function is done #3090

reqwest seems to hang around after instrumented function is done #3090

SorenHolstHansen commented Sep 26, 2024 •

edited

Loading

mladedav commented Sep 26, 2024

SorenHolstHansen commented Sep 26, 2024

reqwest seems to hang around after instrumented function is done #3090

reqwest seems to hang around after instrumented function is done #3090

Comments

SorenHolstHansen commented Sep 26, 2024 • edited Loading

Bug Report

Version

Platform

Description

mladedav commented Sep 26, 2024

SorenHolstHansen commented Sep 26, 2024

SorenHolstHansen commented Sep 26, 2024 •

edited

Loading