Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading OpenTelemetry dependency changes how receiver connects to network address #4465

Open
yvrhdn opened this issue Dec 18, 2024 · 3 comments

Comments

@yvrhdn
Copy link
Member

yvrhdn commented Dec 18, 2024

Tempo currently has a whole bunch of OTel dependencies at v0.102.1, when I upgrade these to a recent version (v0.116.0) the receiver is not able to receive data anymore in certain configurations.

Background

The cause is a change in behaviour: the OpenTelemetry Collector used to listen on 0.0.0.0 by default. Meaning, if you enabled the receivers with default config, the otlp receiver would for instance listen on 0.0.0.0:4317.

This was brought up as a security risk: open-telemetry/opentelemetry-collector#8510.
Documentation: open-telemetry/opentelemetry-collector//docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

In v0.94.0 a feature gate was added to change this behaviour.
This comment links a lot of PRs related to this work: open-telemetry/opentelemetry-collector#8510 (comment)

Summarized:

  • when using an unspecified address the receiver logs a warning
  • with the introduction of the feature gate UseLocalHostAsDefaultHost, an unspecified address will be replaced by localhost

In v0.110.0 this flag was made stable.
In v0.112.0 the flag was removed entirely.

How does this impact Tempo?

Most Tempo installations use the receivers with the default config, something like:

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:

This used to work fine since the receivers defaulted to 0.0.0.0:4317 and 0.0.0.0:4318 respectively. With the changes to replace unspecified address, the receivers now default to localhost:4317 and localhost:4318.
And as a result connections to Tempo running in a Docker container don't work anymore.

To workaround this you need to specify the address you want to bind to explicitly. For instance if Tempo is running in a container with hostname tempo this should work:

# ...
        http:
          endpoint: "tempo:4318"

You can also explicitly bind to 0.0.0.0 still, but this has potential security risks:

# ...
        http:
          endpoint: "0.0.0.0:4318"

If we wish to upgrade the OTel depedency, we need to document this. I don't believe there is a feasible workaround we can do to preserve the original behaviour (nor would it be desirable considering the aforementioned security risks).

To Reproduce

  1. Upgrade the OTel dependencies to a version >= 1.102.0 (see for instance Update Jaeger and OTel packages #4406)
  2. Rebuild the Docker image
  3. Run Tempo with default settings
  4. Try to connect to the receiver. If the otlphttp receiver is enabled this normally should work but it doesn't:
$ curl -v -H "Content-type: application/json" -X POST http://localhost:4318/v1/traces -d "{}"

curl: (52) Empty reply from server

If you do the same but run Tempo directly (not containerized) it works:

$ curl -v -H "Content-type: application/json" -X POST http://localhost:4318/v1/traces -d "{}"

{"partialSuccess":{}}%
@joe-elliott
Copy link
Member

That's a tough one b/c it will likely break every Tempo install including the default helm configuration. Can we detect a configuration with no specified endpoints like:

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:

and default to 0.0.0.0? should we and log a warning?

The option to specify something more specific exists and this way we wouldn't have this massive breaking change.

@yvrhdn
Copy link
Member Author

yvrhdn commented Dec 19, 2024

I believe we could that here:

// Make sure that the headers are added to context. Required for Authentication.
switch componentID.Type().String() {
case "otlp":
otlpRecvCfg := cfg.(*otlpreceiver.Config)
if otlpRecvCfg.HTTP != nil {
otlpRecvCfg.HTTP.IncludeMetadata = true
cfg = otlpRecvCfg
}
case "zipkin":
zipkinRecvCfg := cfg.(*zipkinreceiver.Config)
zipkinRecvCfg.ServerConfig.IncludeMetadata = true
cfg = zipkinRecvCfg
case "jaeger":
jaegerRecvCfg := cfg.(*jaegerreceiver.Config)
if jaegerRecvCfg.ThriftHTTP != nil {
jaegerRecvCfg.ThriftHTTP.IncludeMetadata = true
}
cfg = jaegerRecvCfg
}

It feels pretty hacky though. I'm not 100% sure what the logic should look like. If endpoint is empty, replace by "0.0.0.0:<default port>"?

@yvrhdn
Copy link
Member Author

yvrhdn commented Dec 19, 2024

We discussed this briefly in the dev chat: changing the configuration is a bit tedious, so if possible we want to avoid this.

In terms of impact: this will potentially impact a large amount of installations, so we need to add a big warning in the release notes.

The tempo helm charts seem to not be impacted since it sets the endpoint: "0.0.0.0:<port>" by default.
https://github.com/grafana/helm-charts/blob/main/charts/tempo/values.yaml#L108-L125

I'm not sure if tempo-distributed does this as well. We would need to check it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants