
Tracing Configuration #399

Closed
bobbytables opened this issue May 17, 2018 · 64 comments · Fixed by #5043

@bobbytables

It would be nice to expose the bootstrap config a bit more with things like tracing and stats configs. We could even offer something like a file path that gets merged into the bootstrap YAML on boot.

For context, we're building our own control plane for a service-to-service mesh on Envoy. We're including stats and Zipkin traces (via the Jaeger endpoint). We lose trace context between Contour and our sidecar'd Envoys because we can't configure Contour to include trace information.

@davecheney
Contributor

Hi @bobbytables. To be very transparent, merging pieces of abstract configuration is not a feature I plan to add.

However, the contour bootstrap initContainer is not required to use Contour, it's just a convenience. You could replace the output of contour bootstrap with a ConfigMap mounted into a volume. See #1
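
For anyone who wants a concrete starting point, a minimal sketch of the relevant part of the Envoy pod spec is below. It assumes the Envoy container reads its bootstrap from a file on a mounted volume; the mount path, file name, and image tag are illustrative and need to match however Envoy is actually started in your deployment.

```yaml
# Sketch only: serve the bootstrap from a ConfigMap instead of the
# "contour bootstrap" initContainer. Names, paths, and image tag are illustrative.
spec:
  containers:
    - name: envoy
      image: docker.io/envoyproxy/envoy-alpine:v1.7.0   # assumed image/tag
      command: ["envoy"]
      args: ["-c", "/config/contour.yaml", "--service-cluster", "cluster0", "--service-node", "node0"]
      volumeMounts:
        - name: contour-config
          mountPath: /config
  volumes:
    - name: contour-config
      configMap:
        name: contour-config   # the ConfigMap holding your bootstrap YAML
```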

@davecheney davecheney added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. documentation kind/question Categorizes an issue as a user question. labels May 18, 2018
@bobbytables
Author

Roger that, I'll probably end up doing the mounted configmap then. Thanks!

@davecheney
Contributor

davecheney commented May 22, 2018 via email

@mattalberts

mattalberts commented Jul 19, 2018

@bobbytables @davecheney
I loved this suggestion (replacing contour bootstrap with a ConfigMap-mounted volume); it's fantastically simple. I'm also trying to enable tracing (span injection) at the ingress controller; here are the results of my experiment.

TL;DR

  • configuring envoy opentracing manually (without bootstrap) is easy
  • envoy.http_connection_manager still needs a "tracing" definition to emit headers
  • I have a patch! 😄 (if you're interested)

Contour with OpenTracing

First, deploy contour without bootstrap and mount an updated envoy configuration from a ConfigMap.

---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: contour
    role: ingress
  name: contour-config
  namespace: ingress-system
data:
  contour.yaml: |
    dynamic_resources:
      lds_config:
        api_config_source:
          api_type: GRPC
          cluster_names: [contour]
          grpc_services:
          - envoy_grpc:
              cluster_name: contour
      cds_config:
        api_config_source:
          api_type: GRPC
          cluster_names: [contour]
          grpc_services:
          - envoy_grpc:
              cluster_name: contour
    static_resources:
      clusters:
      - name: contour
        connect_timeout: { seconds: 5 }
        type: STRICT_DNS
        hosts:
          - socket_address:
              address: 127.0.0.1
              port_value: 8001
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        circuit_breakers:
          thresholds:
            - priority: high
              max_connections: 100000
              max_pending_requests: 100000
              max_requests: 60000000
              max_retries: 50
            - priority: default
              max_connections: 100000
              max_pending_requests: 100000
              max_requests: 60000000
              max_retries: 50
      - name: service_stats
        connect_timeout: 0.250s
        type: LOGICAL_DNS
        lb_policy: ROUND_ROBIN
        hosts:
          - socket_address:
              protocol: TCP
              address: 127.0.0.1
              port_value: 9001
      - name: zipkin
        connect_timeout: { seconds: 5 }
        type: LOGICAL_DNS
        lb_policy: ROUND_ROBIN
        hosts:
          - socket_address:
              protocol: TCP
              address: zipkin-collector.tracing-system
              port_value: 9411
    tracing:
      http:
        name: envoy.zipkin
        config:
          collector_cluster: zipkin
          collector_endpoint: /api/v1/spans
    admin:
      access_log_path: /dev/null
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 9001
---

```bash
kubectl create ns ingress-system
kubectl apply -f issue-399.yaml -l "app=contour"
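# $POD_NAME is a placeholder for one of the pods created above; list them with "kubectl -n ingress-system get pods"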
kubectl -n ingress-system logs $POD_NAME -c envoy

[2018-07-19 17:52:21.931][1][info][main] source/server/server.cc:178] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-07-19 17:52:21.935][1][info][config] source/server/configuration_impl.cc:52] loading 0 listener(s)
[2018-07-19 17:52:21.937][1][info][config] source/server/configuration_impl.cc:92] loading tracing configuration
[2018-07-19 17:52:21.938][1][info][config] source/server/configuration_impl.cc:101]   loading tracing driver: envoy.zipkin
[2018-07-19 17:52:21.938][1][info][config] source/server/configuration_impl.cc:119] loading stats sink configuration
[2018-07-19 17:52:21.939][1][info][main] source/server/server.cc:353] starting main dispatch loop
```

Echo

Next, deploy a service to help inspect the upstream request.

kubectl apply -f issue-399.yaml -l "app=echo"
curl -iv http://echo.127.0.0.1.xip.io/

Request Headers:
	accept=*/*
	content-length=0
	host=echo.127.0.0.1.xip.io
	user-agent=curl/7.60.0
	x-envoy-expected-rq-timeout-ms=15000
	x-envoy-internal=true
	x-forwarded-for=192.168.65.3
	x-forwarded-proto=http
	x-request-id=b645e1b6-2634-43f5-bd95-b6bac6b61c26
  • missing tracing headers in the request.

If I update the listener creation code (specifically, the function httpFilter) to conditionally add a tracing definition, spans are emitted correctly to my configured opentracing backend.

+       if enabletracing {
+               filter.Config.Fields["tracing"] = st(map[string]*types.Value{
+                       "operation_name": sv("egress"),
+               })
+       }
+       return filter
  • This isn't all of the patch, but it's the relevant part

Swapping out the container definition with this

        - name: contour
          image: docker.io/mattalberts/contour:0.5.0-test
          imagePullPolicy: Always
          command: ["contour"]
          args:
            - serve
            - --incluster
            - --enable-tracing
            - --ingress-class-name=contour

curl -iv http://echo.127.0.0.1.xip.io/

Request Headers:
	accept=*/*
	content-length=0
	host=echo.127.0.0.1.xip.io
	user-agent=curl/7.60.0
	x-b3-sampled=1
	x-b3-spanid=31bd1c8e3db02128
	x-b3-traceid=31bd1c8e3db02128
	x-envoy-expected-rq-timeout-ms=15000
	x-envoy-internal=true
	x-forwarded-for=192.168.65.3
	x-forwarded-proto=http
	x-request-id=45b03ab1-2271-92e0-b926-c8946aa9c0d3
  • tracing headers in the request!

My patch currently relies on a flag --enable-tracing. Looking through the envoy code base, the overhead when tracing is enabled feels fairly minimal, but it does cause extra work to be done.

issue-399.yaml.txt

  • remove the .txt

@mattalberts

My initial work used a command-line option --enable-tracing to globally add the tracing definitions to all listeners. Looking through the annotations, it might make more sense to expose this as an annotation. Which would be preferred?

  1. per-service annotation enable?
  2. global enable with per-service annotation disable?
  3. global enable?

I'm leaning towards 1 or 2.
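
To make option 1 concrete, a per-service enable could look roughly like the sketch below. The annotation key is purely hypothetical (made up for illustration); nothing like it exists in Contour today.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: echo
  annotations:
    contour.heptio.com/enable-tracing: "true"   # hypothetical annotation key
spec:
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 8080
```

Option 2 would simply flip the default: tracing on everywhere via a flag, with the same kind of annotation used to opt individual services out.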

@mattalberts

@rosskukulinski I probably missed something. Setting up the static cluster/trace resource was easy; I used the zipkin trace config

      - name: zipkin
        connect_timeout: { seconds: 5 }
        type: LOGICAL_DNS
        lb_policy: ROUND_ROBIN
        hosts:
          - socket_address:
              protocol: TCP
              address: zipkin-collector.tracing-system
              port_value: 9411
    tracing:
      http:
        name: envoy.zipkin
        config:
          collector_cluster: zipkin
          collector_endpoint: /api/v1/spans

The more difficult part was enabling it as part of the envoy filter. All the code has moved around quite a bit by now :). I should pull 0.8.1 and play with it (I'm still back on 0.5.0).

@davecheney davecheney added the blocked/needs-product Categorizes the issue or PR as blocked because it needs a decision from product management. label Mar 12, 2019
@davecheney
Contributor

davecheney commented Jun 18, 2019

Moving to the unplanned milestone. We don’t plan on looking at this til after Contour 1.0

Blocked:

@davecheney davecheney added blocked Blocked waiting on a dependency and removed kind/question Categorizes an issue as a user question. labels Jun 18, 2019
@davecheney davecheney added this to the Unplanned milestone Jun 18, 2019
@bgagnon
Contributor

bgagnon commented Jun 28, 2019

The approach we have taken for structured logs (#624) without having to fork Contour is to introduce a middleware gRPC proxy between Envoy and Contour and activate the gRPC Access Log Service (ALS).

This effectively augments the xDS responses "in flight" to enable certain features (such as ALS) for which Contour does not have a design yet (or may never implement for various reasons). We can also remove/modify certain switches that Contour configures (for example, disable the stdout access log).

This is probably going to be our strategy to get tracing (this ticket) working. Unlike Contour, this proxy gRPC server does not have to be very fancy with caching, as it can effectively pass through most requests verbatim. With its limited scope, it can also be implemented in any language that supports gRPC. The extra latency is outside the data path and is a reasonable compromise.

@moderation

Another potential option is to re-use or modify the FluentD plugin from Google that is covered in "Envoy, Nginx, Apache HTTP Structured Logs with Google Cloud Logging".

@inigohu

inigohu commented Sep 23, 2019

How's the situation with this issue?
Should I replace the output of contour bootstrap with a ConfigMap mounted into a volume? Or is there another recommended approach to enabling Envoy tracing?

@davecheney
Contributor

@inigohu the best way to do this at the moment would be to provide your own configuration to envoy. contour bootstrap is a convenience to provide the parameters we expect for Envoy and Contour to work together but it is not required. You can replace it with your own mechanism for providing bootstrap configuration to Envoy.

@mattalberts

I found altering the envoy config alone to be insufficient to get tracing working. I had to patch the setup of the HttpConnectionManager:

func tracing(enableTracing bool) *http.HttpConnectionManager_Tracing {
	if enableTracing {
		return &http.HttpConnectionManager_Tracing{
			OperationName: http.EGRESS,
		}
	}
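	// Returning nil leaves Tracing unset on the HttpConnectionManager, so Envoy does not start spans.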
	return nil
}

@v0id3r

v0id3r commented Feb 23, 2020

Faced this problem too.
Adding HttpConnectionManager_Tracing helps and enables traces, but it wasn't mentioned here how to configure sampling (by default everything is sampled at 100%).

According to the Envoy HttpConnectionManager.Tracing configuration (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto.html#extensions-filters-network-http-connection-manager-v3-httpconnectionmanager-tracing), it is possible to insert a default tracing filter and configure it with runtime variables (https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/runtime#config-http-conn-man-runtime).

I did this and it works.

Patch listener.go to add tracing filter to HttpConnectionManager

Tracing: &http.HttpConnectionManager_Tracing{},

Configure envoy with bootstrap config in configmap

Add zipkin cluster.

clusters:
...
  - name: zipkin
  ...
...

Add tracing configuration.

tracing:
  http:
    name: envoy.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v2.ZipkinConfig
      collector_cluster: zipkin
      collector_endpoint: "/api/v2/spans"
      collector_endpoint_version: HTTP_JSON

Add layered_runtime configuration to set sampling rates.

layered_runtime:
  layers:
    - name: static_layer
      static_layer:
        tracing:
          client_enabled: 100
          global_enabled: 100
          random_sampling: 0

How it works now:
We make a simple request with random_sampling set to 0.
curl helloworld-go.default.127.0.0.1.xip.io
The container receives the header

X-B3-Sampled: 0

=> not sampled.

Now trying with the x-envoy-force-trace header.
curl http://helloworld-go.default.127.0.0.1.xip.io -H 'x-envoy-force-trace: 1'

X-B3-Sampled: 1

=> sampled.

Maybe someone can propose a much better solution. That would be great!
Maybe go with 100% tracing, but do sampling on the collector side...
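
As a rough illustration of partial sampling with the same runtime keys (assuming random_sampling is interpreted as a percentage here, as in the example above; newer Envoy versions may scale this key differently):

```yaml
layered_runtime:
  layers:
    - name: static_layer
      static_layer:
        tracing:
          client_enabled: 100
          global_enabled: 100
          random_sampling: 10   # sample roughly 10% of requests instead of none
```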

@pims
Contributor

pims commented May 5, 2020

Sorry to revive an old issue, but as discussed with @youngnick recently, to enable tracing as described by @mattalberts we had to fork Contour.

To summarize our current implementation:

  1. we skip contour bootstrap and provide Envoy configuration through a volume mounted with the content of a ConfigMap. This requires no changes to contour itself.
  2. we provide a new --enable-tracing serve flag and patch HttpConnectionManager with pretty much verbatim code described in Tracing Configuration #399 (comment)

It'd be great to be able to do this natively in Contour, without having to maintain our own fork.

@tsloughter

@pims can you send it as a PR?

If (2) were merged into Contour then the only custom part would be the need to skip contour bootstrap and create an Envoy configuration manually, correct?

That would be a good first step, but I assume it would be better to also make this work with contour bootstrap. Would this be done as a "merge" operation, allowing users to provide an Envoy config that Contour merges during bootstrap, or would we need to add new Contour configuration that changes bootstrap to include trace configuration for Envoy?

@youngnick
Member

youngnick commented May 6, 2020

I have a few questions for those of you who've made this change:

What happens if you enable tracing, but don't set up the cluster in the bootstrap?

I'd really like Contour to be able to tell you something about whether the config is working, in general. Do any of you who are using or would use this feature want that here? What information would you want Contour to provide, in an ideal world? Please see #2495 for some other discussion around surfacing information to HTTPProxy users, and #2325 for more closely related ideas.

If we did add something to contour bootstrap to add tracing configuration, it seems like it would need at least three parameters: enable-tracing, tracing-address, and tracing-endpoint, plus maybe whatever endpoint_version does. Does that seem right?
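
As a rough sketch of how those parameters might map onto the bootstrap that contour bootstrap would emit (the flag names are just the ones floated above; none of this is implemented):

```yaml
# Hypothetical: contour bootstrap --enable-tracing \
#   --tracing-address=zipkin-collector.tracing-system:9411 \
#   --tracing-endpoint=/api/v2/spans
static_resources:
  clusters:
    - name: tracing                      # built from --tracing-address
      connect_timeout: 0.250s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      hosts:
        - socket_address:
            address: zipkin-collector.tracing-system
            port_value: 9411
tracing:
  http:
    name: envoy.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v2.ZipkinConfig
      collector_cluster: tracing
      collector_endpoint: /api/v2/spans   # from --tracing-endpoint
      collector_endpoint_version: HTTP_JSON
```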

To be clear, I think that Contour needs to have the ability to tell Envoy to do tracing, and I want the feature in. I just want to make sure that we've made the feature useful, and operable, even for people who don't know much about Envoy.

Edit: No problems with reviving an old issue @pims. Thanks for restarting the discussion here.

@pims
Contributor

pims commented May 6, 2020

@youngnick nothing happens if tracing is enabled but not configured in bootstrap, nor if the tracing cluster is unavailable. I can't really test those two cases with regard to memory/cpu consumption on a busy cluster at this time though.

The only error that we've experienced during upgrades is:

Proto constraint validation failed (Using the default now-deprecated value HTTP_JSON_V1 for enum 'envoy.config.trace.v2.ZipkinConfig.collector_endpoint_version' from file trace.proto. This enum value will be removed from Envoy soon so a non-default value must now be explicitly set. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.): collector_cluster: "zipkin"
collector_endpoint: "/api/v2/spans"

when we didn't specify the collector_endpoint_version.

@youngnick
Member

Thanks for that @pims! So, do you think that Contour (that is, contour serve) needs to tell you anything other than "Hey, you enabled tracing"? Feels like contour serve would need at least a cluster name (that we could default if necessary), and we could get away with just logging "Traces will be sent to this cluster, if it's up".

As an aside, I'd really like a way to put some of this info on some object's status, somewhere. Not sure where that would be, yet.

What do you think about my suggestion for contour bootstrap as well? Do you think that it captures the info the bootstrap config would need, in a general case?

@youngnick
Member

Also, everyone watching this issue, but particularly @v0id3r: what do you think about configuring sampling? Is it worth bringing this feature in with everything set to 100% sampling, and adding configurability later, or do we need to discuss sampling now too?

@skriss skriss added kind/feature Categorizes issue or PR as related to a new feature. and removed area/documentation Issues or PRs related to documentation. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jun 11, 2021
@skriss skriss removed this from the Backlog milestone Jul 21, 2021
@youngnick
Member

Looks like envoyproxy/envoy#20281 may be looking to add OpenTelemetry tracing support in Envoy, so we may be able to go straight to that? I think this work is still blocked on us doing some work on typed ExtensionService objects though.

@skriss skriss added this to Contour Jul 25, 2022
@sunjayBhatia sunjayBhatia added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 26, 2022
@skriss skriss removed their assignment Oct 11, 2022
@yoitsro
Contributor

yoitsro commented Oct 19, 2022

Hey team. Any news on this?

@yoitsro
Contributor

yoitsro commented Nov 11, 2022

@sunjayBhatia @youngnick hi both! Please can we have an update on this? 🙏

As it stands, it's a bit of a blocker not having this in our environment due to the lack of observability. It makes it super difficult to easily understand where requests originated from and how long they're taking within our system. Any updates would be hugely appreciated!

@sunjayBhatia
Member

At the moment we have some resources planned to tackle this "soon", hopefully as part of the 1.25.0 release cycle.

However, if you have resources available to contribute, that would be super helpful; here is another Slack thread on this with context: https://kubernetes.slack.com/archives/C8XRH2R4J/p1667313513678379

Overall, we would love someone from the community to help implement this feature, as there has been interest.

@yangyy93
Member

yangyy93 commented Dec 7, 2022

I'd like to contribute support for exporting tracing data to OpenTelemetry from Contour. Is there anything I need to pay attention to?

@skriss
Member

skriss commented Jan 19, 2023

Tentatively adding this to 1.25.0 since @yangyy93 has taken up the design work!

@skriss skriss moved this to Todo in Contour Jan 19, 2023
@skriss skriss added this to the 1.25.0 milestone Jan 19, 2023
@skriss skriss moved this from Todo to In Progress in Contour Feb 2, 2023
@sidharthramesh

It'd be great if the tracing setup being built could also support the span of an external authorization request and forward the trace headers to that service as well. A somewhat related issue: emissary-ingress/emissary#683

@github-project-automation github-project-automation bot moved this from In Progress to Done in Contour Apr 13, 2023
skriss pushed a commit that referenced this issue Apr 13, 2023
Adds support for exporting trace data
to OpenTelemetry.

Closes #399.

Signed-off-by: yy <[email protected]>
padlar pushed a commit to padlar/contour that referenced this issue Apr 20, 2023
Adds support for exporting trace data
to OpenTelemetry.

Closes projectcontour#399.

Signed-off-by: yy <[email protected]>