New component: Blob Upload Processor #33737

Open
3 tasks
michaelsafyan opened this issue Jun 24, 2024 · 22 comments
Labels
Accepted Component New component has been sponsored

Comments

@michaelsafyan
Contributor

michaelsafyan commented Jun 24, 2024

The purpose and use-cases of the new component

The Blob Uploader Processor takes selected attributes/fields (from spans, span events, logs, etc.) and:

  • Writes them to a large blob storage system
  • Replaces them in the original signal with a "Foreign Attribute Reference" referencing the URI of where it was written
  • Forwards the signal to a pipeline of the same signal type for further processing and export
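
The write-then-replace step described above can be sketched in Go roughly as follows. This is a minimal illustration, not the component's code: `BlobStore`, `memStore`, `Span`, and `uploadAndReplace` are hypothetical names, the span is reduced to a map of string attributes, and the "Foreign Attribute Reference" is assumed here to be just the destination URI stored as a string.

```go
package main

import (
	"fmt"
)

// BlobStore is a hypothetical stand-in for a real blob storage client.
type BlobStore interface {
	Put(uri string, data []byte) error
}

// memStore keeps blobs in memory so the sketch runs without any cloud SDK.
type memStore map[string][]byte

func (m memStore) Put(uri string, data []byte) error {
	m[uri] = append([]byte(nil), data...) // store a copy of the payload
	return nil
}

// Span is a deliberately simplified span: IDs plus string attributes.
type Span struct {
	TraceID, SpanID string
	Attributes      map[string]string
}

// uploadAndReplace writes the value stored under key to the blob store and
// replaces it with the destination URI. It reports whether the key matched.
// Assumption: the reference is represented as the bare URI string.
func uploadAndReplace(store BlobStore, s *Span, key, uri string) (bool, error) {
	val, ok := s.Attributes[key]
	if !ok {
		return false, nil // nothing to do for this span
	}
	if err := store.Put(uri, []byte(val)); err != nil {
		// On failure the original attribute is left untouched.
		return false, fmt.Errorf("upload %q: %w", key, err)
	}
	s.Attributes[key] = uri
	return true, nil
}

func main() {
	store := memStore{}
	span := &Span{
		TraceID:    "t1",
		SpanID:     "s1",
		Attributes: map[string]string{"http.request.body.content": `{"user":"alice"}`},
	}
	uri := "gs://example-bucket/" + span.TraceID + "/" + span.SpanID + "/request.json"

	replaced, err := uploadAndReplace(store, span, "http.request.body.content", uri)
	fmt.Println(replaced, err)
	fmt.Println(span.Attributes["http.request.body.content"])
	fmt.Println(string(store[uri]))
}
```

After this step, the modified span would be forwarded to the next consumer in the pipeline; that part is omitted here.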

This component is intended to address a number of concerns:

  • Sensitivity of data: certain data may be necessary to retain for debugging but may not be suitable for access by all oncallers or others with access to general operational data; writing certain attributes to a separate blob storage system may allow for finer-grained, alternative access restrictions to be applied compared with the general ops backend.
  • Size of the data: some operational backends may have limitations around the size of the data they can receive; sending large attributes to a separate blob storage backend may avoid these limitations.
  • Costs of storage: while most operational data may need to be available quickly to address incidents, certain attributes may need to be accessed less frequently and may be suitable for lower cost, long-term storage options.

Motivating Examples:

  • HTTP request/response pairs stored in span attributes (http.request.body.content and http.response.body.content)
  • LLM prompt/response pairs stored in span event attributes (gen_ai.prompt and gen_ai.completion)

Use Cases Related to the Examples:

  • Additional restrictions around the access are needed beyond that of the general operations solution; writing to a separate blob storage allows additional access controls to be applied. Links to the destination enable the results to be located in a separate backend storage system that provides the necessary checks on access.

  • Full request/responses get used rarely by the oncallers, only when their end user opens a ticket through their support mechanism; writing this data to a separate, low-cost storage system allows the user to save on their ops storage costs.

Example configuration for the component (subject to change)

The following is intended to illustrate the general idea, but is subject to change:

The configuration consists of a list of ConfigStanzas:

config := LIST[ConfigStanza]

Each config stanza defines how it will handle exactly one type of attribute. The properties of the stanza are:

  • match_attribute_key: (REQUIRED) The exact attribute key to match (e.g. http.request.body.content)
  • match_attribute_only_in: (OPTIONAL) Allows the key to be matched in only a specific part of the signal.
    • Supported values include:
      • SPAN: only look at span-level attributes (not resource, scope, or event attributes)
      • RESOURCE: only look at resource-level attributes (not span, scope, or event attributes)
      • SCOPE: only look at scope-level attributes (not span, resource, or event attributes)
      • EVENT: only look at event-level attributes (not span, resource, or scope attributes)
  • destination_uri: (REQUIRED) The pattern to which to write the data.
    • Ex: gs://example-bucket/full-http/request/payloads/${trace_id}/${span_id}.txt
    • Patterns may reference other parts of the signal, including:
      • trace_id
      • span_id
      • resource.attributes
      • span.attributes
      • scope.attributes
    • Keys can be referenced with dot or bracket notation (e.g. span.attributes.foo or span.attributes[foo]).
  • content_type: (OPTIONAL) Indicates the content type of the attribute (default: AUTO)
    • Options include:
      • AUTO: attempt to infer the content type automatically
      • extract_from: expr: derive it from other information in the signal
        - Ex: extract_from: span.attributes["http.request.header.content-type"]
      • any literal string (e.g. "application/json"): to use a static value
  • fraction_to_write: (OPTIONAL) Allows down sampling of the payloads. Defaults to 1.0 (i.e. 100%)
  • fraction_written_behavior: (OPTIONAL) Defaults to REPLACE_WITH_REFERENCE.
    • Options include:
      • REPLACE_WITH_REFERENCE: replace the value with a reference to the destination location.
      • KEEP: the write is a copy, but the original data is not altered.
      • DROP: the fact that a write happened will not be recorded in the attribute
  • fraction_not_written_behavior: (OPTIONAL) Defaults to DROP.
    • Options include:
      • DROP: remove the attribute in its entirety
      • KEEP: don't modify the original data if this fraction wasn't matched
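
To make the defaults and allowed values above concrete, here is a rough Go sketch of how one stanza might be represented, defaulted, and validated. The struct and method names (`ConfigStanza`, `applyDefaults`, `validate`, `decide`) are assumptions for illustration only and are not taken from the development branch; the random sample is passed in explicitly so the behaviour is easy to follow.

```go
package main

import (
	"errors"
	"fmt"
)

// ConfigStanza mirrors the properties listed above; the Go names are hypothetical.
type ConfigStanza struct {
	MatchAttributeKey          string   // match_attribute_key (required)
	MatchAttributeOnlyIn       string   // "", SPAN, RESOURCE, SCOPE, or EVENT
	DestinationURI             string   // destination_uri (required)
	ContentType                string   // AUTO, or a literal content type
	FractionToWrite            *float64 // nil means "not set"
	FractionWrittenBehavior    string
	FractionNotWrittenBehavior string
}

// applyDefaults fills in the documented defaults for unset optional fields.
func (c *ConfigStanza) applyDefaults() {
	if c.ContentType == "" {
		c.ContentType = "AUTO"
	}
	if c.FractionToWrite == nil {
		one := 1.0
		c.FractionToWrite = &one
	}
	if c.FractionWrittenBehavior == "" {
		c.FractionWrittenBehavior = "REPLACE_WITH_REFERENCE"
	}
	if c.FractionNotWrittenBehavior == "" {
		c.FractionNotWrittenBehavior = "DROP"
	}
}

// validate checks required fields and enumerated values.
// It expects applyDefaults to have been called first.
func (c *ConfigStanza) validate() error {
	if c.MatchAttributeKey == "" {
		return errors.New("match_attribute_key is required")
	}
	if c.DestinationURI == "" {
		return errors.New("destination_uri is required")
	}
	switch c.MatchAttributeOnlyIn {
	case "", "SPAN", "RESOURCE", "SCOPE", "EVENT":
	default:
		return fmt.Errorf("match_attribute_only_in: unsupported value %q", c.MatchAttributeOnlyIn)
	}
	if f := c.FractionToWrite; f != nil && (*f < 0 || *f > 1) {
		return fmt.Errorf("fraction_to_write must be in [0, 1], got %v", *f)
	}
	switch c.FractionWrittenBehavior {
	case "REPLACE_WITH_REFERENCE", "KEEP", "DROP":
	default:
		return fmt.Errorf("fraction_written_behavior: unsupported value %q", c.FractionWrittenBehavior)
	}
	switch c.FractionNotWrittenBehavior {
	case "DROP", "KEEP":
	default:
		return fmt.Errorf("fraction_not_written_behavior: unsupported value %q", c.FractionNotWrittenBehavior)
	}
	return nil
}

// decide takes a sample in [0, 1) and reports whether the payload is written,
// plus which behaviour then applies to the original attribute.
// It expects applyDefaults to have been called first.
func (c *ConfigStanza) decide(sample float64) (write bool, behavior string) {
	if sample < *c.FractionToWrite {
		return true, c.FractionWrittenBehavior
	}
	return false, c.FractionNotWrittenBehavior
}

func main() {
	quarter := 0.25
	c := ConfigStanza{
		MatchAttributeKey: "http.request.body.content",
		DestinationURI:    "gs://example-bucket/${trace_id}/${span_id}.txt",
		FractionToWrite:   &quarter,
	}
	c.applyDefaults()
	if err := c.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(c.decide(0.10)) // sample falls inside the written fraction
	fmt.Println(c.decide(0.90)) // sample falls outside the written fraction
}
```

In a real component the sample would presumably come from a random source (or from a deterministic function of the trace ID, so that related payloads are sampled together); which of these is intended is not specified above.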

Here is a full example with the above in mind:

 - match_attribute_key: http.request.body.content
   match_attribute_only_in: SPAN
   destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/request.json"
   content_type: "application/json"

 - match_attribute_key: http.response.body.content
   match_attribute_only_in: SPAN
   destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/response.json"
   content_type: "application/json"
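
For illustration, the `${...}` placeholders in destination_uri could be resolved along the following lines. This is a hand-rolled sketch under stated assumptions, not necessarily how the component resolves patterns: `Context`, `lookup`, and `interpolate` are hypothetical names, only dot notation is handled (not the bracket form), and environment variables are supplied as a map rather than read from the process.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches ${name} and captures everything between the braces.
var placeholder = regexp.MustCompile(`\$\{([^}]+)\}`)

// Context holds the values a pattern may reference (hypothetical type).
type Context struct {
	TraceID, SpanID string
	SpanAttributes  map[string]string
	Env             map[string]string
}

// lookup resolves a single placeholder name against the context.
func (c Context) lookup(name string) (string, bool) {
	switch {
	case name == "trace_id":
		return c.TraceID, true
	case name == "span_id":
		return c.SpanID, true
	case strings.HasPrefix(name, "env."):
		v, ok := c.Env[strings.TrimPrefix(name, "env.")]
		return v, ok
	case strings.HasPrefix(name, "span.attributes."):
		v, ok := c.SpanAttributes[strings.TrimPrefix(name, "span.attributes.")]
		return v, ok
	}
	return "", false
}

// interpolate substitutes every placeholder in pattern and returns an error
// listing any names that could not be resolved.
func interpolate(pattern string, c Context) (string, error) {
	var missing []string
	out := placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "${" and trailing "}"
		v, ok := c.lookup(name)
		if !ok {
			missing = append(missing, name)
			return m
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved placeholders: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	ctx := Context{
		TraceID: "abc",
		SpanID:  "def",
		Env:     map[string]string{"GCS_BUCKET": "my-bucket"},
	}
	uri, err := interpolate("gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/request.json", ctx)
	fmt.Println(uri, err)
}
```

Failing on unresolved placeholders, rather than writing to a path that still contains `${...}`, is a design choice made for this sketch; the proposal itself does not say how missing values should be handled.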

Telemetry data types supported

Traces

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am a member of the OpenTelemetry organization.
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

braydonk, michaelsafyan, dashpole

Sponsor (optional)

dashpole

Additional context

No response

@michaelsafyan michaelsafyan added needs triage New item requiring triage Sponsor Needed New component seeking sponsor labels Jun 24, 2024
@dashpole
Contributor

I am willing to potentially sponsor this, but I would love to see if any others have needed to store very large or sensitive attributes separately. I plan to raise this tomorrow at the SIG meeting.

@dashpole
Contributor

I raised this at the SIG meeting today, but this wasn't an issue people on the call had run into before.

@dashpole
Contributor

There is some consideration of moving the "larger" genai attributes. open-telemetry/semantic-conventions#483 (comment)

@karthikscale3

We at Langtrace are also interested in testing this span processor, as we are thinking about the same problem. We currently have two GenAI OTel instrumentation libraries: Python and TypeScript.

@lmolkova

The LLM Semconv WG is considering reporting prompts and completions in event payloads (and breaking them down into individual structured pieces) - open-telemetry/semantic-conventions#980

Still, there is a possibility that prompts/completion messages could be big. There is interest in the community to record generated images, audio, etc for debugging/evaluation purposes.

From a general semconv perspective, we don't usually define span attributes that may contain unbounded data (gen_ai.prompt and gen_ai.completion are temporary exceptions), and are likely to recommend event/log payloads for this.

In this context, it could make sense to also support blob uploads with a LogProcessor. See also open-telemetry/semantic-conventions#1217, where similar concerns have been raised for logs.

@michaelsafyan
Contributor Author

In the interests of transparency, I have started related work on this here:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor

I originally started with a "processor", but I'm having doubts regarding whether this functionality is possible with a processor and am now looking into representing it as an "exporter" that wraps another exporter (but perhaps this is incorrect?). In any event, the (very early, not yet complete) code is in development here:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/exporter/blobattributeexporter

I appreciate the insight that this may shift to a different representation... with that in mind, I am going to try to make this more general. While I will start with span attributes to handle current representations, I will keep the naming general and allow this to grow to address write-aside to blob storage from other signal types and other parts of the signal.

@michaelsafyan
Contributor Author

Quick Status update:

  • Still working on this
  • Current ETA expectation is ~2 weeks to get a working demo

Will give another update in 2 weeks' time or when this is working, whichever is sooner.

@michaelsafyan
Contributor Author

Apologies that this is taking longer than expected. I am, however, still working on this.

@michaelsafyan
Contributor Author

The general shape of this is now present and can be found in:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/connector/blobattributeuploadconnector

I still need to polish this and create end-to-end testing, but there is probably enough here to get early feedback.

Note that while the original scope was intended to focus on spans, the above covers BOTH spans AND span events, given the pivot of the GenAI semantic conventions towards span event attributes.

I also pivoted from hand-rolling the string interpolation, to trying to leverage OTTL to do it:

... this required some hackery in OTTL, though, and I am wondering if there is an even cleaner approach than this.

@codefromthecrypt
Contributor

@michaelsafyan thanks! To bring you up to date, the current semconv 1.27.0 already uses span events, so this is relevant.

What's a question mark to many is the change to log events. For example, not all backends know what to do with them, and there is some implied indexing. So, I would expect that once this is in, folks will want to transform log events (with span context) back to span events.

Do you feel up to adding a function like interpolateSpanEvent to do that? Something like logEventWithSpanContextToSpanEvent?

@michaelsafyan
Contributor Author

@codefromthecrypt can you elaborate on what you mean by "folks will want to transform log events (with span context) back to span events"? Is that so that separate logs can get processed by this connector?

The way that I'm thinking about this is that blobattributeuploadconnector will be a generic component that enables:

  1. Uploading attribute content to a blob storage destination.
  2. Replacing the original attribute value with a "Foreign Attribute Reference" (see foreignattr.go)

What I have there now targets:

  • span attributes
  • span event attributes

A logical expansion of this logic would be to also handle:

  • log attributes
  • (maybe?) log body

Other types of conversions (such as span events to logs, or logs back into span events) make sense and would be useful, but probably should be considered out of scope for this particular component (and should probably be tracked in a separate issue), though I agree that it is important for different users to decide whether their events data is recorded as events attached to a span or as separate logs (and that a connector is likely to be a good way to implement that).

@codefromthecrypt
Contributor

@michaelsafyan so the main q about log events was in relation to the genai spec which is about to switch to them. Since this spec is noted in the description, that's why I thought it might be in scope for this change/PR.

What do you think is a better place to move the topic of transform "span events to log events" to? If you don't have a specific idea, I'll open a new issue, just didn't want to duplicate this, if it was in scope.

@michaelsafyan
Contributor Author

I think new, separate issues for "Log Events -> Span Event Connector" and "Span Events -> Logs Connector" would make sense.

@michaelsafyan michaelsafyan changed the title New component: blob writer span processor New component: Blob Attribute Uploader Connector Aug 14, 2024
@codefromthecrypt
Contributor

cool. I opened #34695 first, and if I made any mistakes in the description please correct if you have karma to do so, or ask me to, if you don't.

@michaelsafyan
Contributor Author

Just providing another update, since it has been a while.

I was out on vacation last week and had other work to catch up on this past week.

I am hoping to resume this work this coming week.

This is still on my plate.

@michaelsafyan
Contributor Author

Quick status update:

  • Believe that the code (for spans and span events) is largely complete, but bugs may turn up as tests are written
  • Iterating on unit tests (traces_test.go).

I am, however, encountering merge conflicts when attempting to sync from upstream ... so this may require some additional work to resolve.

@michaelsafyan
Contributor Author

Status update:

Still working on writing tests.

As per usual, getting progressively from one error to a different kind of error.

The errors that I'm getting now are related to the string interpolation library, which relates to open issue #34700.

I'm also realizing that the data model in https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/connector/blobattributeuploadconnector/internal/foreignattr is one that probably requires more input/agreement in OTel SemConv. I will be opening up an issue there shortly to discuss further and to ensure that it won't block upstreaming this code when it is done.

@michaelsafyan
Contributor Author

Status update: now have the string interpolation logic in OTTL working.

Next steps:

  • Complete end-to-end integration tests of existing logic
  • Add support for logs and event bodies
  • Start splitting out pieces of this and trying to upstream individual pieces

@michaelsafyan
Contributor Author

Status update:

  • End-to-end integration tests of existing logic now pass

To keep the change from growing out of control and to prevent horrible merge conflicts down the road, I'm thinking about upstreaming parts of this piecemeal and then expanding capabilities rather than trying to include every single signal type from the outset before starting to upstream.

@atoulme atoulme removed the needs triage New item requiring triage label Oct 12, 2024
@michaelsafyan michaelsafyan changed the title New component: Blob Attribute Uploader Connector New component: Blob Uploader Connector Oct 14, 2024
@michaelsafyan
Contributor Author

I'm renaming this from blobattributeuploadconnector to simply blobuploadconnector given that we want to also be able to target event bodies (or sub-paths within them).

A renamed version now exists in this development branch:

I'm going to work on getting pieces of this upstreamed and, in parallel, I am going to start a new development branch for adding capabilities related to logs. That work will proceed here:

@dashpole
Contributor

I will sponsor this component. Thanks @michaelsafyan for working on this!

@dashpole dashpole added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Oct 23, 2024
@michaelsafyan michaelsafyan changed the title New component: Blob Uploader Connector New component: Blob Upload Processor Nov 1, 2024
Contributor

github-actions bot commented Jan 1, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


7 participants