Distinction between structured Body and Attributes #1613
Does it actually help to have another container for key/values associated with a Log Record? If something is not covered by semantic conventions, I think it simply means that its definition is out of scope of OT: it might be related to a given environment, come in via MDC, etc. As long as there's no conflict in the key names, why can't we reuse Attributes?
Attributes require a key to record a value. Body does not. Body is better suited for the most common legacy use case of logs: an unstructured text log line. To record it in the Attributes we would need a semantic convention for which key to use, which would make it different from everything else. Separate Body and Attributes appear to better fit existing logging data models (e.g. MSG vs. STRUCTURED-DATA in Syslog, or log message vs. log fields in the Zap logger).
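To make the keyed vs. unkeyed distinction concrete, here is a minimal sketch of a record with separate Body and Attributes fields. The `LogRecord` class and the `log.line` key are hypothetical, chosen only for illustration; they are not the OpenTelemetry SDK types or a defined semantic convention.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LogRecord:
    """Hypothetical, simplified record: Body needs no key, Attributes do."""
    body: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


line = "disk /dev/sda1 is 95% full"

# With a Body field, an unstructured text line is stored as-is.
with_body = LogRecord(body=line)

# Attributes-only: the same line needs an agreed-upon key ("log.line" is made up).
attributes_only = LogRecord(attributes={"log.line": line})

# Syslog/Zap-style split: message -> Body, named fields -> Attributes.
split = LogRecord(body="disk almost full", attributes={"device": "/dev/sda1", "used_pct": 95})
```

The extra key in the attributes-only variant is exactly the special-case convention the comment above wants to avoid.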
Thank you both for commenting. What I was really wondering is if it would make the most sense to have a top-level Maybe I misunderstood, but I got the impression @pmm-sumo was suggesting that any structured data that could currently be placed in
I think I messed up and my comment was largely referring to I think we may also want to change perspective when looking at that. Let's consider that both One case I can think of if someone wants to put a boundary between metadata (present in To bring an example, consider someone is having a temperature sensor and logging its output. The sensor has some metadata assigned, e.g. id, connection type, etc. that are not part of the record. Practically, this might look like the following:
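The comment's own example did not survive on this page. As a purely hypothetical illustration of the same idea, the sensor's metadata (id, connection type) could go into Attributes while the reading itself forms a structured Body. The function and all field names are invented for this sketch.

```python
from typing import Any, Dict


def sensor_record(reading: Dict[str, Any], sensor_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical log record: metadata describes the source,
    while the Body carries the record content itself."""
    return {
        "body": dict(reading),            # what was measured
        "attributes": dict(sensor_meta),  # which device produced it
    }


record = sensor_record(
    reading={"temperature": 21.5, "unit": "C"},
    sensor_meta={"sensor.id": "t-42", "sensor.connection": "zigbee"},
)
```

The boundary here is "content vs. context": the reading would change on every record, while the metadata stays the same for a given sensor.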
@pmm-sumo no worries!
Yeah that's what I was getting at. But the way it's designed currently, any time you want to have structured record content and a top-level message, the structured content would have to be bumped into
My understanding of
Right, it's mutually exclusive in terms of So what I'm really wondering is if things would be more straightforward by having a top-level
I believe they can hold any sort of data. They SHOULD (not MUST) follow Semantic Conventions according to the data model. Let's consider several options:
Things get a bit more complex when there's a mix of structured and unstructured data. For a practical example, here's a random output from the OpenTelemetry Collector:
Taking the timestamp, log level and caller aside, we end up with essentially a message: Actually, the log data model comes with an answer for that case; the attributes go to
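The Collector output quoted in the comment was lost, so the line below is made up and only assumed to follow a tab-separated console layout (timestamp, level, caller, message, optional JSON fields). The sketch splits it into top-level fields, a string Body and named Attributes; putting `caller` into an attribute called `caller` is also an assumption.

```python
import json
from typing import Any, Dict


def parse_console_line(line: str) -> Dict[str, Any]:
    """Split an assumed tab-separated log line into top-level fields,
    a string Body and key/value Attributes."""
    parts = line.split("\t")
    timestamp, level, caller, message = parts[:4]
    # Trailing JSON object, if present, already has names: it fits Attributes.
    fields = json.loads(parts[4]) if len(parts) > 4 else {}
    return {
        "timestamp": timestamp,
        "severity_text": level,
        "body": message,                             # the human-readable message
        "attributes": {"caller": caller, **fields},  # everything that has a name
    }


sample = '2021-04-21T10:00:00.000Z\tinfo\tservice/service.go:100\tEverything is ready.\t{"NumCPU": 8}'
rec = parse_console_line(sample)
```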
What do you think about the following? If the log record contains one or more "pieces" of data that may fit either in Body or Attributes but do not fit the description of the other top-level fields (Timestamp, Severity, etc) then follow these guidelines to decide how to record these pieces of data in the Body and Attribute fields:
Note: if there is more than one piece of data that matches rules 1-4, then we cannot record them in the Body; we have to come up with some keys and record each piece of data as a key/value pair in the Attributes. This is a non-exhaustive set of heuristics but it should probably be a good starting point. What do you think?
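Rules 1-4 themselves did not survive on this page, so the sketch below covers only the fallback in the note: if exactly one piece qualifies for the Body it goes there, otherwise every piece gets its own key in Attributes. The `qualifies` predicate stands in for the lost rules, and the generated `piece.N` keys are hypothetical.

```python
from typing import Any, Callable, Dict, List


def place_pieces(pieces: List[Any], qualifies: Callable[[Any], bool]) -> Dict[str, Any]:
    """Apply the 'at most one Body candidate' fallback from the note."""
    candidates = [p for p in pieces if qualifies(p)]
    if len(candidates) == 1:
        body = candidates[0]
        rest = [p for p in pieces if not qualifies(p)]
    else:
        # Zero or several candidates: nothing goes into Body,
        # so every piece needs a key of its own.
        body = None
        rest = pieces
    attributes = {f"piece.{i}": p for i, p in enumerate(rest)}
    return {"body": body, "attributes": attributes}


# Hypothetical stand-in for rules 1-4: "a plain string may be the Body".
is_text = lambda p: isinstance(p, str)
```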
@pmm-sumo @djaglowski will one of you be able to submit a PR to make the corresponding changes in the spec?
@tigrannajaryan @djaglowski sure, preparing a proposal. Since the guidelines are quite clear, I am going to literally put those into a dedicated section
We had a great discussion but reached no consensus on either #1727 (providing guidelines) or #1752 (being more restrictive). It is currently one of the major issues blocking Logs GA. Perhaps we could sync during the next Log SIG and discuss it online? The next Log SIG is scheduled for Oct 27th, 10am Pacific, though we suggest moving it a week earlier, to Oct 20th, same time. If that time does not work, we could organise a dedicated sync for this issue. Would that work for you? @yurishkuro @SergeyKanzhelev @errordeveloper @tigrannajaryan @djaglowski @jmm
@yurishkuro your last comment on the PR was that you fail to see what the confusion is. Here is one more example of the confusion: https://cloud-native.slack.com/archives/CJFCJHG4Q/p1627987975028000
the question in chat was this (with my emphasis):
I think the problem we're having is that we are not clear about who this "one" is. Which persona is that? An application developer using a logging API from an application? Or an infrastructure engineer writing a transformation of one data format into OTLP? These are distinct use cases requiring distinct solutions. Having said that, I like the suggestion in https://github.com/open-telemetry/opentelemetry-specification/pull/1727/files#r643105246 of restricting Body to be a primitive type. I think it resolves the confusion without limiting the expressiveness of the OTLP model.
Yes, I think the confusion primarily arises when one needs to generate a log record (via a logging API or emit an OTLP log record). This is a clean slate situation: I know I want to generate a log record, I know roughly what data I want to put in the record, but I can structure the same data such that some primary bit goes into the Body (e.g. the human readable message) and the rest goes into the Attributes or I can put everything into Attributes (with the human readable message being just another Attribute). In other words one can also say that it is a data modelling problem: given some information I know I want to record and having ability to record it in a few different ways how do I make a decision about the shape I want this information to take?
This likely is not an issue, since it is primarily driven by the semantics of that particular data format. We have examples of transformations in the log data model doc, a few of which I wrote and I think it was fairly straightforward to know what to put in the Body vs Attributes.
I don't think that will work. There are formats where the log body is complex structured data (there are a couple of examples in the log data model doc).
Well, to me this is not at all the same case. Writing via Logging API should avoid this problem altogether because the API should allow the user to express their intent. Nobody just "emits OTLP log record", OTLP is not something end users are ever exposed to directly.
@tigrannajaryan please elaborate / point to which examples you mean. Logically I don't see how a body could be simultaneously structured and yet unnamed ("body" == no name). Take the Zap logger API, for example. The Body there (aka "message") is always a string; you cannot pass structured data via the API as the body. But you can pass it as an attribute as long as you name it. If there is a use case in the examples where a body is both structured and unnamed, then I think the structure in that case is not arbitrary (Any), but pre-defined (e.g. an IoT device emitting some data), and therefore should be translated into attributes.
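A sketch of the position above, in the spirit of a Zap-like API but not Zap's actual API: the message argument must be a string, and any structured payload has to be passed under a name, here flattened into dotted attribute keys. The `emit` function and the dotted-key flattening scheme are assumptions made for illustration.

```python
from typing import Any, Dict


def flatten(prefix: str, value: Any, out: Dict[str, Any]) -> None:
    """Recursively turn nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    if isinstance(value, dict):
        for key, inner in value.items():
            flatten(f"{prefix}.{key}", inner, out)
    else:
        out[prefix] = value


def emit(message: str, **named: Any) -> Dict[str, Any]:
    """Hypothetical logging call: Body is always a string, structure needs a name."""
    if not isinstance(message, str):
        raise TypeError("message (Body) must be a string")
    attributes: Dict[str, Any] = {}
    for name, value in named.items():
        flatten(name, value, attributes)
    return {"body": message, "attributes": attributes}


rec = emit("reading received", device={"id": "t-42", "temp": {"value": 21.5, "unit": "C"}})
```

Because the device's structure is pre-defined, naming it costs nothing and keeps each value individually queryable.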
Fair enough, I agree. It will only be a problem if we allow the Logging API to accept a structured Body, which we don't have to do if we don't think it is necessary. So our API can restrict the Body to be a string only (actually, a string as it is known in some languages may not be good enough; we may need to allow any sequence of bytes, not just valid Unicode strings).
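If a Logging API restricts Body this way, the check can be as small as the sketch below. Accepting `bytes` alongside `str` reflects the point that a Body may need to be an arbitrary byte sequence rather than valid Unicode. The function name and the choice to raise `TypeError` are hypothetical.

```python
from typing import Union

BodyValue = Union[str, bytes]


def validate_body(body: object) -> BodyValue:
    """Accept only a text string or a raw byte sequence as Body."""
    if isinstance(body, (str, bytes)):
        return body
    raise TypeError(f"Body must be str or bytes, got {type(body).__name__}")


# Invalid UTF-8 is still a legal byte-sequence Body.
raw = validate_body(b"\xff\xfe partial frame")
```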
Splunk HEC has a structured Body and additional fields (Attributes). Similarly, I believe Google Cloud Logging can have a structured json_payload mapped to Body.
So I find it interesting that both Splunk and Google examples have the exact same situation: a body and attributes. So what are their guidelines for which way structured data should go?
To put it differently, if they haven't solved the semantic distinction between structured body and attributes (and they sit much closer to user intent), how can we expect to solve it downstream of them? I would instead be inclined to change the mapping tables you linked to and say that those structured fields do NOT map to Body, but to similarly named attributes, like
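The two positions can be compared side by side on a Splunk HEC event, whose JSON carries the payload in `event` and extra key/values in `fields`. In this sketch, the current mapping (as described in the thread) sends `event` to Body and `fields` to Attributes; the alternative spreads a dict-valued `event` into attributes. The `hec.event.` prefix is hypothetical, since the name proposed in the comment was lost, and other top-level HEC keys are ignored here.

```python
from typing import Any, Dict


def hec_to_record(hec: Dict[str, Any], structured_body: bool = True) -> Dict[str, Any]:
    """Map a Splunk HEC event to a simplified record.

    structured_body=True: 'event' goes to Body as-is, 'fields' to Attributes.
    structured_body=False: a dict-valued 'event' is spread into Attributes
    under a hypothetical 'hec.event.' prefix and Body stays empty.
    Other top-level keys (time, host, ...) are ignored in this sketch.
    """
    event = hec.get("event")
    attributes = dict(hec.get("fields", {}))
    if structured_body or not isinstance(event, dict):
        return {"body": event, "attributes": attributes}
    for key, value in event.items():
        attributes[f"hec.event.{key}"] = value
    return {"body": None, "attributes": attributes}


hec = {"time": 1633046400, "event": {"action": "login", "user": "alice"}, "fields": {"region": "eu"}}
```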
Unfortunately there are no guidelines that I am aware of. The Google Cloud logging says that it is a union of one of the 3 things:
In a sense it takes the same stance that the OTel data model takes currently: it tells what can be represented, but doesn't give any recommendations on how to use it.
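In the Google Cloud Logging `LogEntry` JSON, the payload is a union of `textPayload`, `jsonPayload` and `protoPayload`. A sketch of mapping it to Body mirrors the stance described above: it shows what can be represented without saying which shape to prefer. A plain dict stands in for the entry, and the function name is hypothetical.

```python
from typing import Any, Dict

PAYLOAD_KEYS = ("textPayload", "jsonPayload", "protoPayload")


def gcl_payload_to_body(entry: Dict[str, Any]) -> Any:
    """Return whichever payload member is set; the union allows at most one."""
    present = [k for k in PAYLOAD_KEYS if k in entry]
    if len(present) > 1:
        raise ValueError(f"payload is a union, found {present}")
    return entry[present[0]] if present else None
```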
I have a feeling this is not the right approach. I think this makes things worse: we are wrapping something that doesn't need to be wrapped. We already have the exact matching concept for it, so wrapping it adds unnecessary data nesting (bad for UIs and for querying, etc).
Ok, but it seems the only reason we have body as a structured field is to support these mappings from 3rd party formats. With respect to this ticket, we can easily say "we recommend treating body as a string", and only treat body as structured when doing those 3rd party conversions.
@pmm-sumo what do you think?
I am subjectively inclined towards limiting the Body type to String and Byte Array: What we gain:
What we lose:
I am not sure if either of the items listed above is a hard requirement. The state we have now is somewhat confusing and we can avoid that by limiting allowed types. If we want to do it, I think it's easier to do it now, while the OTLP Logs adoption is still relatively low. Then, if we find there are valid use-cases, it can be brought back.
Yeah, that sounds like a viable approach as well. I find it similar to #1727, while further narrowing when complex Body types should be used.
I don't mind such a limitation in the Logging API. However, if we are talking about the protocol, I am strongly against doing this for two reasons. Firstly, we have written code that uses Body in structured form. It would be a major breaking change for the Collector, including third-party distros, with unclear consequences. Yes, formally we are allowed to make such breaking changes because the protocol is in Beta, but we need to be considerate of the pain we are causing. Secondly, I do not think it is a good idea to limit the protocol in such a way anyway (see below why).
It is a stated goal of the data model to support lossless and unambiguous conversions. We will be breaking our design promise: This design promise enables an important property of the Collector: passing data through the Collector in a particular protocol is lossless and does not change the data in any way. We will lose this if we drop that design constraint.
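A small illustration of why a string-only Body would break lossless conversion: a structured Body would have to be serialized, and two different records could then become indistinguishable. JSON encoding is an assumption here; any encoding that leaves plain strings unchanged runs into the same collision, because the encoded map is itself a valid plain string.

```python
import json
from typing import Any


def to_string_only_body(body: Any) -> str:
    """Hypothetical restricted model: non-string bodies are JSON-encoded."""
    return body if isinstance(body, str) else json.dumps(body, sort_keys=True)


map_body = {"user": "alice"}
text_body = '{"user": "alice"}'  # a plain string that merely looks like JSON

# Both collapse to the same stored value, so a converter reading it back
# cannot tell which one the original record carried.
collapsed_map = to_string_only_body(map_body)
collapsed_text = to_string_only_body(text_body)
```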
Yeah, that is a strong argument. I think it's most reasonable to keep the model as it is and just clarify the intended use.
We discussed this today in Log SIG. To summarize:
These guidelines and explanations can be either in the spec or in the form of OTEPs. I will open separate issues to address these 4 topics individually.
Resolves #2066 and #1752
Supports #2068
## Changes
Adds a note to the log data model which explains the intended usage of the `Body` field.
## Additional Context
There has been extensive discussion of this issue in #1613, as well as in the Log SIG group.
Shall we close this?
Is there a clear distinction between what belongs in a structured `Body` vs. in `Attributes`? If there's not, it seems to make it less predictable where data is expected, and how to map between other models.
A top-level "message" string seems to be pretty common among other log data models. In this model, if you want to have a top-level message string and structured data describing the event, are you supposed to put the message in `Body` and other data that would otherwise be in `Body` in `Attributes`? I don't see anything about special-casing a property of `Body` as a top-level message.
From the mapping perspective, considering the Elastic Common Schema (ECS) example, it doesn't include all ECS fields, but `message` is the only one shown as mapping to `Body`. `Body` seems like a logical (perhaps the most logical) place for fields like `error.message` or `event.id`. So the fact that fields like `error.message` are shown as mapping into `Attributes` makes it seem that there's not a clear distinction, and that the mapping could be ambiguous depending on whether `message` is populated.
(BTW, unlike most of the rest of the document, the ECS example refers to "body" and "attributes" rather than "Body" and "Attributes", which I assume is a typo or a holdover from a previous version.)
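The ECS mapping as described in the issue can be sketched as follows, assuming flattened, dotted ECS keys. It reproduces the ambiguity being raised: when `message` is missing, `error.message` still lands in Attributes and Body stays empty, even though it might be the most natural Body for that event. The function name and sample values are hypothetical.

```python
from typing import Any, Dict


def ecs_to_record(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Mapping as the issue describes it: only 'message' goes to Body,
    every other (flattened) ECS field goes to Attributes."""
    attributes = {k: v for k, v in doc.items() if k != "message"}
    return {"body": doc.get("message"), "attributes": attributes}


with_message = ecs_to_record(
    {"message": "payment failed", "error.message": "card declined", "event.id": "8a4f"}
)
without_message = ecs_to_record({"error.message": "card declined"})
```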
`Attributes` is documented as:
If I understand correctly, that means that a property of `Attributes` that has the name of a well-known attribute should have the meaning and data type defined for that attribute, but meanwhile `Attributes` can also include arbitrary custom attributes.
That also seems to have implications for placement of data within `Body` vs. `Attributes`, because whereas no semantic conventions apply to `Body`, if a property gets bumped from `Body` into `Attributes`, then it may be that conventions are supposed to apply that wouldn't otherwise.
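A sketch of that implication: a key/value that was unconstrained inside Body can become a convention violation once it is moved into Attributes. The `KNOWN_ATTRIBUTES` table is a hypothetical, illustrative subset modeled loosely on HTTP semantic conventions, not an authoritative list, and `check_attributes` is an invented helper.

```python
from typing import Any, Dict

# Hypothetical subset of well-known attribute names and their expected types.
KNOWN_ATTRIBUTES: Dict[str, type] = {"http.method": str, "http.status_code": int}


def check_attributes(attributes: Dict[str, Any]) -> Dict[str, str]:
    """Report attributes whose name is well-known but whose value type differs."""
    problems = {}
    for key, value in attributes.items():
        expected = KNOWN_ATTRIBUTES.get(key)
        if expected is not None and not isinstance(value, expected):
            problems[key] = f"expected {expected.__name__}, got {type(value).__name__}"
    return problems


# A field that was harmless inside Body...
body = {"http.status_code": "OK"}
# ...conflicts with a convention once bumped into Attributes.
issues = check_attributes(dict(body))
```

Custom keys with no convention behind them pass through untouched, which matches the reading that Attributes may still hold arbitrary custom attributes.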