[pkg/stanza] Support to Customize bufio.SplitFunc #14593
Pinging code owners: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I think something like this makes sense, though I would want to look at a PR before being sure. The part I am struggling with is that the fileconsumer embeds |
I'd like to do it. |
Thanks again for all the work on this @atingchen! |
My pleasure😀 |
Thanks a lot. But how can we use this in an existing receiver like filelogreceiver with minimal code change? Is further development needed? |
I don't see this feature as "user facing", since it requires code. It's really meant for component developers, to allow custom splitting logic in other components. Can you describe the case you are trying to solve? |
We are trying to find a way to split logs that can handle both multi-line logs and different formats.
If split correctly, the result should be three logs. |
@h0cheung, you may be able to take advantage of non-linear data flow in the operator pipeline. I think you could read each line individually, then use a router to send lines that are already complete straight to a parser and everything else through recombine:
receivers:
  filelog:
    include: ...
    operators:
      - type: router
        default: recombine
        routes:
          - expr: 'body matches "2023-01-18 19:19:45.134 [ERROR] [some service] [some package] some message"' # need to extract a proper regex here
            output: regex_parser
      - type: recombine
        ...
        output: noop # skip to last operator
      - type: regex_parser
        regex: ...
      - type: noop # See: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/noop.md#why-is-this-necessary |
Sorry. I didn't make it clear. The result should be like this: 1st entry:
2nd entry:
3rd entry:
Anyway, thanks a lot. |
Got it. You may be able to use the same strategy with some tweaks:
receivers:
  filelog:
    include: ...
    operators:
      - type: router
        default: recombine
        routes:
          - expr: 'body matches "xxx.xxx.xxx.xxx - - "' # need to extract a proper regex here
            output: regex_parser
      - type: recombine
        is_first_entry: 'body matches "YYYY-MM-DD..."' # need to extract a proper regex here
        output: noop # skip to last operator
      - type: regex_parser
        regex: ...
      - type: noop # See: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/noop.md#why-is-this-necessary |
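For component developers who would rather do this in code than in an operator pipeline, the same idea can be sketched as a custom bufio.SplitFunc. This is only a rough illustration, not something the filelog receiver exposes through configuration: the entryStart pattern (a line starting with a YYYY-MM-DD date or with an IPv4 address followed by " - - "), the function names, and the sample log lines are all hypothetical stand-ins for the placeholders above.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// entryStart is a hypothetical pattern: a newline followed by either a
// YYYY-MM-DD date or an IPv4 address and " - - " begins a new entry.
var entryStart = regexp.MustCompile(`\n(?:\d{4}-\d{2}-\d{2} |\d{1,3}(?:\.\d{1,3}){3} - - )`)

// splitOnEntryStart is a bufio.SplitFunc that ends the current token just
// before the next line that looks like the start of a new entry.
func splitOnEntryStart(data []byte, atEOF bool) (int, []byte, error) {
	if len(data) == 0 {
		return 0, nil, nil
	}
	if loc := entryStart.FindIndex(data); loc != nil {
		end := loc[0] + 1 // consume the newline that precedes the next entry
		return end, bytes.TrimRight(data[:end], "\r\n"), nil
	}
	if atEOF {
		// No further entry start: whatever is left is the last entry.
		return len(data), bytes.TrimRight(data, "\r\n"), nil
	}
	return 0, nil, nil // ask the Scanner for more data
}

// scanEntries collects every entry that the split function produces.
func scanEntries(raw string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(raw))
	sc.Split(splitOnEntryStart)
	var entries []string
	for sc.Scan() {
		entries = append(entries, sc.Text())
	}
	return entries, sc.Err()
}

func main() {
	// Made-up sample input: one multi-line entry, one access-log line,
	// and one single-line entry.
	raw := "2023-01-18 19:19:45.134 [ERROR] boom\n  at foo\n  at bar\n" +
		"10.0.0.1 - - GET /\n" +
		"2023-01-18 19:19:46.000 [INFO] ok\n"
	entries, err := scanEntries(raw)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for i, e := range entries {
		fmt.Printf("entry %d: %q\n", i+1, e)
	}
}
```

Note that bufio.Scanner caps tokens at 64 KiB by default, so very long multi-line entries would need a larger buffer set with Scanner.Buffer.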
Is your feature request related to a problem? Please describe.
The fileexporter now supports marshaling telemetry data in proto format and compressing it. When using the proto format or any other kind of encoding, the encoded telemetry data is no longer written to the file line by line. Instead, the size of the encoded data is written before the data itself. To read the telemetry data back in, we read the size, then read the bytes into a separate buffer, then parse from that buffer.
Currently, fileconsumer.Manager only supports splitting log entries by newlines or regex patterns when reading logs from a file. Can pkg/stanza provide a way to customize the bufio.SplitFunc?
Describe the solution you'd like
helper.SplitterConfig: add a function to customize the bufio.SplitFunc
fileconsumer.Config: add a function to call CreateCustomizedSplitter
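To make the request concrete, here is a minimal sketch of the kind of bufio.SplitFunc a component might want to plug in for size-prefixed data. The framing is an assumption made for illustration (a 4-byte big-endian size before each record); the actual exporter format may differ, and the names splitLengthPrefixed, frame, and scanRecords are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// prefixLen is an ASSUMED framing detail: each record is preceded by a
// 4-byte big-endian size. The real on-disk format may differ.
const prefixLen = 4

// splitLengthPrefixed is a bufio.SplitFunc that returns one record per
// token: read the size, then hand back exactly that many bytes.
func splitLengthPrefixed(data []byte, atEOF bool) (int, []byte, error) {
	if len(data) < prefixLen {
		if atEOF && len(data) > 0 {
			return 0, nil, errors.New("truncated size prefix")
		}
		return 0, nil, nil // need more data (or clean end of input)
	}
	size := int(binary.BigEndian.Uint32(data[:prefixLen]))
	end := prefixLen + size
	if len(data) < end {
		if atEOF {
			return 0, nil, errors.New("truncated record")
		}
		return 0, nil, nil // wait until the whole record is buffered
	}
	return end, data[prefixLen:end], nil
}

// frame prepends the assumed size prefix to a payload (demo helper).
func frame(payload []byte) []byte {
	out := make([]byte, prefixLen, prefixLen+len(payload))
	binary.BigEndian.PutUint32(out, uint32(len(payload)))
	return append(out, payload...)
}

// scanRecords runs the split function over raw and collects each record.
func scanRecords(raw []byte) ([]string, error) {
	sc := bufio.NewScanner(bytes.NewReader(raw))
	sc.Split(splitLengthPrefixed)
	var recs []string
	for sc.Scan() {
		recs = append(recs, sc.Text())
	}
	return recs, sc.Err()
}

func main() {
	// Records may contain newlines, which is why line splitting fails here.
	raw := append(frame([]byte("first\nrecord")), frame([]byte("second"))...)
	recs, err := scanRecords(raw)
	fmt.Printf("%q %v\n", recs, err)
}
```

A record larger than the Scanner's default 64 KiB token limit would need a bigger buffer via Scanner.Buffer, which is one more reason a component may want control over how the splitter is built.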
Describe alternatives you've considered
No response
Additional context
No response