Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when using json mode #257

Open
vietk opened this issue Jun 30, 2022 · 6 comments
Open

Performance issue when using json mode #257

vietk opened this issue Jun 30, 2022 · 6 comments

Comments

@vietk
Copy link

vietk commented Jun 30, 2022

Hello,

We are currently integrating the splunk library logging.
During some performance tests we noticed that a lot of exceptions were thrown when trying to send message with default json mode.
The message payload of the events we are sending is just text (no JSON)

Here's a flamegraph putting this fact in evidence

image

My interrogation comes from the code that creates the JSON event: there's a systematic call to GSON library parser even if it's possible to pass a non JSON structure.

https://github.com/splunk/splunk-library-javalogging/blob/main/src/main/java/com/splunk/logging/serialization/EventInfoTypeAdapter.java#L31-L36

Is there any obvious reasons for this behaviour ?
Performance wise, each time, there's a cost to start the parsing and then creating the exception (which is costly in Java) because we don't send JSON at all.

Is it possible to improve the situation ?
It's not a problem for me to do a Pull Request, if you tell me what to do to.

Regards

@bparmar-splunk
Copy link
Contributor

Hi @vietk,
Thanks for bringing this up.

It is true that passing a non-JSON string might not need JSON parsing, but when we check EventInfoTypeAdater file, it is meant for, serialising, JSON parsing.

If we consider your scenario, then we have a type attribute with which you can set raw instead of json to send raw events. With this it will not unnecessarily parse non-JSON string.

Hope this helps !

@vietk
Copy link
Author

vietk commented Jul 20, 2022

Hello @bparmar-splunk, thanks a lot for your answer.

I am sorry but I don't really get why I should switch to RAW mode, the event JSON structure is really a good match for Java log events.
In addition to that I want to benefit from the extraction of metadata of the log event (thread name, severity, logger...) provided by the library.
Moreover the source type being json, facilitates the indexing/extracting on these fields in Splunk UI side.
Finally the this source type avoid, in case of exception, to consume CPU on Splunk indexer side to find the boundaries of the logEvents by using regular expression, the stacktrace being hold in JSON structure.

For sure, I could use the RAW type, but then I'll have to carbon copy what's done in HECEventSerializer + EventInfoTypeAdater and specify the sourcetype as JSON, and I would need to create an specific appender for my logging library to avoid repeating that in my code.

Lastly, the fact that this library provides a whole set of handlers/appenders for a lot of logging libraries makes me think that the way to consume it is with plain text message rather than JSON, as what is demonstrated here

To avoid non backward compatible change, would you consider adding a flag to avoid parsing the message as JSON structure and consider it as a plain string in the EventInfoTypeAdater ?

If you agree we could provide a PR to bring this change in

Thanks!

@rquinio1A
Copy link

Why not check if string starts with "[" or "{" (ignoring whitespaces) before attempting JSON parsing ?

Using JSON-formatted messages with slf4j/log4j is pretty rare to my knowledge, so it's important the most common case has good performance.

log.info("my message") // -> No attempt to parse as JSON, so no exception triggered
log.info("{ \"myevent\": \"mymessage\" ") // Still parsed as a nested JSON object under "message" field.

@bparmar-splunk
Copy link
Contributor

Hi @vietk,
Thank you for your inputs.

As I mentioned earlier, EventInfoTypeAdapter class is created for serialising data to JSON, and it is only used for Gson instance initialisation.
Its main purpose is to convert data to JSON. If I modify the behaviour of that class by one flag then the purpose of that adapter will get violated.
In terms of Single Responsibility principle, ideally that class should not be manipulated with a flag.
There should always be an alternative to get the work done. But modifying its actual behaviour would make that generic.

Please share your thoughts on this.

@rquinio1A
Copy link

Hi @bparmar-splunk,

Possibly there's a confusion between the log message string (containing JSON or not) and the HEC format type (JSON or Raw) ?

  1. log.info("my message") with type JSON will send over the wire { "event": { "message": "my message" } } <- That looks like the default config/usage of the lib, so should be optimized.
  2. log.info("{ \"myevent\": \"mymessage\" }") with type JSON will send over the wire { "event": { "message": { "myevent": "mymessage"} } }
  3. log.info("my message") with type Raw will send over the wire my message

I've written a similar benchmark of com.splunk.logging.HttpEventCollectorLoggingHandler that shows the same contention in case 1., due to synchronized send method + cost of JsonSyntaxException stacktraces: https://github.com/rquinio1A/quarkus-logging-splunk/blob/feature/jmh-benchmark/integration-test2/src/test/java/io/quarkiverse/logging/splunk/HttpEventCollectorLoggingHandlerBenchmark.java

image

@bparmar-splunk
Copy link
Contributor

Hi @rquinio1A,
Thank you for clarification.

@vietk,
We are open for the PR to support your use case. Please let us know in case of any further queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants