
[datadog] http output POST size (gzip) #1187

Closed
nvrmnd opened this issue Mar 12, 2019 · 18 comments

@nvrmnd

nvrmnd commented Mar 12, 2019

Bug Report

Describe the bug
I'm trying to push logs to a Datadog service using the http output plugin with a tail input. Everything works well while I add a few records at a time to the log file. But when I start with a file that already contains a few thousand records, it looks like the http output tries to push them all in a single POST, making the service respond with an HTTP 413 error (payload too large).

To Reproduce

  • Example log message if applicable:
{
  "severity": "error",
  "insertId": "de355702-43c3-11e9-b40d-0242acac0164",
  "logName": "API",
  "logType": "Exception",
  "responseBody": {
    "error": {
      "message": "Exception happened"
    }
  },
  "responseStatusCode": 400,
  "additionalData": [],
  "optionalDeveloperMessage": "test"
}
  • Steps to reproduce the problem:
  1. Prepare a file with a few thousand records
  2. Remove the offset tracking DB file to make the tail plugin read the log from the beginning
  3. Start fluent-bit and get an error:
...
[2019/03/12 17:46:33] [debug] [task] created task=0x5591f0e88980 id=0 OK
[2019/03/12 17:46:43] [error] [out_http] http-intake.logs.datadoghq.com:443, HTTP status=413
[2019/03/12 17:46:43] [debug] [retry] new retry created for task_id=0 attemps=1
...

Expected behavior
Maybe there is a way to limit the POST size for each HTTP request somehow?

Your Environment

  • Version used: 1.1.0
  • Configuration:
[SERVICE]
    storage.path              /var/lib/fluent-bit/
    storage.sync              full
    storage.checksum          on
    storage.backlog.mem_limit 5M

    Flush        5
    Daemon       Off
    Log_Level    debug

    Parsers_File parsers.conf
    Plugins_File plugins.conf

[INPUT]
    Name tail
    Buffer_Chunk_Size 64K
    Buffer_Max_Size 128K
    Path  /var/www/vhost/api/storage/logs/json.log
    DB    /var/log/fluentbit-position.sqlite3

[FILTER]
    Name parser
    Match *
    Key_Name log
    Parser json

[OUTPUT]
    Name http
    Retry_Limit 10

    Match  *

    Host http-intake.logs.datadoghq.com
    Port 443
    URI /v1/input/tokenf0f0f0f0f0f0f0f0f0f0f0
    tls on
    Format json
  • Operating System and version: Ubuntu 18.04
  • Filters and plugins: output: http, stdout; input: tail; filter: stock json parser
@nvrmnd changed the title from "http output buffer" to "http output POST size" on Mar 12, 2019
@edsiper
Member

edsiper commented Mar 27, 2019

@nvrmnd

thanks for reporting the case.

Internally, Fluent Bit uses the msgpack format (a binary JSON-like encoding) to serialize records. Per your configuration, the job of the out_http plugin is to convert that msgpack into a JSON payload (which is likely larger than the msgpack).

Note that Fluent Bit generates msgpack chunks of no more than 2MB, but it looks like I found a case where a chunk can be larger. I am troubleshooting now.

@edsiper self-assigned this on Mar 27, 2019
@edsiper
Member

edsiper commented Mar 27, 2019

hmm, anyway it looks like Datadog has size limits for its HTTP API:

Logs
Send your logs to your Datadog platform over HTTP. Limits per HTTP request are:

Maximum content size per payload: 2MB
Maximum size for a single log: 256kB
Maximum array size if sending multiple logs in an array: 50 entries

ref: https://docs.datadoghq.com/api/?lang=bash#get-list-of-active-metrics

anyway, we will fix the chunk size problem, and I think adding gzip compression to the HTTP output will help too

@fsperling

fsperling commented Apr 18, 2019

We are also seeing 413 errors when sending from Fluent Bit via the http plugin (format: json) to an Apache server.

@fsperling

To test, we disabled SSL and can see that the POST requests are 2.2MB in size.

Content-Length: 2310383
Content-Type: application/json
User-Agent: Fluent-Bit

@anshumgargdd

I work with Datadog. It does indeed seem that the problem might be the size of the payload. As our documentation says, we limit the payload size to 2MB. I will take a further look into what we can do here.

@marcosrmendezthd

It'd be great if the output size was configurable

@anshumgargdd

@nvrmnd
Hi from Datadog!
We were able to confirm that the issue was indeed with message payloads being too large for the Datadog ingestion endpoint. We are taking a look at the 2MB limit on the endpoint and evaluating if we can increase it at some point. Until the output plugin limits the output size to 2MB, we'd be happy to offer workarounds if you're interested. Please reach out to us over support ([email protected]) or the public slack channel (https://chat.datadoghq.com/).

@edsiper
Member

edsiper commented Jul 31, 2019

FYI: folks, I've just merged gzip support for the out_http plugin; it will land in the v1.3 release (~Aug).

This new option, compress gzip, only guarantees data compression, but it will likely bring your payloads under 2MB.

Datadog team, are you planning to increase this limit anyway? (cc: @irabinovitch)
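
For illustration, a minimal sketch of the [OUTPUT] section from the original report with the new option added, assuming the option is spelled exactly as described above (the URI token is the placeholder from the original configuration):

[OUTPUT]
    Name     http
    Match    *
    Host     http-intake.logs.datadoghq.com
    Port     443
    URI      /v1/input/tokenf0f0f0f0f0f0f0f0f0f0f0
    tls      on
    Format   json
    # new in v1.3: gzip-compress the POST body before sending
    compress gzip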

@edsiper changed the title from "http output POST size" to "[datadog] http output POST size (gzip)" on Jul 31, 2019
@NBParis

NBParis commented Aug 30, 2019

Hello @edsiper ,

The Datadog teams are indeed working on increasing those limits.
We have now increased the maximum number of logs per batch to 200 (and are looking to increase it even more), and we are looking into increasing the batch size as well (the target is around 5MB).

Regarding compression, the HTTP API does not currently support compressed logs, but that is planned as well and should land in 2019.

@NBParis

NBParis commented Sep 13, 2019

Hello @edsiper and everyone,

I wanted to let you know that the Datadog HTTP API for submitting logs has been updated to accept larger batches:

  • Up to 500 log messages per batch
  • Maximum size of the whole batch: 5MB
  • Maximum size per log: 256KB

Do you believe that would fit within the Fluent Bit batch size?

@edsiper
Member

edsiper commented Sep 13, 2019

yes, it should.

Since Datadog has contributed a new out_datadog plugin, I am closing this ticket. People should move to the official plugin upon the v1.3 release.
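
For anyone migrating, a minimal sketch of what an out_datadog [OUTPUT] section might look like; the apikey and TLS option names are assumptions here, not confirmed in this thread, so verify them against the out_datadog documentation shipped with v1.3:

[OUTPUT]
    Name    datadog
    Match   *
    # assumed option names; check the v1.3 out_datadog docs
    TLS     on
    apikey  <YOUR_DATADOG_API_KEY>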

@semoac

semoac commented Oct 1, 2019

Does bbba49f solve this problem? Is there any documentation for this new option?

Cheers.

@irabinovitch

@semoac The new Datadog server-side limits mentioned by @NBParis should resolve this issue in most cases. Additionally, the Datadog plugin coming out in the v1.3 release will also help.

Are you still seeing issues?

@semoac

semoac commented Oct 1, 2019

Yeah. Sending more than 5MB of logs is kinda stupid, but I'm still seeing 413 errors.

I tried tag 1.3.0 on my pod with compress on in the out_http plugin, but it didn't solve the issue. Is that the correct way to enable gzip? I'm using the same configuration mentioned in this issue.

Is there any way to identify the request responsible for this (a 5MB+ payload)? Something like a fallback output plugin for when the request fails?

I will check the new output!
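
For reference, edsiper's comment above introduces the option as compress gzip rather than compress on, so the relevant line in the [OUTPUT] section would presumably read:

    compress gzip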

@NBParis

NBParis commented Oct 2, 2019

Hey @semoac,

Currently, compression is not supported by the Datadog HTTP API, so just using the default out_http without any compression should work better.
That said, I'm not sure if there is any way to identify payloads that are bigger than 5MB.

@edsiper, do you know if there is any plan to define a maximum payload size in Fluent Bit to have better control over this?
Once compression is supported on the Datadog side it will obviously help (as reaching 5MB of compressed data in one batch should be fairly hard), but being able to control this directly in the output would be perfect.

@semoac

semoac commented Oct 2, 2019

@irabinovitch @NBParis output_datadog solved all my problems, and now I have all the missing logs in Datadog. I don't completely understand why, because this plugin also uses the http-intake.logs.datadoghq.com endpoint.

One last thing: I'm using input_forward instead of input_tail, and the client is fluent-logger from Python.

Cheers.

@NBParis

NBParis commented Oct 2, 2019

@semoac thanks for the confirmation.

Your change might have coincided with the maximum batch size update on our end, which could explain why, after switching to output_datadog, all the missing logs went through.

Do you remember when you did the migration?

@semoac

semoac commented Oct 3, 2019

@NBParis the migration was done on Sept 30, but I noticed this on Oct 1.
