out_s3: log_key configuration option implemented #3668

Merged · 1 commit · Jul 23, 2021

Conversation

@StephenLeeY commented Jun 19, 2021

Signed-off-by: Stephen Lee [email protected]



Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

fluent/fluent-bit-docs#552


Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

By default, the whole log record will be sent to S3.

If you specify a key name with this option, then only the value of that key
will be sent to S3.

For example, if you are using the Fluentd Docker log driver, you can specify
log_key log and only the log message will be sent to S3. If the key is not
found, the record is skipped.

This patch has been tested using test configuration files and various
input plugins (random, exec, etc.), as well as Valgrind. The resulting output,
as expected, only contained values of the specified log_key.

Record without log_key

{"date":"2021-06-16T19:56:28.441428Z","log":"Pi is roughly 3.141605814119434"}

Record with log_key log

Pi is roughly 3.141605814119434

Example Configuration File

[INPUT]
    name exec
    command date +"%Y-%m-%d %H:%M:%S,%3N"

[OUTPUT]
    name s3
    match *
    region us-west-2
    bucket bucket-name
    s3_key_format /test/$UUID.gz
    use_put_object true
    total_file_size 2M
    upload_timeout 10s
    compression gzip
    store_dir /tmp/fluent-bit/s3-output-buffer
    retry_limit 5
    log_key exec
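
For reference, the heart of this feature is a per-record key lookup over the msgpack map. A minimal sketch of what that lookup looks like (my illustration using the msgpack-c API, not the patch itself; find_log_key is a hypothetical helper name):

#include <msgpack.h>
#include <string.h>

/* Return the value object stored under `log_key` in one msgpack map
 * record, or NULL if the record is not a map or the key is absent
 * (an absent key means the record is skipped). */
static const msgpack_object *find_log_key(const msgpack_object *record,
                                          const char *log_key)
{
    size_t key_len = strlen(log_key);
    uint32_t i;

    if (record->type != MSGPACK_OBJECT_MAP) {
        return NULL;
    }
    for (i = 0; i < record->via.map.size; i++) {
        const msgpack_object *k = &record->via.map.ptr[i].key;
        if (k->type == MSGPACK_OBJECT_STR &&
            k->via.str.size == key_len &&
            strncmp(k->via.str.ptr, log_key, key_len) == 0) {
            return &record->via.map.ptr[i].val;
        }
    }
    return NULL;
}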

@StephenLeeY (Author)

When running with the example configuration file, Valgrind reports a number of errors even on a clean master branch, so I believe these errors come from the current master code rather than from this change. Below are the Valgrind logs for the example configuration file.

Example Configuration File Valgrind Output Logs

==13492== HEAP SUMMARY:
==13492==     in use at exit: 701,635 bytes in 5,374 blocks
==13492==   total heap usage: 126,115 allocs, 120,741 frees, 23,018,072 bytes allocated
==13492==
==13492== LEAK SUMMARY:
==13492==    definitely lost: 0 bytes in 0 blocks
==13492==    indirectly lost: 0 bytes in 0 blocks
==13492==      possibly lost: 0 bytes in 0 blocks
==13492==    still reachable: 701,635 bytes in 5,374 blocks
==13492==         suppressed: 0 bytes in 0 blocks
==13492== Rerun with --leak-check=full to see details of leaked memory
==13492==
==13492== For counts of detected and suppressed errors, rerun with: -v
==13492== ERROR SUMMARY: 16 errors from 11 contexts (suppressed: 0 from 0)

If Fluent Bit runs under Valgrind with a configuration that sets oneshot to true, there are no Valgrind errors. Here is a sample config file with oneshot true.

oneshot true Configuration File

[INPUT]
    name exec
    command date +"%Y-%m-%d %H:%M:%S,%3N"
    oneshot true

[OUTPUT]
    name s3
    match *
    region us-west-2
    bucket bucket-name
    s3_key_format /test/$UUID.gz
    use_put_object true
    total_file_size 2M
    upload_timeout 10s
    compression gzip
    store_dir /tmp/fluent-bit/s3-output-buffer
    retry_limit 5
    log_key exec

Below are the Valgrind Output Logs for this specific case.

oneshot true Valgrind Output Log

[2021/06/19 02:37:43] [ warn] [engine] service will stop in 5 seconds
[2021/06/19 02:37:48] [ info] [engine] service stopped
==12155==
==12155== HEAP SUMMARY:
==12155==     in use at exit: 701,635 bytes in 5,374 blocks
==12155==   total heap usage: 119,836 allocs, 114,462 frees, 14,688,145 bytes allocated
==12155==
==12155== Searching for pointers to 5,374 not-freed blocks
==12155== Checked 940,744 bytes
==12155==
==12155== LEAK SUMMARY:
==12155==    definitely lost: 0 bytes in 0 blocks
==12155==    indirectly lost: 0 bytes in 0 blocks
==12155==      possibly lost: 0 bytes in 0 blocks
==12155==    still reachable: 701,635 bytes in 5,374 blocks
==12155==         suppressed: 0 bytes in 0 blocks
==12155== Reachable blocks (those to which a pointer was found) are not shown.
==12155== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12155==
==12155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==12155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@StephenLeeY marked this pull request as ready for review June 19, 2021 02:45
@StephenLeeY (Author)

@PettitWesley

@PettitWesley self-requested a review June 20, 2021 00:39
Comment on lines 1425 to 1427
flb_plg_error(ctx->ins, "Could not allocate enough "
              "memory to read record");
continue;

Contributor:

Always call flb_errno() after alloc failures. Also, it'd be more typical to just fail and return if an allocation fails. Remember that allocation failures shouldn't happen normally.
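
A minimal sketch of that pattern (illustrative; buf and size stand in for the real variables):

buf = flb_malloc(size);
if (buf == NULL) {
    flb_errno();   /* record errno immediately after the failed allocation */
    flb_plg_error(ctx->ins, "could not allocate memory to read record");
    return NULL;   /* fail fast instead of continuing the loop */
}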

Comment on lines 1445 to 1450
if (out_buf == NULL) {
    out_buf = flb_sds_create(val_buf);
}
else {
    out_buf = flb_sds_cat(out_buf, val_buf, strlen(val_buf));
}
out_buf = flb_sds_cat(out_buf, "\n", 1);

Contributor:

All of these functions can allocate memory IIRC, and need checks that they were successful.
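
For instance, the concatenation could be checked like this (a sketch; tmp is a scratch variable I am introducing, and the comments assume flb_sds_cat grows the buffer via realloc, leaving the old handle valid on failure):

flb_sds_t tmp;

tmp = flb_sds_cat(out_buf, val_buf, strlen(val_buf));
if (tmp == NULL) {
    flb_errno();
    flb_sds_destroy(out_buf);   /* release the original buffer */
    return NULL;
}
out_buf = tmp;   /* the buffer may have moved */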

if (strncmp(ctx->log_key, key_str, key_str_size) == 0) {
    found = FLB_TRUE;

    orig_val_buf = flb_malloc(bytes + bytes / 4);

Contributor:

Allocating 1.25 * bytes here doesn't make sense to me. bytes is the size of the entire msgpack payload, but this is inside an iteration of the loop, where you are only dealing with a single record.

If possible, write this code so that you do not allocate any memory inside the loop; that will make it much more efficient and faster. One thing with C code is that allocating a lot of memory should often be considered completely fine; the machines Fluent Bit runs on usually have a lot of memory. For efficiency, you should care more about keeping those allocs infrequent, because calls to allocate memory are much slower than any other computation.

And for this code, I see no reason why you can't alloc one large buffer outside the loop, and then use it within the loop. You might possibly need multiple buffers, some for copying and temporarily storing data. But the point is to try to see if you can do everything by alloc'ing a few buffers before/after the loop. Nothing inside of it.
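
Sketching that shape (illustrative; the sizing and variable names are my assumptions, not the final code):

msgpack_unpacked result;
size_t off = 0;
char *buf;

/* one allocation before the loop, sized from the whole payload
 * (illustrative sizing) */
buf = flb_malloc(bytes + 1);
if (buf == NULL) {
    flb_errno();
    return NULL;
}

msgpack_unpacked_init(&result);
while (msgpack_unpack_next(&result, data, bytes, &off) == MSGPACK_UNPACK_SUCCESS) {
    /* find the log_key value and copy it into buf at a tracked offset;
     * nothing in here calls an allocator */
}
msgpack_unpacked_destroy(&result);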

@StephenLeeY (Author)

Added documentation PR here: fluent/fluent-bit-docs#552 and added fixes.

Comment on lines 1460 to 1461
alloc_error = 1;
break;

Contributor:

I see alloc_error as a condition in the loop, but you also use a break (which seems better)...

Author:

Because this part is in a nested loop, I need to double break, hence alloc_error to break out of the outer while loop. I could use a goto, but I felt a conditional variable was better practice here!

Contributor:

I see. I think that works. I've also seen the use of goto with a well-named label like:

if (alloc_failed) {
    goto break_outer_loop;
}
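
Placed in context, the label sits just past the outer loop (illustrative):

while (msgpack_unpack_next(&result, data, bytes, &off) == MSGPACK_UNPACK_SUCCESS) {
    for (i = 0; i < map.via.map.size; i++) {
        buf = flb_malloc(len);
        if (buf == NULL) {
            flb_errno();
            goto break_outer_loop;   /* one jump exits both loops */
        }
        /* normal per-key processing */
    }
}
break_outer_loop:
msgpack_unpacked_destroy(&result);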

Comment on lines 1454 to 1455
if (out_buf == NULL) {
    out_buf = flb_sds_create(val_buf);

Contributor:

This code is very inefficient and I don't understand it.

Why do you need both val_buf and out_buf?

With out_buf, you initialize it from val_buf on the first iteration, and then you realloc it frequently with the calls to flb_sds_cat. So you are not doing efficient memory allocations.

Remember that the goal was to avoid any allocation in the loop.

I do not think you need two buffers. I think you can do this with only one buffer, allocated before the loop. You can track an offset in that single buffer and repeatedly write the logs and newlines to it. No need to copy to another buffer.
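
A sketch of that single-buffer approach (my illustration; assumes buf was sized before the loop to hold every value plus a newline per record):

/* inside the record loop, `val` is the msgpack string value for log_key */
memcpy(buf + offset, val->via.str.ptr, val->via.str.size);
offset += val->via.str.size;
buf[offset++] = '\n';   /* newline-delimit records; no allocation here */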

Comment on lines 1831 to 1832
"that key will be sent to S3. For example, if you are using "
"the Fluentd Docker log driver, you can specify log_key log and only "

Contributor:

Nit: "For example, if you are using Docker, you can specify log_key log and only"

The Fluentd Docker log driver is too specific, and it's not the only thing that adds this log key.

@StephenLeeY (Author)

@PettitWesley Addressed comments and ready for review.

@PettitWesley (Contributor)

@DrewZhang13 @zhonghui12 I'd like both of you to look at this:

  • Bare minimum: At least read through the code once and understand it
  • Points will be awarded for: Useful code review that catches issues or helps Stephen improve

Comment on lines 1483 to 1709
out_buf = flb_sds_create(val_buf);
if (out_buf == NULL) {
    flb_plg_error(ctx->ins, "Error creating buffer to store log_key contents.");
    flb_errno();
    return NULL;
}
flb_free(val_buf);

return out_buf;
}

Contributor:

Why tho? Why not just return val_buf?

Author:

Later on, we call flb_sds_destroy(json), which throws an error when used on a plain character buffer.
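
For context (my summary, not from the thread): an flb_sds_t keeps a small header immediately before the data pointer, so the two allocators cannot be mixed. Illustrative:

char *raw = flb_malloc(16);          /* plain heap buffer, no header   */
flb_sds_t s = flb_sds_create("hi");  /* header precedes the data bytes */

flb_free(raw);        /* correct pairing */
flb_sds_destroy(s);   /* correct: steps back to the header, then frees */
/* flb_sds_destroy(raw) would read garbage as a header: undefined behavior */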

@DrewZhang13 (Contributor) left a comment:

Code is pretty clean and clear, left some comments

@@ -624,6 +624,11 @@ static int cb_s3_init(struct flb_output_instance *ins,

}

tmp = flb_output_get_property("log_key", ins);

Contributor:

Do we consider length restriction for the log key config?
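
Hypothetically, such a guard could sit right after the property is read in cb_s3_init (a sketch only; the 512-byte limit and the error text are made up):

tmp = flb_output_get_property("log_key", ins);
if (tmp != NULL && strlen(tmp) > 512) {   /* illustrative limit */
    flb_plg_error(ctx->ins, "log_key is too long");
    return -1;
}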

(Two further review threads on plugins/out_s3/s3.c, resolved.)
@StephenLeeY (Author)

@PettitWesley Addressed zhonghui and drewzhang's comments!

PettitWesley previously approved these changes Jul 17, 2021
@PettitWesley (Contributor)

@StephenLeeY rebase with master and squash your commits, then we can merge this. Also put up a PR against the 1.8 branch.

@StephenLeeY (Author)

Ready for merge @PettitWesley

PettitWesley previously approved these changes Jul 23, 2021
@PettitWesley (Contributor)

@StephenLeeY needs rebase

By default, the whole log record will be sent to S3.

If you specify a key name with this option, then only the value of that key
will be sent to S3.

For example, if you are using the Fluentd Docker log driver, you can specify
log_key log and only the log message will be sent to S3. If the key is not
found, the record is skipped.

This patch has been tested using test configuration files and various
input plugins (random, exec, etc). The resulting output, as expected,
only contained values of the specified log_key.

Signed-off-by: Stephen Lee <[email protected]>