out_s3: log_key configuration option implemented #3668
Conversation
For some reason, when running with the example configuration file, Valgrind reports many errors even on a clean master branch. For this reason, I believe these errors come from the current master branch code rather than my code. Below are the Valgrind logs for the example configuration file.
Example Configuration File
Valgrind Output Logs
If Valgrind and FluentBit are running with a config option that has
plugins/out_s3/s3.c
Outdated
flb_plg_error(ctx->ins, "Could not allocate enough "
              "memory to read record");
continue;
Always call flb_errno() after alloc failures. Also, it'd be more typical to just fail and return if an allocation fails. Remember that allocation failures shouldn't happen normally.
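The pattern the reviewer describes might be sketched like this in plain, standalone C (`copy_record` is an invented illustrative name, not the Fluent Bit API, and `perror` stands in for `flb_errno()`):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch, not Fluent Bit code: on allocation failure,
 * record errno (as flb_errno() would) and fail the whole call
 * instead of continuing the loop. */
static char *copy_record(const char *record)
{
    char *buf = malloc(strlen(record) + 1);
    if (buf == NULL) {
        perror("malloc");  /* stand-in for flb_errno() */
        return NULL;       /* fail and return, do not continue */
    }
    strcpy(buf, record);
    return buf;
}
```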
plugins/out_s3/s3.c
Outdated
if (out_buf == NULL) {
    out_buf = flb_sds_create(val_buf);
} else {
    out_buf = flb_sds_cat(out_buf, val_buf, strlen(val_buf));
}
out_buf = flb_sds_cat(out_buf, "\n", 1);
all of these functions can allocate memory IIRC, and need checks that they were successful
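As a sketch of that rule, here is a plain realloc-based append standing in for flb_sds_cat (names are illustrative, not the Fluent Bit API): every call that may allocate is checked, and failure propagates to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for flb_sds_cat: any call that may allocate
 * can fail, so its return value must be checked every time. */
static char *append(char *dst, const char *src)
{
    size_t dlen = dst ? strlen(dst) : 0;
    size_t slen = strlen(src);
    char *tmp = realloc(dst, dlen + slen + 1);

    if (tmp == NULL) {
        free(dst);       /* avoid leaking the old buffer */
        return NULL;     /* caller must check, as with flb_sds_cat */
    }
    memcpy(tmp + dlen, src, slen + 1);
    return tmp;
}
```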
plugins/out_s3/s3.c
Outdated
if (strncmp(ctx->log_key, key_str, key_str_size) == 0) {
    found = FLB_TRUE;

orig_val_buf = flb_malloc(bytes + bytes / 4);
Allocating 1.25 * bytes here doesn't make sense to me. bytes is the size of the entire msgpack buffer, but this is inside an iteration of the loop, where you are only dealing with a single record.
If possible, write this code so that you do not allocate any memory inside the loop; that will make it much more efficient and faster. One thing with C code is that allocating a lot of memory is often completely fine: the machines Fluent Bit runs on usually have plenty of memory. For efficiency, you should care more about keeping those allocations infrequent, because calls to allocate memory are much slower than most other computation.
And for this code, I see no reason why you can't allocate one large buffer outside the loop and then use it within the loop. You might possibly need multiple buffers, some for copying and temporarily storing data. But the point is to try to do everything by allocating a few buffers before/after the loop, and nothing inside of it.
Added documentation PR here: fluent/fluent-bit-docs#552 and added fixes.
plugins/out_s3/s3.c
Outdated
alloc_error = 1;
break;
I see alloc_error as a condition in the loop, but you also use a break (which seems better)...
Because this part is in a nested loop, I need a double break, hence alloc_error to break out of the outer while loop. I could use a goto, but I felt a conditional variable was better practice here!
I see. I think that works. I've also seen the use of goto with a well-named label, like:
if something bad happens:
    goto break_outer_loop
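A runnable sketch of that labeled-goto pattern (the grid-search function here is invented purely for illustration; the point is the single jump out of both loops):

```c
#include <assert.h>

/* Illustrative sketch: a labeled goto exits both loops in one jump,
 * avoiding a flag variable checked in the outer loop condition. */
static int find_first_negative(int grid[2][3], int *row, int *col)
{
    int i, j;

    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            if (grid[i][j] < 0) {
                *row = i;
                *col = j;
                goto break_outer_loop;  /* leaves both loops at once */
            }
        }
    }
    return 0;               /* not found */
break_outer_loop:
    return 1;               /* found */
}
```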
plugins/out_s3/s3.c
Outdated
if (out_buf == NULL) {
    out_buf = flb_sds_create(val_buf);
This code is very inefficient and I don't understand it. Why do you need both val_buf and out_buf?
With out_buf, you initialize it to the strlen of val_buf on the first iteration, and then you realloc it frequently with the calls to flb_sds_cat. So you are not doing efficient memory allocations. Remember that the goal was to avoid any allocation in the loop.
I do not think you need two buffers. I think you can do this with only one buffer, allocated before the loop. You can track an offset into that single buffer and repeatedly write the logs and newlines to it. No need to copy to another buffer.
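The single-buffer approach the reviewer describes might look like this sketch (`concat_records` and its sizing contract are invented for illustration; the caller pre-computes the total size so nothing allocates inside the loop):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: one buffer allocated before the loop, with an
 * offset tracking the write position. total_bytes must already cover
 * every record plus one newline each. */
static char *concat_records(const char **records, int count, size_t total_bytes)
{
    char *buf = malloc(total_bytes + 1);   /* the only allocation */
    size_t off = 0;
    int i;

    if (buf == NULL) {
        return NULL;
    }
    for (i = 0; i < count; i++) {
        size_t len = strlen(records[i]);
        memcpy(buf + off, records[i], len);  /* write, never realloc */
        off += len;
        buf[off++] = '\n';
    }
    buf[off] = '\0';
    return buf;
}
```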
plugins/out_s3/s3.c
Outdated
"that key will be sent to S3. For example, if you are using " | ||
"the Fluentd Docker log driver, you can specify log_key log and only " |
Nit: "For example, if you are using Docker, you can specify log_key log
and only"
Fluentd docker log driver is too specific, and its not the only thing which adds this log key
@PettitWesley Addressed comments and ready for review.
@DrewZhang13 @zhonghui12 I'd like both of you to look at this:
plugins/out_s3/s3.c
Outdated
out_buf = flb_sds_create(val_buf);
if (out_buf == NULL) {
    flb_plg_error(ctx->ins, "Error creating buffer to store log_key contents.");
    flb_errno();
    return NULL;
}
flb_free(val_buf);

return out_buf;
}
Why tho? Why not just return val_buf?
Later on, we try to do flb_sds_destroy(json), which throws an error when used on a plain character buffer.
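Why an sds-style destroy can't take a plain malloc'd buffer can be sketched like this (the header layout below is illustrative, not flb_sds's exact struct): the string pointer the caller holds sits just past a hidden header, and destroy frees from the header, so a plain char buffer would hand free() an address it never allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch of an sds-style string: metadata lives in a
 * hidden header just before the bytes the caller sees. */
struct sds_header {
    size_t len;
};

static char *sds_create(const char *init)
{
    size_t len = strlen(init);
    struct sds_header *h = malloc(sizeof(*h) + len + 1);

    if (h == NULL) {
        return NULL;
    }
    h->len = len;
    memcpy((char *)(h + 1), init, len + 1);
    return (char *)(h + 1);   /* caller only ever sees the string part */
}

static void sds_destroy(char *s)
{
    /* steps back to the hidden header before freeing; a plain malloc'd
     * char* passed here would make free() receive a bogus pointer */
    free((struct sds_header *)s - 1);
}
```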
Code is pretty clean and clear, left some comments
plugins/out_s3/s3.c
Outdated
@@ -624,6 +624,11 @@ static int cb_s3_init(struct flb_output_instance *ins,
}

tmp = flb_output_get_property("log_key", ins);
Do we consider length restriction for the log key config?
@PettitWesley Addressed zhonghui and drewzhang's comments!
@StephenLeeY rebase with master and squash your commits, then we can merge this. Also put up a PR against the 1.8 branch as well.
Force-pushed from f86ded8 to e101159
Ready for merge @PettitWesley
@StephenLeeY needs rebase
Force-pushed from 934472b to 66f0a76
By default, the whole log record will be sent to S3. If you specify a key name with this option, then only the value of that key will be sent to S3. For example, if you are using the Fluentd Docker log driver, you can specify log_key log and only the log message will be sent to S3. If the key is not found, it will skip that record. This patch has been tested using test configuration files and various input plugins (random, exec, etc). The resulting output, as expected, only contained values of the specified log_key. Signed-off-by: Stephen Lee <[email protected]>
Force-pushed from 66f0a76 to ccff1b3
Signed-off-by: Stephen Lee [email protected]
Enter [N/A] in the box if an item is not applicable to your change.
Testing
Before we can approve your change, please submit the following in a comment:
Documentation
fluent/fluent-bit-docs#552
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
By default, the whole log record will be sent to S3.
If you specify a key name with this option, then only the value of that key
will be sent to S3.
For example, if you are using the Fluentd Docker log driver, you can specify
log_key log and only the log message will be sent to S3. If the key is not
found, it will skip that record.
This patch has been tested using test configuration files and various
input plugins (random, exec, etc) as well as valgrind. The resulting output,
as expected, only contained values of the specified log_key.
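Based on that description, a minimal configuration using the new option might look like this sketch (the bucket and region values are placeholders):

```
[OUTPUT]
    Name     s3
    Match    *
    bucket   my-bucket
    region   us-east-1
    log_key  log
```

With log_key log set, only the value of the log key in each record reaches S3; records without that key are skipped.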
Record without log_key
Record with log_key log
Example Configuration File