Conversation
with Yolean/kubernetes-kafka brokers. But beware of the aggregation recursion with the test: you'll see escaped escaped ... escaped JSON.
Note: there is an optional Message_Key config parameter available.
On Nov 11, 2017 09:06, solsson wrote:
Based on fluent/fluent-bit#94; this currently uses the dev build from fluent/fluent-bit#94 (comment).
A typical record (pretty-printed using jq):
{
  "kubernetes": {
    "docker_id": "dd50658b5d1110070dfa24511b7c2e6095bc210b7807617bc9c7e9705c64be28",
    "container_name": "fluent-bit",
    "host": "minikube",
    "annotations": {
      "kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"DaemonSet\",\"namespace\":\"logging\",\"name\":\"fluent-bit\",\"uid\":\"7b5e7b6d-c6ef-11e7-ada5-080027039766\",\"apiVersion\":\"extensions\",\"resourceVersion\":\"1695\"}}\n"
    },
    "labels": {
      "version": "v1",
      "pod-template-generation": "1",
      "kubernetes.io/cluster-service": "true",
      "k8s-app": "fluent-bit-logging",
      "controller-revision-hash": "2050036262"
    },
    "pod_id": "af021eb0-c6ef-11e7-ada5-080027039766",
    "namespace_name": "logging",
    "pod_name": "fluent-bit-qpz69"
  },
  "time": "2017-11-11T15:00:40.427239619Z",
  "stream": "stderr",
  "log": "[2017/11/11 15:00:40] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/bootstrap]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected\n",
  "@timestamp": 1510412440.42724
}
Notably, compared to filebeat 6.0.0-rc (Yolean/kubernetes-kafka#88), it contains node name and annotations. Also, like filebeat, no key is set.
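One way to double-check the "no key is set" observation once records are flowing is a quick consume with kafkacat (not part of this PR; brokers and topic are the ones from the ConfigMap quoted further down, and the jq pipe just restores the pretty-printed view):

```sh
# Print key (%k) and payload (%s) of each record from the configured topic;
# non-JSON lines (the key= line) are passed through by jq untouched.
kafkacat -C -b bootstrap.kafka:9092 \
  -t ops.kube-logs-fluentbit.stream.json.001 \
  -o end -f 'key=%k\n%s\n' | jq -R 'fromjson? // .'
```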
------------------------------
You can view, comment on, or merge this pull request online at:
#11
Commit Summary
- Copies output/elasticsearch/
- Configures current dev build of out_kafka,
File Changes
- A output/kafka/fluent-bit-configmap.yaml (93)
- A output/kafka/fluent-bit-ds-minikube.yaml (50)
- A output/kafka/fluent-bit-ds.yaml (44)
How can I use record fields with that option, for example?
@solsson That feature is not yet available, meaning the "ability to compose a value based on record keys".
    Match          *
    Brokers        bootstrap.kafka:9092
    Topics         ops.kube-logs-fluentbit.stream.json.001
    Timestamp_Key  @timestamp
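Since Message_Key came up above, a hedged sketch of what adding a static key to this section could look like; the key string itself is only a placeholder, and deriving the key from record fields (e.g. the pod name) is exactly the part that is not available yet.

```
[OUTPUT]
    Name           kafka
    Match          *
    Brokers        bootstrap.kafka:9092
    Topics         ops.kube-logs-fluentbit.stream.json.001
    Timestamp_Key  @timestamp
    # Static value only; composing the key from record fields is not supported yet.
    Message_Key    kube-logs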
There is an option called Retry_Limit; when it is set to False, Fluent Bit will retry indefinitely until it succeeds if it cannot flush the records to Kafka.
What are your thoughts?
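For context, that is a single line in the [OUTPUT] section; a minimal sketch of the setting under discussion:

```
# False = no retry limit: keep retrying (and buffering in memory) until the flush succeeds
Retry_Limit  False
```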
I was unaware of this, but now I found fluent/fluent-bit#309 (comment). I guess the daemonset pods need some monitoring, like a gauge for the number of messages in the buffer.
If one message fails it's quite unlikely that the next will be sent, so I think Retry_Limit=False makes sense to preserve ordering, in combination with a very narrow memory limit. It's a decent alternative to buffer monitoring, because if the pod hits the memory limit it will be replaced, and generic monitoring on readiness (for example using prometheus-operator and this example) will alert about the issue.
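To make that combination concrete, a rough sketch of what the daemonset's container resources could carry; the numbers are placeholders assumed for illustration, not values from this PR:

```yaml
# Illustrative only: a narrow memory limit so a pod whose buffer keeps growing
# (because Retry_Limit False retries forever) gets OOM-killed and replaced.
resources:
  requests:
    memory: 16Mi
    cpu: 50m
  limits:
    memory: 32Mi
```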
Hrm... @solsson, with your recent changes to use the bootstrap service (and @edsiper adding Retry_Limit False) I'm getting occasional messages:
I'm going to look into it a bit more with debugging on.
@StevenACoffman That's quite odd. Unless I'm mistaken, Kafka producers only use the bootstrap servers string for the initial connection to the brokers, then immediately get the proper advertised addresses. I'm prepared to revert that change, but I would like to find the cause first.
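For reference, the two forms being compared; the per-broker names below simply follow the pattern of the kafka-1 address visible in the log record above, so the three-broker list is an assumption:

```
# Bootstrap service: only used for the initial metadata request,
# after which the client talks to the advertised broker addresses.
Brokers  bootstrap.kafka:9092

# Explicit list (reverted form), assuming three brokers named like the one in the log:
Brokers  kafka-0.broker.kafka.svc.cluster.local:9092,kafka-1.broker.kafka.svc.cluster.local:9092,kafka-2.broker.kafka.svc.cluster.local:9092
```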
@edsiper I'm not sure the [...]. Just for testing, I locally reverted the bootstrap service change in this pull request, but left my addition of the [...].
After I recreate the topic in Kafka (I have automatic topic creation disabled), the pods continue to complain indefinitely until I manually restart them; then they behave again. By the way, I'm using kail to follow this, and it's pretty great.
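For the record, recreating the topic by hand (since auto-creation is off) would be along these lines with the stock Kafka tools; the zookeeper address, partition count and replication factor are assumptions about the test cluster, not values from this PR:

```sh
# Recreate the topic manually; adjust zookeeper address, partitions and
# replication factor to whatever the cluster actually uses.
kafka-topics.sh --zookeeper zookeeper.kafka:2181 \
  --create --topic ops.kube-logs-fluentbit.stream.json.001 \
  --partitions 1 --replication-factor 1
```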
I've been running in a QA cluster for a couple of days with Retry_Limit false, without memory limits. I haven't had time to analyze the results, but memory usage is suspiciously high on two pods now (45 and 60 MB, while the rest are below 20 MB). I also have lots of logs like:
I haven't seen any indication that our other Kafka clients have this many issues.
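A simple way to keep an eye on that memory growth, assuming heapster or metrics-server is available in the cluster, would be something like:

```sh
# Per-pod memory for the daemonset; the "logging" namespace is the one from the record above.
kubectl -n logging top pods | grep fluent-bit
```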
Hmm, those messages are generated by librdkafka, now using 0.11.1. I will upgrade to 0.11.3 (https://github.com/edenhill/librdkafka/releases/tag/v0.11.3).
rdkafka updated in image fluent/fluent-bit-kafka-dev:0.5.
@edsiper @solsson I updated the Yolean/kubernetes-kafka:single-node branch and dumped the files from this branch of this repo as of 7945b66 together into my own StevenACoffman/kubernetes-kafka#2. This lets me run a single-node Kafka cluster in minikube, with a fluent-bit daemonset, for testing purposes. Hopefully helpful to one of you as well! Unfortunately, I'm out for the rest of the year, so I'll not be able to give this more attention until Jan 2, 2018.
It's worth noting that the QA cluster I've been running on is multi-zone (Yolean/kubernetes-kafka#41). With log aggregation being a per-node affair, I wonder if Kafka clients, too, can be "rack-aware". If I set acks to
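If this out_kafka dev build passes librdkafka properties through via rdkafka.* keys (an assumption, not something confirmed in this thread), acks could be tuned roughly like this; request.required.acks itself is a standard librdkafka producer property:

```
# Assumes rdkafka.* passthrough is available in this build of out_kafka.
# 1 = leader-only ack, -1/all = wait for in-sync replicas.
rdkafka.request.required.acks  1
```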
#16 replaces this.