This repository has been archived by the owner on Dec 1, 2020. It is now read-only.

Double check plugin is compatible with current state of Fluent-bit Daemonset #5

Closed
leahnp opened this issue Aug 1, 2017 · 7 comments

Comments

@leahnp
Contributor

leahnp commented Aug 1, 2017

Check the current state of the out_kafka plugin.
Check its issues: are any breaking or P0s?
Make sure none of the recent changes to the fluent-bit daemonset send data in an unsupported encoding.


Blocked by: https://github.com/samsung-cnct/k2-logging-fluent-bit-daemonset/issues/10

@guineveresaenger
Contributor

After fixing an upstream error by updating the version (samsung-cnct/kraken-logging-fluent-bit-daemonset#19), logs are now printed to stdout. If we change the output to the kafka plugin, the following error occurs on the Pods:

[2017/09/19 16:13:10] [ info] [engine] started
Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
[2017/09/19 16:13:11] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2017/09/19 16:13:11] [ info] [filter_kube] local POD info OK
[2017/09/19 16:13:11] [ info] [filter_kube] testing connectivity with API server...
[2017/09/19 16:13:11] [ info] [filter_kube] API server connectivity OK
panic: interface conversion: interface is codec.RawExt, not uint64

goroutine 17 [running, locked to thread]:
panic(0x7f0239366ba0, 0xc820076500)
	/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
main.encode_as_json(0x7f023925e560, 0xc8201327e0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
main.FLBPluginFlush(0x7f0234c40010, 0xc8001f4256, 0x1e5e080, 0x7f0238b22f40)
	/fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0234c40010, 0x61647075001f4256, 0x1e5e080, 0x656e69225c3d796c)
	command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35

@guineveresaenger
Contributor

Having spent a couple days digging into this, I think I have identified a few problems.

  1. Using the daemonset with output set to out_kafka, there is a goroutine error:
```
Failed to start Sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
panic: interface conversion: interface is codec.RawExt, not uint64

goroutine 17 [running, locked to thread]:
panic(0x7f0ee15d7ba0, 0xc820066840)
	/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3ea
main.encode_as_json(0x7f0ee14cf560, 0xc82013ea60, 0x0, 0x0, 0x0, 0x0, 0x0)
	/fluent-bit-kafka-output-plugin/out_kafka.go:119 +0x120
main.FLBPluginFlush(0x7f0ed8e44010, 0xc80017d7b5, 0x1b54960, 0x7f0ee0d93f40)
	/fluent-bit-kafka-output-plugin/out_kafka.go:64 +0x3ac
main._cgoexpwrap_0a4fe733c09b_FLBPluginFlush(0x7f0ed8e44010, 0x30755c5a0017d7b5, 0x1b54960, 0x5f726f7461727473)
	command-line-arguments/_obj/_cgo_gotypes.go:89 +0x35
```

It seems there is a Go error in the output plugin that should be fixed.

This error appears regardless of whether kafka is deployed as a service on the cluster, which leads me to believe that:

2. The kafka service also does not appear to be fully operational, even when a central-logging-fluentd deployment is made on the same cluster. There is a "pending" pod called `kafka-0`, and its events show a scheduling error:
```
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  4m		11s		20	default-scheduler			Warning		FailedScheduling	PersistentVolumeClaim is not bound: "datadir-kafka-0" (repeated 15 times)
```

I have been looking into that error. It appears to be caused by the PersistentVolumeClaim needing a matching Volume somewhere on the cluster, which kafka fails to find. I tried deploying a logging-central-fluentd Deployment as given [here](https://github.com/samsung-cnct/k2-logging-central-fluentd), to no effect.

It is possible that I need to check if I am using the correct container images, which will be my next step.

@guineveresaenger
Contributor

Status report: While hunting down the zookeeper chart, I found a helm bug on versions < 1.7.
The nodes were too small for the kafka pods to run, so the cluster needs to be upgraded to bigger worker nodes.
Currently working on reliably spinning up kafka pods and getting fluentbit-kafka-plugin to work.

@guineveresaenger
Contributor

Status report:
Per an inquiry on the fluent Slack, the basic plugin template from here only supports fluent-bit v0.11.x. It does not support v0.12.x, which makes sense given the Go error that persists above.
We want to be able to use the systemd plugin, which is new with v0.12.x.
Mocking a local dev environment has proven tricky, since both the tail and systemd plugins only work on Linux. Eduardo from fluent-bit was both apologetic and helpful in suggesting a workaround to mock systemd data on my machine.
Conclusion:
At this point, this plugin doesn't seem compatible with our current fluent-bit daemonset (both on this repo and on the new chart repo).
One solution would be to rewrite the Go code in the output plugin to process the incoming data appropriately.
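
The panic above (`interface is codec.RawExt, not uint64`) is what Go raises when a single-value type assertion meets an unexpected dynamic type. As a rough sketch of what such a rewrite might look like, a type switch can accept both shapes and return an error instead of panicking. Everything named here is an assumption for illustration: `rawExt` is a hypothetical stand-in for the codec library's extension type, `toTime` is not a function in the plugin, and the 8-byte layout (big-endian seconds, then nanoseconds) is assumed from the Fluentd forward protocol's EventTime format rather than confirmed against fluent-bit v0.12.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// rawExt is a hypothetical stand-in for the codec.RawExt value named in the
// panic: a MessagePack extension carrying a type tag and a raw payload.
type rawExt struct {
	Tag  uint64
	Data []byte
}

// toTime converts a decoded timestamp field into a time.Time. It returns an
// error for unknown types instead of panicking on a failed assertion.
func toTime(ts interface{}) (time.Time, error) {
	switch v := ts.(type) {
	case uint64:
		// Older shape: whole seconds since the Unix epoch.
		return time.Unix(int64(v), 0).UTC(), nil
	case rawExt:
		// Assumed layout: 4 bytes of seconds, then 4 bytes of nanoseconds,
		// both big-endian.
		if len(v.Data) != 8 {
			return time.Time{}, fmt.Errorf("unexpected ext payload length %d", len(v.Data))
		}
		sec := binary.BigEndian.Uint32(v.Data[0:4])
		nsec := binary.BigEndian.Uint32(v.Data[4:8])
		return time.Unix(int64(sec), int64(nsec)).UTC(), nil
	default:
		return time.Time{}, fmt.Errorf("unsupported timestamp type %T", ts)
	}
}

func main() {
	inputs := []interface{}{
		uint64(1505837590),
		rawExt{Tag: 0, Data: []byte{0x59, 0xc1, 0x43, 0x16, 0, 0, 0, 0}},
		"not a timestamp",
	}
	for _, in := range inputs {
		t, err := toTime(in)
		if err != nil {
			fmt.Println("skipping record:", err)
			continue
		}
		fmt.Println(t.Format(time.RFC3339Nano))
	}
}
```

Returning an error lets the flush callback skip or log a single bad record rather than taking down the whole fluent-bit process.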

@guineveresaenger
Contributor

Update:
There is currently an open issue on the fluent-bit plugin template to support v0.12.
Results of attempting to use kafka output plugin:

  • Locally on macOS, using the random input plugin and the kafka output plugin, the kafka server displayed changes properly, whether configured through a config file or command line flags.
  • Using our fluent-bit image on the cloud with the exact same configuration file as locally, the process stops with the same Go error as shown above. This happened with each of the random, systemd, and tail plugins. Stdout output would print to the terminal just fine.
  • Testing the systemd and tail plugins locally turned out to be difficult, as neither plugin supports macOS. I have not tried running a container locally, since I am not sure why the output plugin works locally with test data from the random plugin but not up in the cloud.

Conclusion:
As-is, the plugin is not compatible with the current state of the fluent-bit containers, either in this repo or in the newer, soon to be used container-fluent-bit repo. New Issue here.
I recommend we wait until we have updated the base image for the fluent-bit containers, which is currently being done. Perhaps by that time the fluent-bit template will have been updated as well.

@guineveresaenger
Contributor

Update: the fluent-bit plugin has support for v0.12! Code here.
It is still probably beneficial to ensure a more compatible base image as well.

@coffeepac
Contributor

this was a fun research spike of an issue. other issues were created and added to the board.
