Program auto exits after a few seconds #33

Du-Li · 2023-03-21T04:44:31Z

Describe the bug
About 5 seconds after starting the go-kafka-connect-couchbase connector, the health check failed and the connector exited on its own.

To Reproduce
Steps to reproduce the behavior:

1. Extend the example main.go to print the received cb-dcp events.
2. Copy the example main.go and config.yml to an Ubuntu box.
3. Run the code with `go run main.go`.
4. See error:

```
{"level":"debug","time":"2023-03-21T04:39:33Z","message":"vbucket discovery opened with membership type: static"}
{"level":"info","time":"2023-03-21T04:39:33Z","message":"member: 1/1, vbucket range: 0-1023"}
{"level":"debug","time":"2023-03-21T04:39:33Z","message":"loaded checkpoint"}
{"level":"debug","time":"2023-03-21T04:39:33Z","message":"stream started"}
{"level":"debug","time":"2023-03-21T04:39:33Z","message":"started checkpoint schedule"}
{"level":"info","time":"2023-03-21T04:39:33Z","message":"dcp stream started"}
{"level":"info","time":"2023-03-21T04:39:33Z","message":"metric middleware registered on path /metrics"}
{"level":"info","time":"2023-03-21T04:39:33Z","message":"api starting on port 8080"}
{"level":"debug","time":"2023-03-21T04:39:43Z","message":"no need to save checkpoint"}
{"level":"error","error":"context deadline exceeded","time":"2023-03-21T04:39:48Z","message":"health check failed"}
{"level":"debug","time":"2023-03-21T04:39:48Z","message":"vbucket discovery closed"}
{"level":"debug","time":"2023-03-21T04:39:48Z","message":"no need to save checkpoint"}
{"level":"debug","time":"2023-03-21T04:39:48Z","message":"stopped checkpoint schedule"}
{"level":"debug","time":"2023-03-21T04:39:49Z","message":"stream stopped"}
{"level":"debug","time":"2023-03-21T04:39:49Z","message":"api stopped"}
{"level":"debug","time":"2023-03-21T04:39:49Z","message":"dcp connection closed"}
{"level":"debug","time":"2023-03-21T04:39:49Z","message":"connections closed"}
{"level":"info","time":"2023-03-21T04:39:49Z","message":"dcp stream closed"}
```

Expected behavior
The connector should keep printing Couchbase DCP events for as long as a workload is writing to Couchbase.

Screenshots
N/A

Version (please complete the following information):

OS: Linux ubuntu-box 5.4.231-137.341.amzn2.x86_64
Golang version: go1.18.1 linux/amd64
Couchbase: Community Edition 6.5.0 build 4966
Kafka: 3.3.1

Additional context
Add any other context about the problem here.


mhmtszr · 2023-03-21T05:10:32Z

Hello @Du-Li, it seems there was a DCP checkpoint timeout. Could you please try increasing the checkpoint timeout? Here is an example config. Also, which version of go-kafka-connect-couchbase are you using?

```yaml
dcp:
  ...
checkpoint:
  timeout: 100s
  type: manual
```

erayarslan · 2023-03-21T08:55:40Z

Hi @Du-Li, did you see a log line like `connected to ..., bucket: ..., meta bucket: ...`? And can you confirm that your Couchbase is still up?
Thank you.

Du-Li · 2023-03-21T14:04:06Z

@erayarslan Thanks for your reply. Yes, my CB is always up, and those messages were printed as expected.

I was using go-kafka-connect-couchbase 0.0.20 and go-dcp-client 0.0.22. I added the suggested `type: manual`, but the program still exited by itself, this time after 15 seconds.

Du-Li · 2023-03-21T14:21:35Z

It looks like the 15 seconds are the default checkpoint interval (10s) plus the timeout (5s), so the configured `checkpoint.timeout: 100s` and `checkpoint.type: manual` had no effect. I need to be able to control when the program exits before seriously trying anything else.

erayarslan · 2023-03-21T14:23:56Z

@Du-Li, it's not about the checkpoint. I think it's our default healthCheck timeout, which is 5 seconds; you can check it here.
Can you increase it and retry?
For example:

```yaml
healthCheck:
  timeout: 30s
```

Thank you for the feedback.

Du-Li · 2023-03-21T14:34:11Z

Thanks @erayarslan.
Yes, with the extended `healthCheck.timeout` the program ran a bit longer. But why should the health check fail if nothing is wrong? Is there a way I can control when the program exits? Ideally it should just keep running, even when there are no DCP events.

erayarslan · 2023-03-21T14:41:52Z

The health check periodically sends a ping request to Couchbase to make sure it is alive.
If the ping fails or times out, we shut everything down.
The logic is here.
Somehow your cluster didn't respond to the client's ping request.
Can you share any additional details about your cluster configuration? Then I can try to reproduce it in that environment.

Du-Li · 2023-03-21T14:50:55Z

My CB cluster is a local Community Edition 6.5 setup. It works fine with Kafka Connect and the CB connector. I am not sure why the DCP client's ping request failed.

Du-Li · 2023-03-21T15:23:11Z

The program exited after the healthCheck interval + timeout, even though a workload was continuously updating CB and the program itself was printing the correct DCP events.

erayarslan · 2023-03-21T15:26:06Z

I think I understand the issue. Your cluster is running on AWS, right? When we send a ping request, we check every kind of Couchbase service: memd, mgmt, n1ql, fts, capi ... If your network policy blocks any one of these, the ping will probably fail.

Du-Li · 2023-03-21T15:29:16Z

I see. I am running the connector and Kafka in EKS, while the CB cluster runs outside EKS. What kind of network policy is needed to make the ping work?

erayarslan · 2023-03-21T15:32:23Z

I don't think we should check them all, since you clearly have the right policy to receive DCP.
We need to ignore unnecessary services in the ping. Let me change this, and then we can retry.

Du-Li · 2023-03-21T15:33:58Z

Yep, I couldn't agree more. As long as it is able to receive DCP events, it should be considered healthy. There's no need for comprehensive checking.

erayarslan · 2023-03-21T16:55:08Z

In v0.0.22, the ping only needs the memd (default port: 11210) and mgmt (default port: 8091) services.
If your network policy allows these ports, the problem should be solved. Can you retry?

Du-Li · 2023-03-21T18:38:36Z

@erayarslan The health check still failed after the interval + timeout. From the same machine the program runs on, I tried telnet on both ports against one of the CB servers, and both connected.

```
{"level":"info","time":"2023-03-21T18:35:41Z","message":"api starting on port 8080"}
{"level":"error","error":"context deadline exceeded","time":"2023-03-21T18:35:56Z","message":"health check failed"}
```

erayarslan · 2023-03-21T18:58:18Z

Are you sure you upgraded to go-kafka-connect-couchbase v0.0.22 and go-dcp-client v0.0.24?
I reproduced this issue on my VM with firewall rules, and the new version solved it.

Du-Li · 2023-03-21T19:06:56Z

Yes, I am very sure. Here is part of my go.mod file:

```
require (
	github.com/Trendyol/go-kafka-connect-couchbase v0.0.22
	github.com/gookit/config/v2 v2.2.1
)

require (
	github.com/Trendyol/go-dcp-client v0.0.24 // indirect
```

erayarslan · 2023-03-22T06:38:10Z

I cannot reproduce your behaviour, so I exposed a config option to enable or disable the health check. With v0.0.23 you can disable it like this:

```yaml
healthCheck:
  enabled: false
```

Du-Li · 2023-03-22T15:02:55Z

@erayarslan Thanks. That worked.

erayarslan · 2023-03-22T15:04:03Z

Thank you so much for the contribution 🥳

erayarslan added a commit that referenced this issue on Mar 21, 2023: fix: bump go-dcp-client v0.0.24 #33 (e90a816)

erayarslan closed this as completed Mar 22, 2023
