program panic when failed to initialize etcd store is unreasonable #1684

tokers · 2021-03-28T02:24:57Z

Issue description

Say my etcd cluster contains some corrupted data that causes manager-api to fail at launch. Currently manager-api just crashes, and the output is not very friendly.

Expected behavior

Output some messages to hint at this failure and exit.

Screenshots

Environment

apisix version (cmd: apisix version):
OS (cmd: uname -a):
OpenResty / Nginx version (cmd: nginx -V or openresty -V):
etcd version, if have (cmd: run curl http://127.0.0.1:9090/v1/server_info to get the info from server-info API): 3.5.0-alpha0
apisix-dashboard version, if have: 2.5
Browser version, if have:

Additional context

batman-ezio · 2021-03-28T23:41:44Z

That also happens when manager-api is unable to connect to etcd.

tokers · 2021-03-29T01:09:30Z

> that also happens when etcd is unable to connect

Yep, letting it crash is not a good approach here.

nic-chen · 2021-03-30T05:21:10Z

Thanks for the feedback.

bisakhmondal · 2021-04-01T21:31:57Z

Hi everyone, can I work on this issue?
I have been able to reproduce the same error. After looking into it, I found that the error is thrown during store initialization and caching.

So what should the ideal behaviour of manager-api be in such a case?
One option is to write a descriptive log message, skip the current entry, and continue processing the next one. How does that sound to you? Let me know what you think. Thanks :)

tokers · 2021-04-02T00:43:04Z

@bisakhmondal Sure, assigned to you.

nic-chen · 2021-04-02T01:56:30Z

Hi @bisakhmondal,
Here is the expected behavior mentioned in the issue description:

> Output some messages to hint at this failure and exit.

bisakhmondal · 2021-04-02T06:00:41Z

How is it now?

tokers · 2021-04-03T03:15:24Z

@bisakhmondal We don't have to panic the program; instead, we can report the error reason and exit with a non-zero code.

bisakhmondal · 2021-04-03T10:12:22Z

> @bisakhmondal We don't have to panic the program, instead, we may report the error reason and exit with a non-zero code.

Okay, we can definitely go with it. I would like to mention one issue with this approach. We keep a slice of closers [ref] for all the allocated resources (including the etcd connection), so for a graceful shutdown after any error, the closer of each already-allocated resource should be called.

os.Exit(1) terminates the program immediately, without running deferred functions. That is not the case for panic: even after a panic, the defers that were already registered get executed in LIFO order. So I have put utils.CloseAll() into a defer statement ahead of any code that can panic.

IMHO, panic is fine here. Let me know what you think. Thanks :)
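
The defer behaviour described here can be checked with a small standalone sketch. It is not the manager-api code: the function name `runWithPanic` and the event strings are made up for illustration, and the second deferred function only stands in for `utils.CloseAll()`. The sketch shows that a cleanup deferred before the panic still runs, and that it runs before the outer recover (LIFO order). Per the Go documentation, `os.Exit` would skip both deferred functions.

```go
package main

import "fmt"

// runWithPanic registers a deferred cleanup, then panics.
// It returns the sequence of events so the ordering can be inspected.
// All names here are hypothetical and exist only for this sketch.
func runWithPanic() (events []string) {
	// Registered first, so it runs last: recover and record the panic value.
	defer func() {
		if r := recover(); r != nil {
			events = append(events, fmt.Sprintf("recovered: %v", r))
		}
	}()
	// Stand-in for utils.CloseAll(); registered before anything can panic.
	defer func() { events = append(events, "closers ran") }()

	events = append(events, "initializing store")
	panic("store init failed")
}

func main() {
	for _, e := range runWithPanic() {
		fmt.Println(e)
	}
}
```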

tokers · 2021-04-03T11:23:33Z

Yes, I agree that all finalizers or closers should be run even if exceptions occur. The reason I think a spontaneous panic is not suitable is that this is a clear and specific exception, not a programming fault.
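
One common Go pattern that satisfies both points (closers always run, no panic for an expected failure) is to do the work in a helper that returns an exit code and to call `os.Exit` only in `main`. The following is a minimal sketch under assumptions, not the change that was eventually merged: `closers`, `closeAll`, `initStore`, and `run` are hypothetical names, and the `healthy` flag merely simulates a corrupted or unreachable etcd.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// closers collects cleanup functions for allocated resources.
// Its name and shape are assumptions made for this sketch.
var closers []func() error

// closeAll runs the registered closers in reverse order of registration.
func closeAll() {
	for i := len(closers) - 1; i >= 0; i-- {
		if err := closers[i](); err != nil {
			fmt.Fprintln(os.Stderr, "close failed:", err)
		}
	}
	closers = nil
}

// initStore is a placeholder for store initialization. It registers a
// closer for the (simulated) etcd connection, then fails if !healthy.
func initStore(healthy bool) error {
	closers = append(closers, func() error { return nil })
	if !healthy {
		return errors.New("corrupted data in etcd")
	}
	return nil
}

// run returns an exit code instead of exiting itself, so its deferred
// closeAll always executes before the process terminates.
func run(healthy bool) int {
	defer closeAll()
	if err := initStore(healthy); err != nil {
		fmt.Fprintln(os.Stderr, "failed to initialize etcd store:", err)
		return 1
	}
	return 0
}

func main() {
	// Pass false to simulate a failing store and observe exit code 1.
	os.Exit(run(true))
}
```

Because `os.Exit` is called only after `run` has returned, the deferred `closeAll` has already finished by the time the process exits with a non-zero code.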

bisakhmondal · 2021-04-03T13:53:30Z

Very true. Pushing the new changes then :)

tokers added the bug and enhancement labels and removed the bug label Mar 28, 2021

juzhiyuan added the backend label Mar 28, 2021

nic-chen added this to the 2.6 milestone Mar 30, 2021

tokers assigned bisakhmondal Apr 2, 2021

bisakhmondal mentioned this issue Apr 2, 2021

fix: detailed error for store init failure and deferred execution of closers #1705

Closed

8 tasks

nic-chen modified the milestones: 2.6, 2.7 Apr 19, 2021

bisakhmondal mentioned this issue Apr 21, 2021

fix: efficient error handling in manager-api including graceful shutdown, self contained methods. #1814

Merged

8 tasks

tokers closed this as completed in #1814 May 10, 2021
