Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

program panic when failed to initialize etcd store is unreasonable #1684

Closed
tokers opened this issue Mar 28, 2021 · 11 comments · Fixed by #1814
Closed

program panic when failed to initialize etcd store is unreasonable #1684

tokers opened this issue Mar 28, 2021 · 11 comments · Fixed by #1814
Assignees
Labels
backend enhancement New feature or request
Milestone

Comments

@tokers
Copy link
Contributor

tokers commented Mar 28, 2021

Issue description

Say my etcd cluster contains some corrupted data, which causes the manager api launching failed, the manager-api just crashes and the output is not so friendly.

Expected behavior

Output some messages to hint at this failure and exit.

Screenshots

image

Environment

  • apisix version (cmd: apisix version):
  • OS (cmd: uname -a):
  • OpenResty / Nginx version (cmd: nginx -V or openresty -V):
  • etcd version, if have (cmd: run curl http://127.0.0.1:9090/v1/server_info to get the info from server-info API): 3.5.0-alpha0
  • apisix-dashboard version, if have: 2.5
  • Browser version, if have:

Additional context

@tokers tokers added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels Mar 28, 2021
@batman-ezio
Copy link
Contributor

that also happens when ectd is unable to connect

@tokers
Copy link
Contributor Author

tokers commented Mar 29, 2021

that also happens when ectd is unable to connect

Yep, let it crash is not a good way here.

@nic-chen
Copy link
Member

thanks for feedback

@nic-chen nic-chen added this to the 2.6 milestone Mar 30, 2021
@bisakhmondal
Copy link
Member

Hii!! everyone, can I work on this issue?
I have been able to reproduce the same error. After looking into it, I found that the error is thrown during store initialization & caching.

So what should be the ideal behaviour for manager-api if such a case happens?
Just put a descriptive log, ignore the current entry and continue processing the next one. How does it sound to you? Let me know what you think. Thanks :)

@tokers
Copy link
Contributor Author

tokers commented Apr 2, 2021

@bisakhmondal Sure, assigned to you.

@nic-chen
Copy link
Member

nic-chen commented Apr 2, 2021

hi @bisakhmondal
Here is the expected behavior mentioned in issue content:

Output some messages to hint at this failure and exit.

@bisakhmondal
Copy link
Member

bisakhmondal commented Apr 2, 2021

image

How is it now?

@tokers
Copy link
Contributor Author

tokers commented Apr 3, 2021

@bisakhmondal We don't have to panic the program, instead, we may report the error reason and exit with a non-zero code.

@bisakhmondal
Copy link
Member

@bisakhmondal We don't have to panic the program, instead, we may report the error reason and exit with a non-zero code.

Okay, we can definitely go with it. I would like to mention an issue in this approach. We are keeping a slice of closers [ ref ] for all the allocated resources (including etcd connection), so in case of any error, for a graceful shutdown, the already allocated resource's closer method should be called.

os.Exit(1) will immediately abort the program. But that is not the case for panic. Even after panic the early evaluated defers will get executed in LIFO order. So I have put utils.CloseAll() into a defer statement before the scope of any panics.

IMHO, panic is fine here. Let me know what you think. Thanks :)

@tokers
Copy link
Contributor Author

tokers commented Apr 3, 2021

@bisakhmondal We don't have to panic the program, instead, we may report the error reason and exit with a non-zero code.

Okay, we can definitely go with it. I would like to mention an issue in this approach. We are keeping a slice of closers [ ref ] for all the allocated resources (including etcd connection), so in case of any error, for a graceful shutdown, the already allocated resource's closer method should be called.

os.Exit(1) will immediately abort the program. But that is not the case for panic. Even after panic the early evaluated defers will get executed in LIFO order. So I have put utils.CloseAll() into a defer statement before the scope of any panics.

IMHO, panic is fine here. Let me know what you think. Thanks :)

Yes, I agree that all finalizers or closers should be run even if exceptions occur, the reason why I think the spontaneous panic is not suitable is this is a clear and specific exception, not a programming fault.

@bisakhmondal
Copy link
Member

Very true. Pushing the new changes then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment