Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zdb/ztest: improve and harmonise crash output #16181

Closed
wants to merge 6 commits into from

Conversation

robn
Copy link
Member

@robn robn commented May 10, 2024

Motivation and Context

The ongoing quest for better insight when things go wrong. And tidying up.

Description

See the commit messages. tl;dr:

  • make zdb produce a backtrace on SIGSEGV/SIGABRT, just like ztest
  • use the nicer libunwind backtraces from libspl: nice assertion output #16140 in this case (adding FreeBSD support)
  • make the backtrace and dbgmsg printers (more) signal-safe, by removing allocations and printfs

This started out as just bringing over the SIGSEGV/SIGABRT handling from ztest to zdb, which is very useful if zdb crashes out on some kind of weird thing on disk. But then I saw I could have better backtraces, so I did that. Then I realised the backtrace functions needed to be a bit more general (where to send output), and they need to be safe to call from a signal handler. And the rest is just tidying up. Lots more tidying possible, but that's enough here.

There is a user-facing change here: dbgmsg output from ztest and zdb will now go to stderr, not stdout. It's kind of an incidental change, but I consider it an improvement.

How Has This Been Tested?

By hand, on Linux glibc with/without libunwind, FreeBSD 14 with libunwind, and FreeBSD 13 without libunwind. All looks good.

I'll be curious to see what the test suite makes of it, given zdb is important to some parts of it. I'm not expecting anything though.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@robn robn force-pushed the signal-backtraces branch 4 times, most recently from 482afce to 7679e32 Compare May 10, 2024 12:08
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 10, 2024
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 13, 2024
robn added 6 commits May 14, 2024 09:36
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
@robn robn force-pushed the signal-backtraces branch from 7679e32 to 96b60cb Compare May 14, 2024 03:38
behlendorf pushed a commit that referenced this pull request May 14, 2024
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16181
behlendorf pushed a commit that referenced this pull request May 14, 2024
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16181
behlendorf pushed a commit that referenced this pull request May 14, 2024
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16181
behlendorf pushed a commit that referenced this pull request May 14, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16181
behlendorf pushed a commit that referenced this pull request May 14, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16181
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 91c46d4)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit e7b4519)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3974ef0)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 1ea8c59)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit fa99d9c)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3c941d1)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 91c46d4)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit e7b4519)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3974ef0)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 1ea8c59)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit fa99d9c)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3c941d1)
@robn robn mentioned this pull request Jul 17, 2024
13 tasks
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 91c46d4)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit e7b4519)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3974ef0)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 1ea8c59)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit fa99d9c)
robn added a commit to robn/zfs that referenced this pull request Jul 17, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
(cherry picked from commit 3c941d1)
robn added a commit to robn/zfs that referenced this pull request Jul 18, 2024
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 14, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 14, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 21, 2024
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 21, 2024
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16181
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants