Extend zdb to print inconsistencies in livelists and metaslabs
Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelist entries are block pointers (blkptr_t).  Spacemap
entries are ranges of numbers, most often used to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range is "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).
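To make the two definitions concrete, here is a minimal C sketch that scans the records for a single block or range in log order and counts both kinds of inconsistency without stopping at the first one. The `rec_type_t` enum, the result struct, and `check_record_sequence()` are hypothetical names for illustration; they are not zdb's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type: one entry in an allocation/free log. */
typedef enum { REC_ALLOC, REC_FREE } rec_type_t;

/* Counts of self-inconsistencies found in one record sequence. */
typedef struct {
	int double_allocs;
	int double_frees;
} inconsistency_count_t;

/*
 * Scan the records for one block (or range) in log order.  A record that
 * repeats the previous record's type is an inconsistency: ALLOC after
 * ALLOC is a double ALLOC, FREE after FREE is a double FREE.  Scanning
 * continues after each hit so that every inconsistency is counted.
 */
static inconsistency_count_t
check_record_sequence(const rec_type_t *recs, size_t n)
{
	inconsistency_count_t c = { 0, 0 };
	int have_prev = 0;
	rec_type_t prev = REC_ALLOC;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (have_prev && recs[i] == prev) {
			if (recs[i] == REC_ALLOC)
				c.double_allocs++;
			else
				c.double_frees++;
		}
		prev = recs[i];
		have_prev = 1;
	}
	return (c);
}
```

For the sequence ALLOC, ALLOC, FREE, FREE, FREE this reports one double ALLOC and two double FREEs, rather than halting at the first problem.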

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslabs.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelist and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]
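The livelist checks above could be sketched roughly as follows.  This is an illustration under stated assumptions, not zdb's implementation: a `blkptr_t` is reduced to a single (vdev, offset, birth TXG) key, entries are processed oldest-first, and a small fixed-size table with linear search stands in for whatever search structure zdb actually uses.  All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define	MAX_KEYS	64	/* hypothetical capacity for this sketch */

typedef enum { LL_ALLOC, LL_FREE } ll_type_t;

/* Simplified stand-in for a blkptr_t: one DVA plus its birth TXG. */
typedef struct {
	uint64_t vdev;
	uint64_t offset;
	uint64_t birth_txg;
} ll_key_t;

typedef struct {
	ll_type_t type;
	ll_key_t key;
} ll_entry_t;

typedef enum { ST_NONE, ST_ALLOCATED, ST_FREED } ll_state_t;

typedef struct {
	ll_key_t key;
	ll_state_t state;
} ll_slot_t;

typedef struct {
	int leftover_frees;
	int double_allocs;
	int double_frees;
	int leftover_allocs;
} ll_report_t;

static int
key_eq(const ll_key_t *a, const ll_key_t *b)
{
	return (a->vdev == b->vdev && a->offset == b->offset &&
	    a->birth_txg == b->birth_txg);
}

/* Return the slot tracking key, creating it in state ST_NONE if new. */
static ll_slot_t *
lookup(ll_slot_t *slots, size_t *nslots, const ll_key_t *key)
{
	for (size_t i = 0; i < *nslots; i++) {
		if (key_eq(&slots[i].key, key))
			return (&slots[i]);
	}
	assert(*nslots < MAX_KEYS);
	slots[*nslots].key = *key;
	slots[*nslots].state = ST_NONE;
	return (&slots[(*nslots)++]);
}

/* Verify one sublivelist, reporting every problem rather than stopping. */
static ll_report_t
verify_sublivelist(const ll_entry_t *ents, size_t n)
{
	ll_slot_t slots[MAX_KEYS];
	size_t nslots = 0;
	ll_report_t r = { 0, 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		ll_slot_t *s = lookup(slots, &nslots, &ents[i].key);

		if (ents[i].type == LL_ALLOC) {
			if (s->state == ST_ALLOCATED)
				r.double_allocs++;	/* ALLOC, ALLOC */
			s->state = ST_ALLOCATED;
		} else if (s->state == ST_ALLOCATED) {
			s->state = ST_FREED;		/* FREE cancels ALLOC */
		} else if (s->state == ST_FREED) {
			r.double_frees++;		/* FREE, FREE */
		} else {
			r.leftover_frees++;		/* FREE with no ALLOC */
			s->state = ST_FREED;
		}
	}

	/* Uncancelled ALLOCs, with their TXG, feed the cross check. */
	for (size_t i = 0; i < nslots; i++) {
		if (slots[i].state != ST_ALLOCATED)
			continue;
		r.leftover_allocs++;
		printf("leftover ALLOC vdev=%llu offset=%llu txg=%llu\n",
		    (unsigned long long)slots[i].key.vdev,
		    (unsigned long long)slots[i].key.offset,
		    (unsigned long long)slots[i].key.birth_txg);
	}
	return (r);
}
```

Keying on the birth TXG as well as the DVA keeps a legitimately freed and later reallocated location from being mistaken for a double ALLOC in this simplified model.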

Verify spacemaps by iterating over each metaslab and:
- iterate over the spacemap and then over the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs
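A similarly hedged sketch of the per-metaslab check: it replays a metaslab's spacemap entries and then its spacemap log entries into a bitmap of allocation units, and flags any range that allocates already-allocated units or frees already-free units.  The bitmap, the 64-unit metaslab size, the entry struct, and the function names are assumptions for illustration, and the sketch assumes the metaslab starts out entirely free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	MS_UNITS	64	/* hypothetical metaslab size, in units */

typedef enum { SKETCH_ALLOC, SKETCH_FREE } sketch_type_t;

/* Simplified, already-decoded space map entry (units, not bytes). */
typedef struct {
	sketch_type_t type;
	uint64_t offset;
	uint64_t run;
	uint64_t txg;	/* TXG of the preceding debug entry */
} sketch_sm_entry_t;

typedef struct {
	uint8_t allocated[MS_UNITS];	/* 1 = allocated, 0 = free */
	int double_allocs;
	int double_frees;
} ms_state_t;

/* Replay entries into the bitmap, reporting conflicting records. */
static void
replay(ms_state_t *ms, const sketch_sm_entry_t *ents, size_t n,
    const char *source)
{
	for (size_t i = 0; i < n; i++) {
		const sketch_sm_entry_t *e = &ents[i];
		uint8_t want = (e->type == SKETCH_ALLOC);
		int conflict = 0;

		assert(e->offset + e->run <= MS_UNITS);
		for (uint64_t u = e->offset; u < e->offset + e->run; u++) {
			if (ms->allocated[u] == want)
				conflict = 1;	/* already in target state */
			ms->allocated[u] = want;
		}
		if (!conflict)
			continue;
		if (want)
			ms->double_allocs++;
		else
			ms->double_frees++;
		printf("%s: double %s offset=%llu run=%llu txg=%llu\n", source,
		    want ? "ALLOC" : "FREE", (unsigned long long)e->offset,
		    (unsigned long long)e->run, (unsigned long long)e->txg);
	}
}

/* Spacemap first, then this metaslab's entries from the spacemap log. */
static ms_state_t
verify_metaslab(const sketch_sm_entry_t *sm, size_t nsm,
    const sketch_sm_entry_t *sm_log, size_t nlog)
{
	ms_state_t ms;

	memset(&ms, 0, sizeof (ms));	/* metaslab starts entirely free */
	replay(&ms, sm, nsm, "spacemap");
	replay(&ms, sm_log, nlog, "log");
	return (ms);
}
```

Replaying the log after the spacemap mirrors the order described in the check above; each conflicting entry is reported once, together with its TXG.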

Verify that livelists are consistent with spacemaps.  The space
referenced by livelists (after using the FREEs to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.
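The cross check can be sketched in the same simplified model: given a bitmap of allocated units derived from the spacemaps and the leftover livelist ALLOCs (as unit ranges with their TXG), report every range that is not fully allocated.  The bitmap representation, the 64-unit size, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define	MS_UNITS	64	/* hypothetical metaslab size, in units */

/* A livelist ALLOC left over after FREEs cancelled their ALLOCs. */
typedef struct {
	uint64_t offset;	/* in units */
	uint64_t run;		/* in units */
	uint64_t txg;
} leftover_alloc_t;

/*
 * Return how many leftover livelist ALLOCs reference at least one unit
 * that the spacemaps say is free; each such range is reported once.
 */
static int
cross_check(const uint8_t allocated[MS_UNITS], const leftover_alloc_t *la,
    size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = la[i].offset + la[i].run;

		assert(end <= MS_UNITS);
		for (uint64_t u = la[i].offset; u < end; u++) {
			if (allocated[u])
				continue;
			printf("livelist ALLOC offset=%llu run=%llu txg=%llu "
			    "is not allocated in the spacemaps\n",
			    (unsigned long long)la[i].offset,
			    (unsigned long long)la[i].run,
			    (unsigned long long)la[i].txg);
			bad++;
			break;
		}
	}
	return (bad);
}
```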

External-issue: DLPX-66031
Signed-off-by: Matthew Ahrens <[email protected]>
sara hartse authored and ahrens committed Jul 7, 2020
1 parent a4b0a74 commit 068b924
Showing 9 changed files with 649 additions and 65 deletions.
654 changes: 598 additions & 56 deletions cmd/zdb/zdb.c

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion cmd/ztest/ztest.c
@@ -6469,7 +6469,7 @@ ztest_run_zdb(char *pool)
 	ztest_get_zdb_bin(bin, len);
 
 	(void) sprintf(zdb,
-	    "%s -bcc%s%s -G -d -Y -e -p %s %s",
+	    "%s -bcc%s%s -G -d -Y -e -y -p %s %s",
 	    bin,
 	    ztest_opts.zo_verbose >= 3 ? "s" : "",
 	    ztest_opts.zo_verbose >= 4 ? "v" : "",
3 changes: 3 additions & 0 deletions include/sys/metaslab.h
@@ -137,6 +137,9 @@ void metaslab_set_selected_txg(metaslab_t *, uint64_t);
 
 extern int metaslab_debug_load;
 
+range_seg_type_t metaslab_calculate_range_tree_type(vdev_t *vdev,
+    metaslab_t *msp, uint64_t *start, uint64_t *shift);
+
 #ifdef __cplusplus
 }
 #endif
9 changes: 9 additions & 0 deletions include/sys/space_map.h
@@ -148,6 +148,15 @@ typedef struct space_map_entry {
 	uint32_t sme_vdev;	/* max is 2^24-1; SM_NO_VDEVID if not present */
 	uint64_t sme_offset;	/* max is 2^63-1; units of sm_shift */
 	uint64_t sme_run;	/* max is 2^36; units of sm_shift */
+
+	/*
+	 * The following fields are not part of the actual space map entry
+	 * on-disk and they are populated with the values from the debug
+	 * entry most recently visited starting from the beginning to the
+	 * end of the space map.
+	 */
+	uint64_t sme_txg;
+	uint64_t sme_sync_pass;
 } space_map_entry_t;
 
 #define	SM_NO_VDEVID	(1 << SPA_VDEVBITS)
10 changes: 8 additions & 2 deletions man/man8/zdb.8
@@ -10,7 +10,7 @@
 .\"
 .\"
 .\" Copyright 2012, Richard Lowe.
-.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+.\" Copyright (c) 2012, 2019 by Delphix. All rights reserved.
 .\" Copyright 2017 Nexenta Systems, Inc.
 .\" Copyright (c) 2017 Lawrence Livermore National Security, LLC.
 .\" Copyright (c) 2017 Intel Corporation.
@@ -23,7 +23,7 @@
 .Nd display zpool debugging and consistency information
 .Sh SYNOPSIS
 .Nm
-.Op Fl AbcdDFGhikLMPsvXY
+.Op Fl AbcdDFGhikLMPsvXYy
 .Op Fl e Oo Fl V Oc Op Fl p Ar path ...
 .Op Fl I Ar inflight I/Os
 .Oo Fl o Ar var Ns = Ns Ar value Oc Ns ...
@@ -403,6 +403,12 @@ but read transactions otherwise deemed too old.
 Attempt all possible combinations when reconstructing indirect split blocks.
 This flag disables the individual I/O deadman timer in order to allow as
 much time as required for the attempted reconstruction.
+.It Fl y
+Perform validation for livelists that are being deleted.
+Scans through the livelist and metaslabs, checking for duplicate entries
+and compares the two, checking for potential double frees.
+If it encounters issues, warnings will be printed, but the command will not
+necessarily fail.
 .El
 .Pp
 Specifying a display option more than once enables verbosity for only that
2 changes: 1 addition & 1 deletion module/zfs/metaslab.c
@@ -2533,7 +2533,7 @@ metaslab_unload(metaslab_t *msp)
  * the vdev_ms_shift - the vdev_ashift is less than 32, we can store
  * the ranges using two uint32_ts, rather than two uint64_ts.
  */
-static range_seg_type_t
+range_seg_type_t
 metaslab_calculate_range_tree_type(vdev_t *vdev, metaslab_t *msp,
     uint64_t *start, uint64_t *shift)
 {
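The comment in this hunk gives the rule for choosing compact range segments.  The following sketch restates that rule under assumptions: offsets are stored relative to the metaslab's start and in units of 2^ashift, so they fit in 32 bits when ms_shift - ashift < 32.  The enum and function names are hypothetical, and the sketch ignores anything else the real metaslab_calculate_range_tree_type() may take into account.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two segment widths. */
typedef enum { SKETCH_SEG32, SKETCH_SEG64 } sketch_seg_type_t;

/*
 * Pick a segment width for a metaslab of size 2^ms_shift on a vdev with
 * allocation unit 2^ashift.  For 32-bit segments, *start and *shift
 * describe how absolute offsets are rebased and scaled.
 */
static sketch_seg_type_t
pick_seg_type(uint64_t ms_shift, uint64_t ashift, uint64_t ms_start,
    uint64_t *start, uint64_t *shift)
{
	assert(ms_shift >= ashift);
	if (ms_shift - ashift < 32) {
		*start = ms_start;
		*shift = ashift;
		return (SKETCH_SEG32);
	}
	*start = 0;
	*shift = 0;
	return (SKETCH_SEG64);
}

/* Encode an absolute byte offset into the compact 32-bit form. */
static uint32_t
encode32(uint64_t offset, uint64_t start, uint64_t shift)
{
	uint64_t v = (offset - start) >> shift;

	assert(offset >= start && v <= UINT32_MAX);
	return ((uint32_t)v);
}
```

For example, a 16 GiB metaslab (ms_shift 34) on a 4 KiB-sector vdev (ashift 12) needs only 22 bits per offset, so two uint32_ts per segment suffice.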
28 changes: 26 additions & 2 deletions module/zfs/space_map.c
@@ -96,6 +96,7 @@ space_map_iterate(space_map_t *sm, uint64_t end, sm_cb_t callback, void *arg)
 	    ZIO_PRIORITY_SYNC_READ);
 
 	int error = 0;
+	uint64_t txg = 0, sync_pass = 0;
 	for (uint64_t block_base = 0; block_base < end && error == 0;
 	    block_base += blksz) {
 		dmu_buf_t *db;
@@ -117,8 +118,29 @@ space_map_iterate(space_map_t *sm, uint64_t end, sm_cb_t callback, void *arg)
 		    block_cursor < block_end && error == 0; block_cursor++) {
 			uint64_t e = *block_cursor;
 
-			if (sm_entry_is_debug(e)) /* Skip debug entries */
+			if (sm_entry_is_debug(e)) {
+				/*
+				 * Debug entries are only needed to record the
+				 * current TXG and sync pass if available.
+				 *
+				 * Note though that sometimes there can be
+				 * debug entries that are used as padding
+				 * at the end of space map blocks in-order
+				 * to not split a double-word entry in the
+				 * middle between two blocks. These entries
+				 * have their TXG field set to 0 and we
+				 * skip them without recording the TXG.
+				 * [see comment in space_map_write_seg()]
+				 */
+				uint64_t e_txg = SM_DEBUG_TXG_DECODE(e);
+				if (e_txg != 0) {
+					txg = e_txg;
+					sync_pass = SM_DEBUG_SYNCPASS_DECODE(e);
+				} else {
+					ASSERT0(SM_DEBUG_SYNCPASS_DECODE(e));
+				}
 				continue;
+			}
 
 			uint64_t raw_offset, raw_run, vdev_id;
 			maptype_t type;
@@ -158,7 +180,9 @@ space_map_iterate(space_map_t *sm, uint64_t end, sm_cb_t callback, void *arg)
 			    .sme_type = type,
 			    .sme_vdev = vdev_id,
 			    .sme_offset = entry_offset,
-			    .sme_run = entry_run
+			    .sme_run = entry_run,
+			    .sme_txg = txg,
+			    .sme_sync_pass = sync_pass
 			};
 			error = callback(&sme, arg);
 		}
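The effect of the space_map_iterate() change, carrying the TXG and sync pass from the most recent non-padding debug entry into each following data entry, can be modeled with already-decoded records.  The `raw_entry_t` layout and `tag_entries()` are hypothetical; the real code decodes 64-bit on-disk words with the SM_DEBUG_*_DECODE() macros shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of one space map word. */
typedef struct {
	int is_debug;
	uint64_t txg;		/* debug entries only; 0 marks padding */
	uint64_t sync_pass;	/* debug entries only */
	uint64_t offset;	/* data entries only */
	uint64_t run;		/* data entries only */
} raw_entry_t;

/* A data entry annotated with the TXG/sync pass in effect for it. */
typedef struct {
	uint64_t offset, run, txg, sync_pass;
} tagged_entry_t;

/* Copy data entries to out, tagging each; returns how many were copied. */
static size_t
tag_entries(const raw_entry_t *in, size_t n, tagged_entry_t *out,
    size_t max_out)
{
	uint64_t txg = 0, sync_pass = 0;
	size_t nout = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].is_debug) {
			/* Padding debug entries carry TXG 0; keep old values. */
			if (in[i].txg != 0) {
				txg = in[i].txg;
				sync_pass = in[i].sync_pass;
			}
			continue;
		}
		assert(nout < max_out);
		out[nout].offset = in[i].offset;
		out[nout].run = in[i].run;
		out[nout].txg = txg;
		out[nout].sync_pass = sync_pass;
		nout++;
	}
	return (nout);
}
```

Because padding entries leave the remembered values untouched, a data entry that follows a padding word is still attributed to the TXG of the last real debug entry.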
@@ -57,7 +57,7 @@ set -A args "create" "add" "destroy" "import fakepool" \
 	"add raidz1 fakepool" "add raidz2 fakepool" \
 	"setvprop" "blah blah" "-%" "--?" "-*" "-=" \
 	"-a" "-f" "-g" "-j" "-n" "-o" "-p" "-p /tmp" "-r" \
-	"-t" "-w" "-y" "-z" "-E" "-H" "-I" "-J" "-K" \
+	"-t" "-w" "-z" "-E" "-H" "-I" "-J" "-K" \
 	"-N" "-Q" "-R" "-T" "-W" "-Z"
 
 log_assert "Execute zdb using invalid parameters."
@@ -58,7 +58,7 @@ function cleanup
 function test_imported_pool
 {
 	typeset -a args=("-A" "-b" "-C" "-c" "-d" "-D" "-G" "-h" "-i" "-L" \
-	    "-M" "-P" "-s" "-v" "-Y")
+	    "-M" "-P" "-s" "-v" "-Y" "-y")
 	for i in ${args[@]}; do
 		log_must eval "zdb $i $TESTPOOL >/dev/null"
 	done
@@ -68,7 +68,7 @@ function test_exported_pool
 {
 	log_must zpool export $TESTPOOL
 	typeset -a args=("-A" "-b" "-C" "-c" "-d" "-D" "-F" "-G" "-h" "-i" "-L" "-M" \
-	    "-P" "-s" "-v" "-X" "-Y")
+	    "-P" "-s" "-v" "-X" "-Y" "-y")
 	for i in ${args[@]}; do
 		log_must eval "zdb -e $i $TESTPOOL >/dev/null"
 	done
