Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend zdb to print inconsistencies in livelists and metaslabs #10515

Merged
merged 1 commit into from
Jul 15, 2020

Conversation

ahrens
Copy link
Member

@ahrens ahrens commented Jun 30, 2020

Motivation and Context

Livelists and spacemaps are data structures that are logs of allocations
and frees. Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab. However, it
generally halts processing when the error is detected.

Description

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, zdb -y, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
zdb -b, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:

  • report leftover FREEs
  • report double ALLOCs and double FREEs
  • record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:

  • iterate over spacemap and then the metaslab's entries in the
    spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps. The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

Implemented by @shartse and @sdimitro.

How Has This Been Tested?

ztest runs zdb with the new -y flag.

Tested on a broken pool.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@ahrens ahrens requested review from sdimitro and shartse June 30, 2020 04:13
@ahrens ahrens added Status: Code Review Needed Ready for review and testing Type: Feature Feature request or new feature labels Jun 30, 2020
@codecov
Copy link

codecov bot commented Jun 30, 2020

Codecov Report

Merging #10515 into master will decrease coverage by 0.02%.
The diff coverage is 76.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10515      +/-   ##
==========================================
- Coverage   79.71%   79.68%   -0.03%     
==========================================
  Files         392      392              
  Lines      124609   124807     +198     
==========================================
+ Hits        99327    99458     +131     
- Misses      25282    25349      +67     
Flag Coverage Δ
#kernel 80.16% <80.00%> (-0.01%) ⬇️
#user 65.47% <76.71%> (-0.28%) ⬇️
Impacted Files Coverage Δ
cmd/ztest/ztest.c 80.77% <ø> (+2.71%) ⬆️
module/zfs/metaslab.c 94.35% <ø> (-1.35%) ⬇️
cmd/zdb/zdb.c 81.70% <75.94%> (-0.55%) ⬇️
module/zfs/space_map.c 93.17% <100.00%> (-0.40%) ⬇️
module/zfs/vdev_indirect.c 72.66% <0.00%> (-12.34%) ⬇️
module/zfs/dsl_synctask.c 92.40% <0.00%> (-2.54%) ⬇️
module/zfs/vdev_removal.c 94.74% <0.00%> (-1.95%) ⬇️
module/zfs/vdev_indirect_mapping.c 97.10% <0.00%> (-1.94%) ⬇️
module/zfs/spa_log_spacemap.c 93.98% <0.00%> (-1.72%) ⬇️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4b0a74...068b924. Read the comment docs.

@shartse
Copy link
Contributor

shartse commented Jun 30, 2020

Thanks for opening this Matt. We'll need to update the tests zdb_args_neg and zdb_args_pos to know about the new flag. I'm also planning on doing a round of testing where I re-introduce the livelist corruption behavior and make sure this catches it as expected.

Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

External-issue: DLPX-66031
Signed-off-by: Matthew Ahrens <[email protected]>
@ahrens
Copy link
Member Author

ahrens commented Jul 7, 2020

@shartse Thanks for the pointer, I've updated those tests.

@behlendorf
Copy link
Contributor

@shartse is there anything else you'd like to add this to PR. You mentioned doing another round of testing above.

@shartse
Copy link
Contributor

shartse commented Jul 14, 2020

Yeah, I'll do those tests today and report back

Copy link
Contributor

@shartse shartse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-tested some of the livelist verification, it looks good to me.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jul 15, 2020
@behlendorf
Copy link
Contributor

Great. Thanks for re-testing, then this is ready to integrate.

@behlendorf behlendorf merged commit 6774931 into openzfs:master Jul 15, 2020
@ahrens ahrens deleted the zdb branch July 15, 2020 05:01
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Sara Hartse <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-66031
Closes openzfs#10515
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Sara Hartse <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-66031
Closes openzfs#10515
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants