
Always validate checksums for Direct I/O reads #16598

Merged: 1 commit merged into openzfs:master on Oct 9, 2024

Conversation

@bwatkinson (Contributor) commented Oct 3, 2024

This fixes an oversight in the Direct I/O PR. There is nothing that stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verification failures. However, the disk contents are still correct, and this would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. In the event a checksum validation failure occurs for a Direct I/O read, the I/O request will be reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.
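
In code terms, the fallback amounts to retrying the read through a kernel-owned buffer whenever the page-backed read fails verification. The sketch below only illustrates that pattern; the context type and helper names (dio_read_via_user_pages(), arc_buffered_read()) are hypothetical stand-ins, not the actual OpenZFS symbols.

#include <errno.h>

/* Hypothetical context type and helpers, for illustration only. */
typedef struct read_ctx read_ctx_t;
extern int dio_read_via_user_pages(read_ctx_t *ctx);	/* reads into user pages */
extern int arc_buffered_read(read_ctx_t *ctx);		/* reads via a private ARC buffer */

#ifndef ECKSUM
#define	ECKSUM	EBADE	/* stand-in for the checksum error code */
#endif

static int
read_block(read_ctx_t *ctx)
{
	int err = dio_read_via_user_pages(ctx);

	if (err == ECKSUM) {
		/*
		 * A Direct I/O read that fails verification is suspect:
		 * the process may have modified its pages mid-flight.
		 * Retry through the ARC so verification happens against
		 * a buffer the user cannot touch.
		 */
		err = arc_buffered_read(ctx);
	}
	return (err);
}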

Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. The zevent has also been updated to have both:

  1. dio_verify_wr -> Checksum verification failure for writes
  2. dio_verify_rd -> Checksum verification failure for reads

This allows for determining what I/O operation was the culprit for the checksum verification failure. All DIO errors are reported only on the top-level VDEV.

Even though FreeBSD can write protect pages (stable pages), it still has the same issue as Linux with Direct I/O reads.

This commit updates the following:

  1. Propagates checksum failures for reads all the way up to the top-level VDEV.
  2. Reports errors through zpool status -d as DIO.
  3. Has two zevents for checksum verify errors with Direct I/O. One for read and one for write.
  4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
  5. Moves the declaration of nbytes in zfs_read() to the top of the function and outside of the while loop. This was needed due to a compilation failure in FreeBSD.
  6. Updates manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place (a standalone sketch of this pattern follows the list).
  7. Adds a new test case dio_read_verify that stress tests the new code.
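
For illustration, a minimal standalone program in the spirit of item 6 might look like the following. This is only a sketch, not the actual manipulate_user_buffer.c from the ZFS Test Suite; the file path, buffer size, and loop count are placeholders.

#define _GNU_SOURCE	/* for O_DIRECT on Linux */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	BUF_SIZE	(128 * 1024)	/* should be a multiple of the block size */

static volatile int done;

/* Continuously scribble on the buffer while reads are in flight. */
static void *
scribbler(void *arg)
{
	char *buf = arg;

	while (!done)
		memset(buf, rand() & 0xff, BUF_SIZE);
	return (NULL);
}

int
main(void)
{
	void *buf;
	pthread_t tid;
	int fd = open("/testpool/fs/file", O_RDONLY | O_DIRECT);	/* placeholder path */

	if (fd < 0 || posix_memalign(&buf, 4096, BUF_SIZE) != 0) {
		perror("setup");
		return (1);
	}

	(void) pthread_create(&tid, NULL, scribbler, buf);

	/* Each Direct I/O read races with the scribbler thread above. */
	for (int i = 0; i < 1000; i++) {
		if (pread(fd, buf, BUF_SIZE, 0) < 0)
			perror("pread");
	}

	done = 1;
	(void) pthread_join(tid, NULL);
	free(buf);
	(void) close(fd);
	return (0);
}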

This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and report no errors.

Make sure that checksum verification failures with Direct I/O reads are always reported, and that such reads are reissued through the ARC.

Motivation and Context

Without this, there are false positive checksum verification failures reported when a Direct I/O read is issued and a process is manipulating the buffer contents.

Description

See above.

How Has This Been Tested?

Added a new test case. Tested on Linux with the 6.5.12-100.fc37.x86_64 kernel. On that same kernel I used Virtual Machine Manager to install Windows 11, Ubuntu 22.04, and Fedora 38 on a dataset with the direct dataset property set to always. Ran the ZTS direct test cases as well.

On FreeBSD, ran the ZTS direct tests on FreeBSD 13.3-RELEASE and FreeBSD 14-CURRENT VMs.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf added the "Status: Code Review Needed" label (Ready for review and testing) Oct 3, 2024
@tonyhutter (Contributor) commented:

Note: I'm not seeing the DIO errors in the JSON output (zpool status -d -j | jq). I dunno if you want to add that in this PR or another PR, but thought it was worth mentioning.

@tonyhutter (Contributor) commented:

This should give you the JSON:

diff --git a/cmd/zpool/zpool_main.c b/cmd/zpool/zpool_main.c
index aa7da68aa..90f4225e1 100644
--- a/cmd/zpool/zpool_main.c
+++ b/cmd/zpool/zpool_main.c
@@ -9222,6 +9222,11 @@ vdev_stats_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *nv,
                                fnvlist_add_string(vds, "power_state", "-");
                        }
                }
+               if (cb->cb_print_dio_verify) {
+                       nice_num_str_nvlist(vds, "dio_verify_errors",
+                           vs->vs_dio_verify_errors, cb->cb_literal,
+                           cb->cb_json_as_int, ZFS_NICENUM_1024);
+               }
        }
 
        if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,

@behlendorf (Contributor) left a review comment:

The verification and error propagation look good. Thanks for running down this very strange but entirely possible case. Most of the comments are for minor nits.

Review threads (resolved): module/zfs/vdev_indirect.c, module/zfs/zfs_vnops.c, module/zfs/zio.c
@bwatkinson (Contributor, Author) commented:

> This should give you the JSON: […]

Good catch. I added your patch to this PR.

@bwatkinson (Contributor, Author) commented:

@amotin and I had a conversation. It was incorrect for me to allow the mirror VDEV to issue reads down to other children in the event of a Direct I/O read checksum verify failure. The issue is that another child could return a valid checksum, and then the buffer could be manipulated again. That would lead to self healing in vdev_mirror_io_done() issuing bad data down to the other children.

I updated vdev_mirror_io_done() to return immediately once a Direct I/O read checksum verification failure occurs. I also updated the comments in that code block. In zio_vdev_child_io() I now ASSERT that the parent cannot have a Direct I/O checksum validation error. This makes sure children in the mirror are not issued reads after the initial checksum validation failure.

@behlendorf and @tonyhutter, can you relook at those two spots in the code and make sure my comments make sense to you?
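
For anyone following along, the shape of that change is roughly the early return sketched below. The type and flag names here are illustrative stand-ins rather than the exact symbols in vdev_mirror.c.

/*
 * Illustrative stand-in type and flag values; the real zio_t and flag
 * definitions in OpenZFS differ.
 */
typedef struct {
	int		io_type;
	unsigned long	io_flags;
} zio_sketch_t;

#define	ZIO_TYPE_READ		1		/* stand-in value */
#define	ZIO_FLAG_DIO_CHKSUM_ERR	(1UL << 0)	/* assumed flag name */

static void
mirror_io_done_sketch(zio_sketch_t *zio)
{
	/*
	 * If a Direct I/O read failed checksum verification, the user
	 * buffer may have been modified while the I/O was in flight, so
	 * neither retrying other children nor self healing is safe.
	 * Return immediately and let the read be reissued through the
	 * ARC instead.
	 */
	if (zio->io_type == ZIO_TYPE_READ &&
	    (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR))
		return;

	/* ... normal retry and self-healing logic would continue here ... */
}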

@behlendorf (Contributor) left a review comment:

Good catch on the vdev_mirror_io_done() case. Yes, the updated logic looks correct to me; we always want to fall back to reading through the ARC in the event of a checksum failure. It seems to me we should also do something similar for raidz and draid. Repair I/Os are issued from pages in the user ABD which could change after the checksum verification, just like with the mirror.

This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead to checksum
verification failures. However, the disk contents are still correct,
and this would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued through the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported through zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Signed-off-by: Brian Atkinson <[email protected]>
@bwatkinson (Contributor, Author) commented:

> It seems to me we should also do something similar for raidz and draid. Repair I/Os are issued from pages in the user ABD which could change after the checksum verification, just like with the mirror.

Good catch. As per your suggestion, I now leverage rc_allow_repair to make sure we never issue self healing for Direct I/O reads with raidz and dRAID VDEVs.
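
A rough sketch of that idea is below; only the rc_allow_repair name comes from the comment above, while the surrounding per-column type and helper are illustrative stand-ins for the raidz/dRAID structures.

/* Stand-in for a raidz/dRAID per-column descriptor. */
typedef struct {
	int	rc_allow_repair;
	/* ... parity/data column state ... */
} col_sketch_t;

/*
 * For a Direct I/O read, never write reconstructed data back to a child
 * vdev: the source pages belong to the user and may have changed after
 * checksum verification.
 */
static void
disable_dio_self_heal(col_sketch_t *cols, int ncols, int dio_read)
{
	if (!dio_read)
		return;
	for (int i = 0; i < ncols; i++)
		cols[i].rc_allow_repair = 0;
}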

@behlendorf added the "Status: Accepted" label (Ready to integrate: reviewed, tested) and removed the "Status: Code Review Needed" label (Ready for review and testing) Oct 9, 2024
@behlendorf merged commit b4e4cbe into openzfs:master Oct 9, 2024
20 checks passed
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Oct 9, 2024

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes openzfs#16598
Labels: "Status: Accepted" (Ready to integrate: reviewed, tested)
4 participants