Always validate checksums for Direct I/O reads #16598
Conversation
Note: I'm not seeing the DIO errors in the JSON output.
This should give you the JSON:

```diff
diff --git a/cmd/zpool/zpool_main.c b/cmd/zpool/zpool_main.c
index aa7da68aa..90f4225e1 100644
--- a/cmd/zpool/zpool_main.c
+++ b/cmd/zpool/zpool_main.c
@@ -9222,6 +9222,11 @@ vdev_stats_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *nv,
 			fnvlist_add_string(vds, "power_state", "-");
 		}
 	}
+	if (cb->cb_print_dio_verify) {
+		nice_num_str_nvlist(vds, "dio_verify_errors",
+		    vs->vs_dio_verify_errors, cb->cb_literal,
+		    cb->cb_json_as_int, ZFS_NICENUM_1024);
+	}
 }
 if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
```
The verification and error propagation looks good. Thanks for running down this very strange but entirely possible case. Most of the comments are for minor nits.
Force-pushed from 2d48b54 to 182758e
Good catch. I added your patch to this PR.
@amotin and I had a conversation. It was incorrect for me to allow the mirror VDEV to issue reads down to other children in the event of a Direct I/O read checksum verify failure. The issue is that another child could get a valid checksum, and then the buffer is manipulated again afterwards. This could lead to self healing from a buffer that no longer matches the on-disk data. I updated the code to remove this. @behlendorf and @tonyhutter, can you relook at those two spots in the code and make sure my comments make sense to you?
Good catch on the vdev_mirror_io_done() case. Yes, the updated logic looks correct to me: we always want to fall back to reading through the ARC in the event of a checksum failure. It seems to me we should also do something similar for raidz and dRAID. Repair I/Os are issued from pages in the user ABD, which could change after the checksum verification, just as with the mirror.
This fixes an oversight in the Direct I/O PR. There is nothing that stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verify failures. However, the disk contents are still correct, and this would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. In the event a checksum validation failure occurs for a Direct I/O read, the I/O request will be reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. Also, the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads

This allows for determining which I/O operation was the culprit for the checksum verification failure. All DIO errors are reported only on the top-level VDEV.

Even though FreeBSD can write protect pages (stable pages), it still has the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O: one for read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
5. Updates manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new code.
7. Updates man pages.
8. Adds an IMPLY statement to zio_checksum_verify() to make sure that Direct I/O reads are not issued as speculative.
9. Removes self healing through mirror, raidz, and dRAID VDEVs for Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and report no errors.

Signed-off-by: Brian Atkinson <[email protected]>
Force-pushed from 182758e to 8a90e8f
Good catch. As per your suggestion, I now leverage the
This fixes an oversight in the Direct I/O PR. There is nothing that stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verify failures. However, the disk contents are still correct, and this would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. In the event a checksum validation failure occurs for a Direct I/O read, the I/O request will be reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. Also, the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads

This allows for determining which I/O operation was the culprit for the checksum verification failure. All DIO errors are reported only on the top-level VDEV.

Even though FreeBSD can write protect pages (stable pages), it still has the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O: one for read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
5. Updates manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new code.
7. Updates man pages.
8. Adds an IMPLY statement to zio_checksum_verify() to make sure that Direct I/O reads are not issued as speculative.
9. Removes self healing through mirror, raidz, and dRAID VDEVs for Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes openzfs#16598
This fixes an oversight in the Direct I/O PR. There is nothing that stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verify failures. However, the disk contents are still correct, and this would lead to false reporting of checksum validation failures.
To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. In the event a checksum validation failure occurs for a Direct I/O read, the I/O request will be reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.
Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. Also, the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads
Even though FreeBSD can write protect pages (stable pages), it still has the same issue as Linux with Direct I/O reads.
This commit updates the following:
1. Propagates checksum failures for reads all the way up to the top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O: one for read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
5. Updates manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new code.
7. Updates man pages.
8. Adds an IMPLY statement to zio_checksum_verify() to make sure that Direct I/O reads are not issued as speculative.
9. Removes self healing through mirror, raidz, and dRAID VDEVs for Direct I/O reads.
This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and report no errors.
Make sure that checksum verification failures with Direct I/O reads are reported, and that such reads are always reissued through the ARC.
Motivation and Context
Without this, there are false positive checksum verification failures reported when a Direct I/O read is issued and a process is manipulating the buffer contents.
Description
See above.
How Has This Been Tested?
Added a new test case. Tested on the Linux 6.5.12-100.fc37.x86_64 kernel. On that same kernel, I used Virtual Machine Manager to install Windows 11, Ubuntu 22.04, and Fedora 38 on a dataset with the direct dataset property set to always. Ran the ZTS direct test cases as well.
On FreeBSD, ran the ZTS direct tests on FreeBSD 13.3-RELEASE and FreeBSD 14-CURRENT VMs.