[2.2] vdev_disk: ensure trim errors are returned immediately #16081

robn · 2024-04-11T01:27:17Z

Motivation and Context

Backporting #16070 for 2.2.

Description

After 08fd5cc, the discard issuing code was organised such that if requesting an async discard or secure erase failed before the IO was issued (that is, calling __blkdev_issue_discard() returned an error), the failed zio would never be executed, resulting in txg_sync hanging forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error. To handle the successful synchronous op case, we fake an async op: when not using an asynchronous submission method, the successful result zio is queued as part of the discard handler.
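The completion rule described above can be sketched in user space as follows. This is only an illustrative model, not the vdev_disk code: `sim_zio_t`, `sim_zio_complete()`, `sim_issue_trim()` and the `SUBMIT_*` outcomes are hypothetical names invented for this sketch, and `EIO` stands in for whatever error the block layer would actually return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zio: just enough state to track completion. */
typedef struct sim_zio {
	int	io_error;	/* 0 on success, an errno value on failure */
	bool	io_done;	/* set once the completion path has run */
} sim_zio_t;

/* Hypothetical outcomes of asking the block layer for a discard. */
typedef enum {
	SUBMIT_FAILED,		/* error returned before any IO was issued */
	SUBMIT_SYNC_OK,		/* synchronous method, already finished */
	SUBMIT_ASYNC_OK		/* asynchronous method, callback still pending */
} sim_submit_t;

/* Hypothetical completion hook: records the result and marks the zio done. */
static void
sim_zio_complete(sim_zio_t *zio, int error)
{
	assert(zio != NULL);
	zio->io_error = error;
	zio->io_done = true;
}

/*
 * Simulate issuing a trim. Returns true if the zio was completed here,
 * false if completion is left to a later asynchronous callback.
 */
static bool
sim_issue_trim(sim_zio_t *zio, sim_submit_t outcome)
{
	switch (outcome) {
	case SUBMIT_FAILED:
		/*
		 * Submission failed, so no callback will ever fire.
		 * Complete the zio now with the error so nothing waits forever.
		 */
		sim_zio_complete(zio, EIO);
		return (true);
	case SUBMIT_SYNC_OK:
		/*
		 * The synchronous method has already finished.
		 * Fake the async completion by completing the zio ourselves.
		 */
		sim_zio_complete(zio, 0);
		return (true);
	case SUBMIT_ASYNC_OK:
	default:
		/* The (simulated) block layer completes the zio later. */
		return (false);
	}
}
```

The point of the sketch is that every path either completes the zio or leaves a callback pending. The hang described above corresponds to a `SUBMIT_FAILED` path that does neither.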

Since it was hard to understand the differences between discard and secure erase, and sync and async, across different kernel versions, I've commented and reorganised the code a bit to try and make everything more contained and linear.

How Has This Been Tested?

Compiled and successfully passed zpool_trim test suites on kernels:

4.14.336
5.10.214
6.1.83
6.8.2

On 5.10.214, with loopback devices (which have incorrect discard_granularity, see #16068), both zpool trim and autotrim=on would hang. With this in place, they appear to succeed, and the failures are recorded in /proc/spl/kstat/zfs/xxx/iostats. This restores the previous behaviour.

See also testing on #16070, which should all hold here.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

After 08fd5cc, the discard issuing code was organised such that if requesting an async discard or secure erase failed before the IO was issued (that is, calling __blkdev_issue_discard() returned an error), the failed zio would never be executed, resulting in txg_sync hanging forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error. To handle the successful synchronous op case, we fake an async op by, when not using an asynchronous submission method, queuing the successful result zio as part of the discard handler.

Since it was hard to understand the differences between discard and secure erase, and sync and async, across different kernel versions, I've commented and reorganised the code a bit to try and make everything more contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
(cherry picked from commit ba9f587)

amotin approved these changes Apr 11, 2024

behlendorf approved these changes Apr 11, 2024

behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Apr 11, 2024

behlendorf merged commit d0d9dcc into openzfs:zfs-2.2.4-staging Apr 11, 2024
22 of 24 checks passed

robn mentioned this pull request Apr 12, 2024

Issuing "zpool trim" locks up zfs and makes pool importable only in RO mode #16056

Open
