Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bdev_discard_supported: understand discard_granularity=0 #16082

Merged
merged 1 commit into from
Apr 12, 2024

Conversation

robn
Copy link
Member

@robn robn commented Apr 11, 2024

Motivation and Context

OpenZFS does not consider the discard_granularity property when enabling discard (TRIM), leading to the kernel rejecting discard requests with confusing warnings in the logs.

Closes #16068.

Description

Kernel documentation for the discard_granularity property says:

A discard_granularity of 0 means that the device does not support discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c2f) the discard entry point blkdev_issue_discard() has had a check for this, which would immediately reject the call with EOPNOTSUPP, and throw a scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for discard_granularity (torvalds/linux@3c407dc723bb), and a future kernel will remove the check entirely.

As such, there's no good reason for us to enable discard when discard_granularity=0. The kernel will never let the request go in anyway; better that we just disable it so we can report it properly to the user.

How Has This Been Tested?

Compile checked on kernels:

  • 3.10.0-1160.108.1.el7
  • 4.14.336
  • 5.10.214
  • 6.8.2

On 5.10.214, loop devices do not set discard_granularity when the loop is backed by a file, making it a good candidate to check with.

Before:

$ cat /sys/block/loop0/queue/discard_granularity
0
$ zpool status -t
  pool: tank
 state: ONLINE
config:

  NAME        STATE     READ WRITE CKSUM
  tank        ONLINE       0     0     0
   loop0     ONLINE       0     0     0  (untrimmed)

errors: No known data errors
$ zpool trim -w tank
[    3.974677] ------------[ cut here ]------------
[    3.974800] WARNING: CPU: 0 PID: 569 at block/blk-lib.c:51 __blkdev_issue_discard+0x22c/0x290
...

After:

$ cat /sys/block/loop0/queue/discard_granularity
0
$ zpool status -t
  pool: tank
 state: ONLINE
config:

  NAME        STATE     READ WRITE CKSUM
  tank        ONLINE       0     0     0
   loop0     ONLINE       0     0     0  (trim unsupported)

errors: No known data errors
$ zpool trim -w tank
cannot trim: no devices in pool support trim operations

This does cause parts of the zpool_trim and trim test suites to fail on kernels affected by this change. To be fair, they were somewhat erroneous in succeeding before, since all those tests really check for is that discards were issued; errors are not reported back through OpenZFS userspace tools (though they are visible, at distance, through the pool's iostats kstat). I'll see what shakes out of CI first, but I will likely have to update the tests. Current ideas include, skipping the tests on kernels where loop doesn't set discard_granularity, using an alternate block device for the tests (scsi_debug), or introducing some testing hook.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c2f) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See openzfs#16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723bb), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/[email protected]/

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16068
@robn robn force-pushed the trim-discard-granularity branch from 6a61ea7 to 3a4f74d Compare April 11, 2024 03:03
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Apr 12, 2024
@behlendorf
Copy link
Contributor

Thanks for running this down. This nicely explains why we've only seen this is certain environments.

@behlendorf behlendorf merged commit b181b2e into openzfs:master Apr 12, 2024
24 of 25 checks passed
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c2f) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See openzfs#16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723bb), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/[email protected]/

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16068
Closes openzfs#16082
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error: discard_granularity is 0
2 participants