Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs-2.2.5 patchset #16359

Merged
merged 48 commits into from
Aug 6, 2024
Merged

Conversation

tonyhutter
Copy link
Contributor

Motivation and Context

Proposed patchset for zfs-2.2.5

Description

Support 6.9 kernel, and possibly the 6.10 kernel depending on testing. Include various commits from zfs-2.2.5-staging branch.

How Has This Been Tested?

Test built only. Buildbot will test.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

chenqiuhao1997 and others added 22 commits May 13, 2024 10:27
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
In dbuf_read_verify_dnode_crypt():
 - We don't need original dbuf locked there. Instead take a lock
on a dnode dbuf, that is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
that may happen in case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16104
When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16038
There is no reason for these module parameters to be read-only.
Being modified they just apply on next pool import/creation, that
is useful for testing different values.

Reviewed-by: Rich Ercolani <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16118
As I understand just for being less predictable dnode hash includes
8 bits of objset pointer, starting at 6.  But since objset_t is
more than 1KB in size, its allocations are likely aligned to 2KB,
that means 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16131
Code for pools before version 11 uses dmu_objset_find_dp() to scan
for children datasets/clones.  It calls enqueue_clones_cb() and
enqueue_cb() callbacks in parallel from multiple taskq threads.
It ends up bad for scan_ds_queue_insert(), corrupting scn_queue
AVL-tree.  Fix it by introducing a mutex to protect those two
scan_ds_queue_insert() calls.  All other calls are done from the
sync thread and so serialized.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16162
Previous code overengineered cloned range calculation by using
BP_GET_LSIZE(). The problem is that legacy holes don't have the
logical size, so result will be wrong.  But we also don't need
to look on every block size, since they all must be identical.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16165
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366 
Closes openzfs#16159
Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in openzfs#7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#13215
Closes openzfs#16138
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#16202
Closes openzfs#16207
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#16272
Closes openzfs#16273
Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since its better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16201
…penzfs#16282)

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#16278
Closes openzfs#16330
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#16224
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.

Signed-off-by: Daniel Berlin <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
@amotin
Copy link
Member

amotin commented Jul 17, 2024

Lets include also 02c5aa9 to fix a reported memory leak on FreeBSD.
Also may be 800d59d, since it is pretty small, but I don't insist.

@robn
Copy link
Member

robn commented Jul 17, 2024

Here's what I'd hoped to get in to 2.2.5.


vdev fault fix

Not yet merged to master, but trivial and a fix to a real issue introduced in 2.2.4.

#16258 vdev_open: clear async fault flag after reopen

zdb fixes

Not yet merged to master, but I think trivial and fix real bugs.

#16335 zdb: fix BRT dump
#16334 zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly

adb_iter_page fixes

Has been uneventful in my own dogfooding.

#16108 abd_iter_page: rework to handle multipage scatterlists

Clang build fixes

Needed to build with Clang in some kernel configurations.

#16206 Use memset to zero stack allocations containing unions

ZTS improvements

#16088 zts: allow running a single test by name only
#16096 zts: add a debug option to get full test output

Improved userspace assertions/backtraces

Just so much nicer for debugging, especially when supporting customers.

#16140 libspl: nice assertion output
#16167 Unbreak FreeBSD cross-build on MacOS
#16168 libspl_assert: always link -lpthread on FreeBSD
#16181 zdb/ztest: improve and harmonise crash output

Kernel version enforcement

Not yet merged, and I am not pushing it, but if it's a reasonable approach then now is as good a time as any.

#15986 config/kernel: enforce kernel max version, with escape hatch


All except #15986 picked to zfs-2.2.5-hutter here: https://github.com/robn/zfs/commits/zfs-2.2.5-staging-robn/.

I'd be happy to submit separate PRs for any/all, let me know what you'd like.

@SoongNoonien
Copy link
Contributor

Hi, I'm really looking forward to 2.2.5 but I hopped for the fix in c98295e to be included.

robn and others added 5 commits July 17, 2024 14:54
Specifying a single test is kind of a hassle, because the full relative
path under the test suite dir has to be included, but it's not always
clear what that path even is.

This change allows `-t` to take the name of a single test instead of a
full path. If the value has no `/` characters, we search for a file of
that name under the test root, and if found, use that as the full test
path instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Akash B <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16088
The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.

This adds a -D option for "debug". When set, all test output is written
to stdout.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Akash B <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16096
Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or no), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16108
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16140
Check for the existence of execvpe(3) and only provide the FreeBSD
compat version if required.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Brooks Davis <[email protected]>
Closes openzfs#15609
robn and others added 2 commits July 23, 2024 11:58
Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: George Melikov <[email protected]>
For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, current calculation will always return at least 1 L1
block.

In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.

Signed-off-by: Chunwei Chen <[email protected]>
Co-authored-by: Chunwei Chen <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
@tonyhutter
Copy link
Contributor Author

@robn I pulled in the latest AUTHORS
@mcmilk I pulled in #16284
@satmandu I pulled in #16264 but not #16302. #16302 appears to only save a a handful of cycles every second, so I don't know if it's worth the (minor) risk.

This is a follow-on to 156a641
that ignores UBSAN errors in sa.c.

Thank you @thwalker3 for the fix.

Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#16278
Closes openzfs#16330
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
@EriksRemess
Copy link

Linux kernel 6.9 branch as of 6.9.12 (July 27th) is marked as EOL. Maybe target for this ZFS version should be set to 6.10?

@colmbuckley
Copy link
Contributor

colmbuckley commented Jul 30, 2024

Just a quick note - anyone running Debian bookworm with backports enabled is probably having difficulty with a fresh install; the current linux package is based on 6.9 which isn't supported by 2.2.4; further the stable package is 6.1 which might be too much of a downgrade for many operators. The 6.7 headers and kernel image packages were removed in the past few days.

@tonyhutter
Copy link
Contributor Author

@EriksRemess I'm currently testing this PR against Ubuntu 24 running a 6.10.2 kernel. If that looks ok, then we can bump the supported kernel version for this release to 6.10.

- Explicitly disable compression since mkfile uses a zero buffer.
 - Explicitly sync file systems instead of waiting for timeout.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
@satmandu satmandu mentioned this pull request Jul 31, 2024
13 tasks
@Harry-Chen
Copy link
Contributor

I suggest pulling in #16376 (if merged).

db65272 was added to zfs-2.2.4 to stub in the
VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion
feature.  This was needed to provide the right enum count for when the
VDEV_PROP_SLOW_IO proprieties got added.  This had the unfortunate side
effect of breaking module removal though.

Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added,
the module would correctly omit making kobjects for the RAIDz expansion
vdev property, but then would try to blindly remove its non-existent
kobjects during module unload.

This commit fixes the issue by checking for an uninitialized kobject.

Fixes: openzfs#16249

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
META file and changelog updated.

Signed-off-by: Tony Hutter <[email protected]>
@prometheanfire
Copy link
Contributor

If #16400 merges it might be worth adding to the stack. (depending on how clean it'd be too)

@robn
Copy link
Member

robn commented Aug 4, 2024

If #16400 merges it might be worth adding to the stack

-1 from me, 6.11 is not finalised and the zvol changes are slightly invasive; I'd be nervous about them being on a stable release so early in their life.

@satmandu
Copy link
Contributor

satmandu commented Aug 5, 2024

Would it make sense to have #16409 , #16410 , and #16411 as well here as they're minor and keep errors from happening in ZTS?

@tonyhutter
Copy link
Contributor Author

Quick update - zfs-2.2.5 is currently passing on all the builders so I'm reluctant to add any more commits. I also have been trying to test ZFS with the 6.10.x kernel on Ubuntu using the development kernel packages from: https://kernel.ubuntu.com/mainline/. I noticed that their 6.10.2 kernel was really unstable, but their new 6.10.3 packages seem to be much better, so I'm running ZTS locally on a VM to see if 6.10 is supported.

@tonyhutter
Copy link
Contributor Author

Unfortunately I'm seeing ZFS failures on Ubuntu 24.10 w/6.10.3 kernel:

Tests with results other than PASS that are unexpected:
    FAIL cli_root/zpool_create/zpool_create_features_007_pos (expected PASS)
    FAIL cli_root/zpool_create/zpool_create_features_008_pos (expected PASS)
    FAIL cli_root/zpool_upgrade/zpool_upgrade_features_001_pos (expected PASS)
    FAIL cli_user/misc/arc_summary_001_pos (expected PASS)
    FAIL pyzfs/pyzfs_unittest (expected PASS)

My guess is that they're unrelated to the kernel but I don't want to hold up the release while I hunt them down.

I'm going to start building the zfs-2.2.5 packages as-is.

@comrade-meowski
Copy link

It's only anecdotal evidence of course but if it's any help I'm always running several physical and virtual test instances with the very latest kernels+zfs combinations - no problems here with Ubuntu 24.04 (not 24.10, I'm not tracking that) and kernel 6.10.3 with this zfs-2.2.5-staging branch. That's on my main daily driver laptop too so it's had 2-3 days of use with that combination with no obvious hiccups or regressions.

It would be really nice if someone could fix the pyzfs build failure on debian/ubuntu for this release though (#16155), we've been having to build on Ubuntu with --disable-pyzfs since 2.2.4.

@tonyhutter tonyhutter merged commit 33174af into openzfs:zfs-2.2-release Aug 6, 2024
24 checks passed
@tonyhutter
Copy link
Contributor Author

zfs-2.2.5 released: https://github.com/openzfs/zfs/releases/tag/zfs-2.2.5

@robn
Copy link
Member

robn commented Aug 6, 2024

Congrats @tonyhutter, great work!

@prometheanfire
Copy link
Contributor

Very nice, 6.10 not called out as supported due to issues in your testing? I'm running on 6.10.3 (gentoo-kernel) without issue, but I haven't done things related to those tests. I did have an issue with some backtraces before this patchset though (to be expected since it added support for 6.9 as well.

@tonyhutter
Copy link
Contributor Author

@prometheanfire right, zfs-2.2.5 builds fine on 6.10 and I don't anticipate any issues. I'd just like to get a clean test run on it to be sure it's 100% OK for 6.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.