-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zfs-2.2.5 patchset #16359
zfs-2.2.5 patchset #16359
Conversation
In P2ALIGN, the result would be incorrect when align is unsigned integer and x is larger than max value of the type of align. In that case, -(align) would be a positive integer, which means high bits would be zero and finally stay zero after '&' when align is converted to a larger integer type. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Qiuhao Chen <[email protected]> Closes openzfs#15940
In dbuf_read_verify_dnode_crypt(): - We don't need original dbuf locked there. Instead take a lock on a dnode dbuf, that is actually manipulated. - Block decryption for a dnode dbuf if it is currently being written. ARC hash lock does not protect anonymous buffers, so arc_untransform() is unsafe when used on buffers being written, that may happen in case of encrypted dnode buffers, since they are not copied by dbuf_dirty()/dbuf_hold_copy(). In dbuf_read(): - If the buffer is in flight, recheck its compression/encryption status after it is cached, since it may need arc_untransform(). Tested-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16104
When compressed ARC is disabled, we may have to re-compress when writing into L2ARC. If doing so we can't fit it into the original physical size, we should just fail immediately, since even if it may still fit into allocation size, its checksum will never match. While there, refactor the code similar to other compression places without using abd_return_buf_copy(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16038
There is no reason for these module parameters to be read-only. Being modified they just apply on next pool import/creation, that is useful for testing different values. Reviewed-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16118
As I understand just for being less predictable dnode hash includes 8 bits of objset pointer, starting at 6. But since objset_t is more than 1KB in size, its allocations are likely aligned to 2KB, that means 11 lower bits provide no entropy. Just take the 8 bits starting from 11. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16131
Code for pools before version 11 uses dmu_objset_find_dp() to scan for children datasets/clones. It calls enqueue_clones_cb() and enqueue_cb() callbacks in parallel from multiple taskq threads. It ends up bad for scan_ds_queue_insert(), corrupting scn_queue AVL-tree. Fix it by introducing a mutex to protect those two scan_ds_queue_insert() calls. All other calls are done from the sync thread and so serialized. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16162
Previous code overengineered cloned range calculation by using BP_GET_LSIZE(). The problem is that legacy holes don't have the logical size, so result will be wrong. But we also don't need to look on every block size, since they all must be identical. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16165
Depending on kind of error zap_expand_leaf() may return with or without valid leaf reference held. Make sure it returns NULL if due to error it has no leaf to return. Make its callers to check the returned leaf pointer, and release the leaf if it is not NULL. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#12366 Closes openzfs#16159
Originally Solaris didn't expect errors there, but they may happen if we fail to add entry into ZAP. Linux fixed it in openzfs#7421, but it was never fully ported to FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes openzfs#13215 Closes openzfs#16138
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand + distance == l2ad_end. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#16202 Closes openzfs#16207
In the commit of the head_errlog feature we introduced a bug in dsl_dataset_promote_sync(): we may dereference origin_head and hds, both dereferencing ddpa after calling promote_sync() on ddpa. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#16272 Closes openzfs#16273
Since Linux 6.7 the kernel has defined intptr_t. Clang has -Wtypedef-redefinition by default, which causes the build to fail because we also have a typedef for intptr_t. Since its better to use the kernel's if it exists, detect it and skip our own. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16201
…penzfs#16282) The 6.9 kernel behaves differently in how it releases block devices. In the common case it will async release the device only after the return to userspace. This is different from the 6.8 and older kernels which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging. Fixes: openzfs#16089 Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Rob Norris <[email protected]>
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain kernel objects from the UBSAN checks. We previously excluded zap_micro.o with: UBSAN_SANITIZE_zap_micro.o := n For some reason that didn't work for the 6.9 kernel, which wants us to use: UBSAN_SANITIZE_zfs/zap_micro.o := n Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16278 Closes openzfs#16330
Linux has started moving to a model where instead of applying block queue limits through individual modification functions, a complete limits structure is built up and applied atomically, either when the block device or open, or some time afterwards. As of 6.10 this transition appears only partly completed. This commit matches that model within OpenZFS in a way that should work for past and future kernels. We set up a queue limits structure with any limits that have had their modification functions removed. For newer kernels that can have limits applied at block device open (HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the OpenZFS queue limits structure into Linux's queue_limits structure, which can then be passed in. For older kernels, we provide an application function that just calls the old functions for each limit in the structure. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a function, such that the old #undef for it in spl-kmem-cache.c would remove its definition completely, breaking the build. This inverts the model used before. Rather than always defining the kmem_cache_* macro, then undefining then inside spl-kmem-cache.c, instead we make a special tag to indicate we're currently inside spl-kmem-cache.c, and not defining those in macros in the first place, so we can use the kernel-supplied kmem_cache_* functions to implement spl_kmem_cache_*, as we expect. For all other callers, we create the macros as normal and remove access to the kernel's own conflicting names. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no longer exists, but the helper has been updated, so detect it and use it in all versions where it is available. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
We're seeing failures for redacted_deleted and redacted_mount on FreeBSD 13-15: 09:58:34.74 diff: /dev/fd/3: No such file or directory 09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2 The test was trying to diff the file listings between two directories to see if they are the same. The workaround is to do a string comparison of the directory listings instead of using `diff`. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16224
FreeBSD patchlevel versions are optional and, if present, in a different location in the version string. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Update the META file to reflect compatibility with the 6.9 kernel. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
This commit fixes what is probably a copy-paste mistake. The `dracut.zfs` manpage claims that the `bootfs.rollback` option executes `zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs rollback` does. Signed-off-by: Alphan Yılmaz <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str expands to a "do {<stuff> } while(0)" loop. Without this semicolon, the while(0) is unterminated, causing a cascade of useless errors. With this semicolon, it compiles fine. It also compiles fine on 6.8.11 (the previous kernel). I have not tested earlier kernels than that, but at worst it should add a pointless semicolon. All other instances in the source tree are already terminated with semicolons. Signed-off-by: Daniel Berlin <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Here's what I'd hoped to get in to 2.2.5. vdev fault fixNot yet merged to master, but trivial and a fix to a real issue introduced in 2.2.4. #16258 vdev_open: clear async fault flag after reopen zdb fixesNot yet merged to master, but I think trivial and fix real bugs. #16335 zdb: fix BRT dump adb_iter_page fixesHas been uneventful in my own dogfooding. #16108 abd_iter_page: rework to handle multipage scatterlists Clang build fixesNeeded to build with Clang in some kernel configurations. #16206 Use memset to zero stack allocations containing unions ZTS improvements#16088 zts: allow running a single test by name only Improved userspace assertions/backtracesJust so much nicer for debugging, especially when supporting customers. #16140 libspl: nice assertion output Kernel version enforcementNot yet merged, and I am not pushing it, but if it's a reasonable approach then now is as good a time as any. #15986 config/kernel: enforce kernel max version, with escape hatch All except #15986 picked to I'd be happy to submit separate PRs for any/all, let me know what you'd like. |
Hi, I'm really looking forward to 2.2.5 but I hopped for the fix in c98295e to be included. |
Specifying a single test is kind of a hassle, because the full relative path under the test suite dir has to be included, but it's not always clear what that path even is. This change allows `-t` to take the name of a single test instead of a full path. If the value has no `/` characters, we search for a file of that name under the test root, and if found, use that as the full test path instead. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Akash B <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16088
The test runner accumulates output from individual tests, then writes it to the log at the end. If a test hangs or crashes the system half way through, we get no insight into how it got to where it did. This adds a -D option for "debug". When set, all test output is written to stdout. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Akash B <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16096
Previously, abd_iter_page() would assume that every scatterlist would contain a single page (compound or no), because that's all we ever create in abd_alloc_chunks(). However, scatterlists can contain multiple pages of arbitrary provenance, and if we get one of those, we'd get all the math wrong. This reworks things to handle multiple pages in a scatterlist, by properly finding the right page within it for the given offset, and understanding better where the end of the page is and not crossing it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reported-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16108
The "not found" path is attempting to clear SOMELIB_CFLAGS and SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`, which is defined as "if the first arg is non-empty". The m4 "empty" construction is [], therefore, the existing AC_SUBST calls never modify the variables at all. The effect of this is that leftovers from the library test can leak out. At least, if a library header is found in the first stage, but the library itself is not, -lsomelib is added to SOMELIB_LIBS and further tests done. If that library is not found, SOMELIB_LIBS will not be cleared. For most of our library tests this hasn't been a problem, as they're either always found properly via pkg-config or set directly, or the calling test immediately aborts configure. For an optional dependency however, an apparent "partial" result where the header is found but no corresponding library causes link errors later. I think a complete fix should probably not be setting SOMELIB_xxx until the final result is known, but for now, adjusting the AC_SUBST calls to explictly set the empty shell string (which is not "empty" to m4) at least restores the intent. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16140
Check for the existence of execvpe(3) and only provide the FreeBSD compat version if required. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes openzfs#15609
Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]>
For files smaller than recordsize, it's most likely that they don't have L1 blocks. However, current calculation will always return at least 1 L1 block. In this change, we check dnode level to figure out if it has L1 blocks or not, and return 0 if it doesn't. This will reduce the chance of unnecessary throttling when deleting a large number of small files. Signed-off-by: Chunwei Chen <[email protected]> Co-authored-by: Chunwei Chen <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
98b269d
to
47c003b
Compare
This is a follow-on to 156a641 that ignores UBSAN errors in sa.c. Thank you @thwalker3 for the fix. Original-patch-by: @thwalker3 Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16278 Closes openzfs#16330 Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
47c003b
to
5c45ac6
Compare
Linux kernel 6.9 branch as of 6.9.12 (July 27th) is marked as EOL. Maybe target for this ZFS version should be set to 6.10? |
Just a quick note - anyone running Debian bookworm with backports enabled is probably having difficulty with a fresh install; the current linux package is based on 6.9 which isn't supported by 2.2.4; further the stable package is 6.1 which might be too much of a downgrade for many operators. The 6.7 headers and kernel image packages were removed in the past few days. |
@EriksRemess I'm currently testing this PR against Ubuntu 24 running a 6.10.2 kernel. If that looks ok, then we can bump the supported kernel version for this release to 6.10. |
- Explicitly disable compression since mkfile uses a zero buffer. - Explicitly sync file systems instead of waiting for timeout. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
5c45ac6
to
544b6c1
Compare
I suggest pulling in #16376 (if merged). |
db65272 was added to zfs-2.2.4 to stub in the VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion feature. This was needed to provide the right enum count for when the VDEV_PROP_SLOW_IO proprieties got added. This had the unfortunate side effect of breaking module removal though. Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added, the module would correctly omit making kobjects for the RAIDz expansion vdev property, but then would try to blindly remove its non-existent kobjects during module unload. This commit fixes the issue by checking for an uninitialized kobject. Fixes: openzfs#16249 Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Tino Reichardt <[email protected]>
META file and changelog updated. Signed-off-by: Tony Hutter <[email protected]>
544b6c1
to
33174af
Compare
If #16400 merges it might be worth adding to the stack. (depending on how clean it'd be too) |
-1 from me, 6.11 is not finalised and the zvol changes are slightly invasive; I'd be nervous about them being on a stable release so early in their life. |
Quick update - zfs-2.2.5 is currently passing on all the builders so I'm reluctant to add any more commits. I also have been trying to test ZFS with the 6.10.x kernel on Ubuntu using the development kernel packages from: https://kernel.ubuntu.com/mainline/. I noticed that their 6.10.2 kernel was really unstable, but their new 6.10.3 packages seem to be much better, so I'm running ZTS locally on a VM to see if 6.10 is supported. |
Unfortunately I'm seeing ZFS failures on Ubuntu 24.10 w/6.10.3 kernel:
My guess is that they're unrelated to the kernel but I don't want to hold up the release while I hunt them down. I'm going to start building the zfs-2.2.5 packages as-is. |
It's only anecdotal evidence of course but if it's any help I'm always running several physical and virtual test instances with the very latest kernels+zfs combinations - no problems here with Ubuntu 24.04 (not 24.10, I'm not tracking that) and kernel 6.10.3 with this zfs-2.2.5-staging branch. That's on my main daily driver laptop too so it's had 2-3 days of use with that combination with no obvious hiccups or regressions. It would be really nice if someone could fix the pyzfs build failure on debian/ubuntu for this release though (#16155), we've been having to build on Ubuntu with --disable-pyzfs since 2.2.4. |
zfs-2.2.5 released: https://github.com/openzfs/zfs/releases/tag/zfs-2.2.5 |
Congrats @tonyhutter, great work! |
Very nice, 6.10 not called out as supported due to issues in your testing? I'm running on 6.10.3 (gentoo-kernel) without issue, but I haven't done things related to those tests. I did have an issue with some backtraces before this patchset though (to be expected since it added support for 6.9 as well. |
@prometheanfire right, zfs-2.2.5 builds fine on 6.10 and I don't anticipate any issues. I'd just like to get a clean test run on it to be sure it's 100% OK for 6.10. |
Motivation and Context
Proposed patchset for zfs-2.2.5
Description
Support 6.9 kernel, and possibly the 6.10 kernel depending on testing. Include various commits from zfs-2.2.5-staging branch.
How Has This Been Tested?
Test built only. Buildbot will test.
Types of changes
Checklist:
Signed-off-by
.