Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extra zap_put_leaf in an unlikely code path #12366

Closed
avg-I opened this issue Jul 14, 2021 · 8 comments · Fixed by #16159
Closed

extra zap_put_leaf in an unlikely code path #12366

avg-I opened this issue Jul 14, 2021 · 8 comments · Fixed by #16159
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@avg-I
Copy link
Contributor

avg-I commented Jul 14, 2021

The problem is found by a code review of the latest code, so no OS / version details.
But the code has been unchanged for many years.

fzap_add and fzap_update may need to call zap_expand_leaf if there is not enough room for a new entry.
zap_expand_leaf may need to call zap_grow_ptrtbl if the current pointer table is not large enough.
In the very unlikely -- but not impossible -- event that the zap_grow_ptrtbl call fails, zap_expand_leaf would return with the original leaf, l, released by a call to zap_put_leaf and with the output leaf pointer, lp, keeping it original value.
Both fzap_add and fzap_update call zap_expand_leaf like this:

err = zap_expand_leaf(zn, l, tag, tx, &l);

So, in the described scenario, l would still point to the original leaf, but the leaf would be released.
In the epilogue parts of the functions there is a call to zap_put_leaf_maybe_grow_ptrtbl that would zap_put_leaf the leaf again.

One potential consequence, in non-debug builds, is that a dirty hold would get released for the leaf's underlying dbuf.
Thus, the dbuf would both be free for a re-use and still be referenced from its dirty record.

@avg-I avg-I added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 14, 2021
@avg-I
Copy link
Contributor Author

avg-I commented Jul 14, 2021

@ahrens , when you have some time, could you please double-check my observations?
Thank you!

@avg-I
Copy link
Contributor Author

avg-I commented Jul 14, 2021

The same kind of problem can also happen if the zap_deref_leaf call in zap_expand_leaf fails.
In both cases zn->zn_zap is not NULL, the leaf is not referenced and lp points to the leaf.

@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jul 31, 2022
@avg-I
Copy link
Contributor Author

avg-I commented Aug 1, 2022

I believe that this is a valid bug, although quite hard to hit in practice.

@stale stale bot removed the Status: Stale No recent activity for issue label Aug 1, 2022
@behlendorf behlendorf added the Bot: Not Stale Override for the stale bot label Aug 1, 2022
@snajpa
Copy link
Contributor

snajpa commented May 3, 2024

We have probably hit this just now?

VERIFY3(NULL == dmu_buf_set_user(l->l_dbuf, &l->l_dbu)) failed (0000000000000000 == ffff91dfeb60e200)
PANIC at zap.c:441:zap_create_leaf()
Showing stack for process 2470499
CPU: 10 PID: 2470499 Comm: python3 Tainted: G           OE      6.8.8 #1-vpsAdminOS
Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.5.6 10/06/2021
In memory cgroup /osctl/pool.tank/group.default/user.2739/ct.19681/user-owned/lxc.payload.19681/docker/3bd6008a375b00b30cb2fccbf7dcd2e32c081094425b4c8603c16fa09743d08a
Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 spl_panic+0x100/0x120 [spl]
 zap_expand_leaf+0x4ec/0x520 [zfs]
 fzap_add_cd+0xf9/0x230 [zfs]
 fzap_add+0x47/0xd0 [zfs]
 zap_add_impl+0x15e/0x330 [zfs]
 ? srso_return_thunk+0x5/0x5f
 ? zap_add+0x5c/0xd0 [zfs]
 zfs_link_create+0x189/0x500 [zfs]
 zfs_create+0x75f/0xa00 [zfs]
 zpl_create+0xd0/0x1e0 [zfs]
 path_openat+0xe9c/0x1180
 do_filp_open+0xb3/0x160
 do_sys_openat2+0xab/0xe0
 __x64_sys_openat+0x6e/0xa0
 do_syscall_64+0xab/0x1b0
 entry_SYSCALL_64_after_hwframe+0x79/0x81
RIP: 0033:0x7f9df8ef9f01
Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d ea 26 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
RSP: 002b:00007ffe2a11f4d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000080241 RCX: 00007f9df8ef9f01
RDX: 0000000000080241 RSI: 00007f9df8a61d00 RDI: 00000000ffffff9c
RBP: 00007f9df8a61d00 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000202 R12: 00007f9df8e00fc0
R13: 00000000ffffffff R14: 00007f9df8acfaf0 R15: 0000000000000001
 </TASK>

@snajpa
Copy link
Contributor

snajpa commented May 3, 2024

ping @amotin - just in case it might click for you faster, do you see any possible relation here to your zap related optimizations, please?

@snajpa
Copy link
Contributor

snajpa commented May 3, 2024

It seems unlikely that this is the same issue tho, if the zap_grow_ptrtbl function would return an error, it wouldn't get to zap_create_leaf. Maybe it could rather be related to 80cc516?

@snajpa
Copy link
Contributor

snajpa commented May 3, 2024

dunno, I'll create a new issue so that I don't spam this one...

amotin added a commit to amotin/zfs that referenced this issue May 3, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366
behlendorf pushed a commit that referenced this issue May 10, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes #12366 
Closes #16159
amotin added a commit to amotin/zfs that referenced this issue May 23, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366 
Closes openzfs#16159
ixhamza pushed a commit to truenas/zfs that referenced this issue May 23, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366 
Closes openzfs#16159
ixhamza pushed a commit to truenas/zfs that referenced this issue May 23, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366 
Closes openzfs#16159
behlendorf pushed a commit that referenced this issue May 29, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes #12366 
Closes #16159
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Sep 4, 2024
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#12366 
Closes openzfs#16159
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants