Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid variable-length-array syntax that trip up strict-bound UBSAN #15460

Closed

Conversation

ThomasLamprecht
Copy link
Contributor

Motivation and Context

This fixes #15145 i.e., a stricter bound checking when UBSAN is enabled for Linux since 6.5.

Disabling UBSAN is not really an option for us, and the "modern" (C99) VLA syntax is so widely used, that using it makes IMO the code a tiny bit more readable as a side effect.

Description

Use the more common C99 syntax to declare variable length arrays, while older ones, where a (fake) array length of one was declared, works too in C, it's making it impossible to distinguish an out-of-bound access for a variable array access.

How Has This Been Tested?

I have built ZFS 2.2 with this patch for the 6.5 kernel, then I

  • booted in setups that used 6.2 kernel and ZFS 2.1.13 as root filesystem
  • did many ZFS syncs between that setup and an older still using ZFS 2.1.13, in both directions.
  • general usage of ZFS volumes/datasets for e.g., FS mounts for containers or directly for virtual machines
  • ran the ZFS Test Suite

In #15145 (comment) it's mentioned that there could be some potential regression due to change in size that the structs report through the sizeof operator, I tried to check the changed structs usage in-depth, and commented on them in the commit message, but as I'm not that familiar with the code base I might have missed some things.

If anybody knows better tests to ensure compatibility w.r.t. the structs touched I'd be glad to hear them, and try to execute/implement them.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin
Copy link
Member

amotin commented Oct 27, 2023

I've found at least those code pieces that seem to consider extra element in the struct and must be addressed:

        length = sizeof (struct dk_gpt) +
            sizeof (struct dk_part) * (nparts - 1);

and

#define SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb) \
        (SA_HDR_SIZE(hdr) == (sizeof (sa_hdr_phys_t) + \
        (tb->lot_var_sizes > 1 ? P2ROUNDUP((tb->lot_var_sizes - 1) * \
        sizeof (uint16_t), 8) : 0)))

There may be more, but I just haven't looked deeper.

@ThomasLamprecht
Copy link
Contributor Author

ThomasLamprecht commented Oct 27, 2023

Oh sure, that I even noticed that -1 there, but somehow it went over my head, thanks for the pointer; will take another look with those in mind.

The struct dk_gpt shrinkage is a clear-cut, but for sa_hdr_phys_t in your second example I'm wondering if the struct should get an __attribute__((packed)) to keep space savings, but this might not help with making things easier to understand – what do you think?

@ThomasLamprecht
Copy link
Contributor Author

ThomasLamprecht commented Oct 27, 2023

Hmm, OK I re-read up on flexible array members (FAM), and tbh. my C times are a bit in the past and I have spearheaded this a bit too quickly due to ignoring that, sorry.

For sa_hdr_phys_t I got a bit confused, because my assumption was that the (padding) space up to the FAM isn't included in the sizeof of that struct, and that seems to hold up, but offsetof(sa_hdr_phys_t, sa_lengths) is the same (6) for both structs variants, i.e., the one with a fixed size-one array and the one with FAM, but that seems to be in line with the standard:

ISO/IEC 9899:1999 (TC2) §6.7.2.1 ¶16:

15 Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in which
they are declared.

17 There may be unnamed padding at the end of a structure or union.

18 As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a flexible
array member. In most situations, the flexible array member is ignored. In
particular, the size of the structure is as if the flexible array member were
omitted except that it may have more trailing padding than the omission would
imply. However, when a . (or ->) operator has a left operand that is (a
pointer to) a structure with a flexible array member and the right operand
names that member, it behaves as if that member were replaced with the
longest array (with the same element type) that would not make the structure
larger than the object being accessed; the offset of the array shall remain
that of the flexible array member, even if this would differ from that of the
replacement array
. If this array would have no elements, it behaves as if it
had one element but the behavior is undefined if any attempt is made to
access that element or to generate a pointer one past it.

So I think that the offsetof calculation should be switched by a sizeof one, as the replacement array will always be at the end, e.g., in a calculation like:

data_start = (void *)P2ROUNDUP(((uintptr_t)hdr +
    offsetof(sa_hdr_phys_t, sa_lengths) +
     (sizeof (uint16_t) * tb->lot_var_sizes)), 8);

@amotin
Copy link
Member

amotin commented Oct 27, 2023

So I think that the offsetof calculation should be switched by a sizeof one, as the replacement array will always be at the end, e.g., in a calculation like:

it is not my strongest area, but IMHO both should be identical.

@ThomasLamprecht
Copy link
Contributor Author

After a night of sleep and some coffee it's obvious (again) that you're right, I'll try to rethink things before shooting from my hips again, my C time are simple a bit to far in the past to do so.

For the quoted code data_start even must stay the same, otherwise something would be off.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 30, 2023
@ednadolski-ix
Copy link
Contributor

I have encountered this issue on 6.5 as well, and this patch has cleared it up for me.

@lundman
Copy link
Contributor

lundman commented Nov 1, 2023

Is this something uint16_t sa_lengths[1] __attribute__((no_sanitize("undefined"))); can fix?

@tonyhutter
Copy link
Contributor

@lundman I'm going to test that out (unless someone beats me to it!)

Use the modern, and more common C99 syntax to declare flexible array
members instead of the fixed-size array with a length of
one. While the latter works in C just fine, with some extra handling
for the extra space in length/array-size calculations, it's making it
impossible to distinguish an out-of-bound access fro a variable array
access.

Why now? Linux 6.5 made it's Undefined Behavior Sanitizer (UBSAN)
integration more strict with commit 2d47c6956ab3 ("ubsan: Tighten
UBSAN_BOUNDS on GCC"), causing quite some oopses with "UBSAN:
array-index-out-of-bounds" errors logged on boot when that kernel is
used and UBSAN_BOUNDS check is enabled.

The change to modern VLA syntax does not affect the data layout
whatsoever, but can result in a smaller container size as reported by
the `sizeof` operator, basically the size reported by `sizeof (struct
foo)` will be now the same as the one from `offsetof(struct foo,
flexible_array_member)`.

So we need to carefully check all call-sites that use the changed
struct in sizeof calculations, especially when calculating the full
data length the base-struct plus the data in the flexible array
occupy, because previously one had to bias that by -1 previously, as
the fixed-sized array used to model flexible arrays, provided space
for one in the struct directly.

I tried to checked each changed struct for potential breakage, some
short notes (size differences are always for x86_64 with gcc 12, but
as types are consisting of fixed bit width ones it shouldn't matter):

- dk_gpt_t is an internal struct and doesn't seem to be written
  directly to disk as is. But, there are a few places where the length
  calculation needs to be adjusted, as struct dk_gpt doesn't holds a
  fixed extra element anymore that previously had to be accounted for.

- sa_hdr_phys_t stays the same size due to padding, but the code using
  its sizeof value for overall length calculation needs to drop the -1
  bias for the length, e.g., when multiplying to get total size.

- mzap_ent_t shrinks from 128 to 64 bytes, it is not used in sizeof
  calculations anywhere FWICT.

- zap_leaf_phys_t shrinks from 56 to 48 bytes but isn't used in sizeof
  calculations, and the l_hash entry count is calculated explicitly
  from the block-size / 2^5 (= 32).

Link: openzfs#15145
Signed-off-by: Thomas Lamprecht <[email protected]>
@tonyhutter
Copy link
Contributor

I put out an alternate PR that disables CONFIG_UBSAN for the files that use those cursed size-1-arrays at the end of structs: #15510

Note: it doesn't do anything for the EFI files since those aren't built into the kernel.

@lundman turns out __attribute__((no_sanitize("undefined"))) has to be added to the deceleration of any function that access the cursed arrays, and there's a lot of functions that call them...

@lundman
Copy link
Contributor

lundman commented Nov 10, 2023

Ah interesting. I wonder what Microsoft will do, as ALL their structs have [1] in them, hundreds :)

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasLamprecht thanks for iterating on this. I think the approach you've taken here is what we're going to need to do to get UBSAN working long term. Since these are on disk structures we're going to need to be careful are take our time. In the meanwhile, it'd be great if you could test out the workaround in #15510 which suppresses the checks.

@@ -265,7 +265,7 @@ struct sa_handle {

#define SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb) \
(SA_HDR_SIZE(hdr) == (sizeof (sa_hdr_phys_t) + \
(tb->lot_var_sizes > 1 ? P2ROUNDUP((tb->lot_var_sizes - 1) * \
(tb->lot_var_sizes > 1 ? P2ROUNDUP(tb->lot_var_sizes * \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how much padding was added to the sa_hdr_phys structure when sa_lengths[1] we may need to add a special case for tb->lot_var_sizes == 0 here. This is an on-disk struct and we need to be careful that the updated SA_HDR_SIZE_MATCH_LAYOUT returns the same sizes as the old code would have. Changing the conditional might be enough depending on the old padding.

Suggested change
(tb->lot_var_sizes > 1 ? P2ROUNDUP(tb->lot_var_sizes * \
(tb->lot_var_sizes > 0 ? P2ROUNDUP(tb->lot_var_sizes * \

Copy link
Contributor

@ryao ryao Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to tb->lot_var_sizes > 0 appears to be sufficient to fix this. I agree that it is necessary.

@ThomasLamprecht
Copy link
Contributor Author

ThomasLamprecht commented Nov 11, 2023

@behlendorf Thanks for your comments! I actually wanted to write a short message about my update when I force-pushed, but I got side-tracked too much, sorry.

Anyhow, my last push is still not complete and leads to actual boot-hangs (hung task on txg_quiesce), and yes I definitively agree on doing this slowly.

I plan to also split up my PR into a separate commit per struct change, and possibly then also a separate PR (depending on what you favor?), that way we could get each change to a struct in separately once deemed ready.

I now also tested the workaround from #15510 and it works fine, like I commented there.

behlendorf pushed a commit that referenced this pull request Nov 13, 2023
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from #15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue #15145
Closes #15510
ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 20, 2023
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from openzfs#15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue openzfs#15145
Closes openzfs#15510
Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised to see this considering that I had made an attempt to eliminate these from the codebase using coccinelle earlier this year:

9c8fabf
8e7ebf4

I skimmed this and while it looks fine for the most part, I did not give the sa changes enough attention to approve this.

@@ -265,7 +265,7 @@ struct sa_handle {

#define SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb) \
(SA_HDR_SIZE(hdr) == (sizeof (sa_hdr_phys_t) + \
(tb->lot_var_sizes > 1 ? P2ROUNDUP((tb->lot_var_sizes - 1) * \
(tb->lot_var_sizes > 1 ? P2ROUNDUP(tb->lot_var_sizes * \
Copy link
Contributor

@ryao ryao Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to tb->lot_var_sizes > 0 appears to be sufficient to fix this. I agree that it is necessary.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Nov 28, 2023
@behlendorf
Copy link
Contributor

Closing for now since #15510 is in place. @ThomasLamprecht feel free to open a new PR to resolve these when you have time to work on it again.

@behlendorf behlendorf closed this Dec 5, 2023
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from openzfs#15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue openzfs#15145
Closes openzfs#15510
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jan 26, 2024
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from openzfs#15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue openzfs#15145
Closes openzfs#15510
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jan 29, 2024
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from openzfs#15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue openzfs#15145
Closes openzfs#15510
behlendorf pushed a commit that referenced this pull request Feb 5, 2024
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from #15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue #15145
Closes #15510
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux 6.5 compat: UBSAN now complains about flex-array declarations with array[1]
7 participants