Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can CFI be made compatible with type erasure schemes? #128728

Open
RalfJung opened this issue Aug 6, 2024 · 19 comments
Open

Can CFI be made compatible with type erasure schemes? #128728

RalfJung opened this issue Aug 6, 2024 · 19 comments
Labels
A-sanitizers Area: Sanitizers for correctness and code quality C-discussion Category: Discussion or questions that doesn't represent real issues. PG-exploit-mitigations Project group: Exploit mitigations T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

There is a kind of common type erasure scheme used in Rust that goes something like this:

/// Invariant: there exists some type `T` such that `data` is actually a `&'a T` and `op`
/// is actually a `fn(&T)`.
struct ErasedTypeAndOp<'a> {
  data: *const (),
  op: fn(*const ()),
  _phantom: PhantomData<&'a ()>,
}

impl<'a> ErasedTypeAndOp<'a> {
  pub fn new<T>(data: &'a T, op: fn(&'a T)) {
    Self {
      data: data as *const T,
      op: unsafe { std::mem::transmute(op) },
      _phantom: PhantomData,
    }
  }

  pub fn call_op(&self) {
    (self.op)(self.data)
  }
}

This is used, in particular, in the standard library for type-erased fmt arguments.

Unfortunately, CFI is not happy with this since the function pointer op was transmuted, and is not invoked at its original type. Our documented ABI compatibility rules are fine with this (and Miri also won't complain), but CFI is actually more restrictive than those rules and (at least in some configurations) rejects this call since caller and callee do not agree on the type of the argument.

This is not good: ideally we could just tell people they can turn on CFI in any Rust program and expect it to work. In other words, ideally CFI would only reject programs that we consider buggy (in an official Rust lang/opsem sense), since they have either UB or erroneous behavior.

I am not sure what is the best way to fix this. I also know very little about what CFI can and cannot do (and I understand there's actually a bunch of CFI implementations that differ in their capabilities). My completely naive first idea would be to say that we have a new magic primitive type Erased and then declare ErasedTypeAndOp as follows:

struct ErasedTypeAndOp<'a> {
  data: *const Erased,
  op: fn(*const Erased),
  _phantom: PhantomData<&'a ()>,
}

Then we say that if caller and callee use different pointer types for their arguments, that is erroneous behavior unless one of them uses the special Erased pointee type, in which case the call is permitted. (Or maybe only the call site is allowed to use Erased like that?)

Is that something CFI can do -- basically have pointee type checking "turned off" if the call site uses *const Erased? If yes, would this be an acceptable trade-off between still rejecting accidental type mismatches (and catching as many attacks as possible) while accepting legitimate type erasure patterns?

Cc @rust-lang/opsem @maurer @rcvalle

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 6, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. PG-exploit-mitigations Project group: Exploit mitigations T-opsem Relevant to the opsem team A-sanitizers Area: Sanitizers for correctness and code quality and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 6, 2024
@VorpalBlade
Copy link

Then we say that if caller and callee use different pointer types for their arguments, that is erroneous behavior unless one of them uses the special Erased pointee type, in which case the call is permitted. (Or maybe only the call site is allowed to use Erased like that?)

Wouldn't this be a breaking change? And it isn't a soundness or security fix, so I don't see h it being allowed by that rule.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

Yeah we'd have to evaluate whether making the ecosystem CFI-compatible is possible without undue breakage. OTOH if we don't do this CFI will always be a second-class citizen as a sanitizer, since it will keep rejecting completely valid code.

@VorpalBlade
Copy link

The other option is to make CFI Rust compatible. Why is that not being considered as a solution? I see no reason why Rust and C/C++ should use exactly the same variant of CFI, since they are different languages with different UB/EB.

For the same reason I would expect yet other languages to need other variations of CFI as well (though I don't know them in enough detail to come up with any specific examples).

@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

The other option is to make CFI Rust compatible. Why is that not being considered as a solution?

It is being considered. I am talking to various stakeholders and the CFI folks are telling me that accepting arbitrary pointee type mismatches makes CFI "essentially worthless". I'll let them explain why that is the case.

@ChayimFriedman2
Copy link
Contributor

ChayimFriedman2 commented Aug 6, 2024

It is possible to fix the example without fn transmutes (although I don't know if this will be good enough for CFI) by introducing a shim:

use std::marker::PhantomData;

/// Invariant: there exists some type `T` such that `data` is actually a `&'a T` and `op`
/// is actually a `fn(&T)`.
struct ErasedTypeAndOp<'a> {
    data: *const (),
    op: fn(*const ()),
    _phantom: PhantomData<&'a ()>,
}

impl<'a> ErasedTypeAndOp<'a> {
    pub fn new<F: Fn(&'a T), T>(data: &'a T, _op: F) -> Self {
        const { assert!(std::mem::size_of::<F>() == 0) };
        fn op_shim<'a, T: 'a, F: Fn(&'a T)>(data: *const ()) {
            unsafe {
                let op = std::mem::zeroed::<F>();
                op(&*data.cast::<T>())
            }
        }
        Self {
            data: data as *const T as *const (),
            op: op_shim::<T, F>,
            _phantom: PhantomData,
        }
    }

    pub fn call_op(&self) {
        (self.op)(self.data)
    }
}

There are other variants of this, such as introducing a trait.

This isn't as flexible as transmuting the fn pointer, but I think it fits most of the cases. So if we can break compatibility (especially considering that the docs about ABI compatibility weren't stable for a long time, only since 1.76.0) we can tell people to use that approach.

It seems a somewhat-safe wrapper (providing unsafe fn(*const ())) for this pattern can be made into std, but maybe this will require variadics to be fully generic.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

Ah right, if you take the function item as input you can do this without extra allocations. You could try this approach with std::fmt to see how it does in practice.

(especially considering that the docs about ABI compatibility weren't stable for a long time, only since 1.76.0)

We didn't document anything before, but this has been de-facto relied upon since Rust 1.0 by std itself and hence likely was picked up by other crates during the years.

Also, CFI is not a super widely used sanitizer technique (to my knowledge), so I am hesitant to force everyone to adopt such quirky patterns... it would be better if the sanitizer could be improved to incur less collateral damage on legit code.

@ChayimFriedman2
Copy link
Contributor

ChayimFriedman2 commented Aug 6, 2024

We didn't document anything before, but this has been de-facto relied upon since Rust 1.0 by std itself and hence likely was picked up by other crates during the years.

Personally I have always used a shim until this was documented. std can use things that are not guaranteed sound. But I can understand others were relying on this.

What about declaring it is not UB, but recommending developers to switch to alternatives (perhaps adding something like Erased too)?

Also, is there any way we can automatically detect crates using such techniques and lint against them when trying to apply CFI?

@maurer
Copy link
Contributor

maurer commented Aug 6, 2024

Dumping some context onto this discussion - I'll write a separate post with design/opinion statements, this is intended just to provide a big infodump.

While my document there has a bunch of CFI background, I've heard from some folks that they glossed over anything at the implementation level, and would really like a description of what policies could be implemented by each system, so I'm going to write a super-reduced summary here:

What is CFI?

CFI is a general class of sanitizers that is used to harden the program against attacker control flow hijacks by making sure that computed transfers (calls to function pointers, closures, trait objects, returns) are going to a site that would have been statically possible in the source program. We narrow further to forward-edge CFI, which is everything other than "returns" - returns are generally protected by other mechanisms that we aren't discussing here.

CFI Implementations

While there are a bunch of CFI implementations, there are two we have unstably implemented in Rust, and they're the most commonly used in the outside world. Both of these are "type-based" CFI in their C/C++ variant, meaning that they address the aliasing problem by asserting that function pointers are only compatible if their type is the same.

LLVM CFI

This is commonly just called "CFI", and if someone says they've enabled "CFI", this is probably what they mean.

Requirements

  • Full program LTO (including x-lang, so matching LLVMs are required)
  • Tolerate function pointer equality breaking across any non-static linkage boundary
  • Must use clang

Policy Power

  • Tag functions or read-only data structures with any number of tags at compile time
  • Efficiently check at function call-site or data structure dereference site for whether any tag associated matches.

Operation in C/C++

  • Functions and methods are tagged with type signatures...
    • Methods may be callable with multiple types (due to the this parameter being compatible upwards on the inheritance tree if the method is present), and are then tagged with all of them
    • These types also have variants created. Variants can be checked for separately, as they are concatenated onto the end of the tag. The two main variants today are:
      • normalized - this is the variant we had them add for Rust, where e.g. unsigned int would become u32 or u64 depending on the platform
      • generalized - Pointers are collapsed into just their qualifiers, so const *T is equivalent to const *U for any T and U, for example.
  • Functions get checked at the call-site for whether their expected type, in the selected variant for the compilation unit, is available on that target.
  • Vtables are tagged with the classes that they are valid for
  • Vtables get checked when loading a function for whether their expected class is in the list

KCFI

This is a variant scheme for CFI that uses only one tag value, intended for embedded applications. It is actively used in the Linux kernel and firmware development. For example, every Android phone shipped today has KCFI enabled. KCFI is also supported by GCC, not just clang.

Requirements

  • No XOM
  • Code sections are not read-write

Policy Power

  • Tag functions with exactly one tag
  • Efficiently check that tag at a function call-site

Operation in C (C++ not supported)

  • Tag all functions with their function signature
  • Check the tag at the call-site against the expected signature

Rust Today

We do have CFI implemented in Rust today, and it mostly works.

Operation

  • Functions have two possible baseline types - the concrete type, which is the actual type of the function, and the erased type, which will replace any references to Self with an appropriate dyn Trait.
  • For CFI, every function will be tagged with both baseline types, modified by any enabled variants.
  • In KCFI, one variant is selected for the entire compilation unit.
  • In KCFI, for functions which will be present in a vtable, the Self-erased type will be tagged on to the function. In the event that a function pointer is created from the vtable, a shim will be created that is tagged with a concrete Self.
  • In KCFI, for functions which will not be present in a vtable by default, the Self-concrete type will be tagged onto the function. If we need an abstract version (the usual reason being to reify a function pointer into a closure), a shim will be generated.

Cross-language support

  • FFI-safe types encode to the same representation as in C/C++ (assuming .normalized)
  • The same variants are available (if C/C++ uses .generalized, we can too)

Flaws

  • The drop_in_place implementation is currently extremely broad. This could potentially be improved for LLVM CFI via vtable validation. KCFI would need a more complex approach to improve this.
  • The aliasing model assumed by CFI in C/C++ does not match the function pointer aliasing model in Rust. We assumed this aliasing model in the Rust CFI implementation anyways, which results in some patterns (like the type erasure from this issue, inspired by core::fmt) being ruled illegal by CFI which are defined behavior for Rust.

@maurer
Copy link
Contributor

maurer commented Aug 6, 2024

OK, now for my opinions.

In general, I think that ideally CFI should follow the language's rules. That is what it's supposed to do - provide an extra set of guard-rails at runtime that prevents the running program from engaging in some behaviors that are outside of the program's expected behavior. Do note that before CFI ever becomes helpful, UB has to have occurred somewhere. The goal is to make that UB less practically devastating when it happens by making it harder for an attacker to weaponize. Effectively, this means that an idealized/abstract implementation of CFI is a transformation that removes some of the worst behaviors from the set of possibilities when UB is hit. 1

That said, even in the case of existing CFI, some compromises needed to be made. LLVM CFI loses function pointer equality between dynamically loadable units. C technically supports casts between "compatible" functions (much narrower set of types than what Rust ABI-matching casts allow), but you're not allowed to do this when either LLVM CFI or KCFI are enabled. I think there's a case for solutions where some legal code is disallowed with CFI enabled, especially if those patterns are rare and statically detectable.

Possible Resolutions

core::fmt is used all over the place, and it's likely this scheme is as well. I agree we should do something about this, but we've got a lot of options. Throughout the solutions, it's important to remember that just because Rust is safer than C does not mean that a Rust call-site is less attacker exploitable. It is likely that once a vulnerability is found in C code, it can be used to write data that will be trusted by Rust code.

I haven't run numbers on this yet, but ABI compatibility equivalence sets are extremely broad, especially for low-arity functions. If exact quantification would be helpful, we could try running the experiment to see how expanded the alias sets would be if we could decide on a representative example.

CFI/KCFI Specific Codepaths

This is what the original PR linked to this issue #115954 was attempting to do - essentially, when CFI or KCFI are enabled, adjust the code so that it's CFI-legal rather than just Rust-legal.

Advantages

  • Simpler
  • Keeps being doable with most language changes

Disadvantages

  • Needs to keep being done to every new instance of the pattern
  • Difficult to know whether an ecosystem crate is CFI-legal

Enforce only Rust aliasing restrictions

Rust has much weaker aliasing restrictions. These look similar to a .normalized.generalized variant type. We could err on the side of enforcing only what the language demands by checking that variant on calls starting from Rust. There are two major points of complexity here:

  • Function pointers passed to other languages need to be tagged with the more restrictive types as well
    • Users will need to be aware that if they perform a function pointer cast that is ABI safe in Rust, the resulting pointer may not be safe to pass to C. It was not safe without CFI, but they would likely have "gotten away" with it, and will not any more, even though they would get away with accepting it in a Rust function.
    • KCFI supports only one tag. This means that either we need to weaken the other language's CFI model to Rust's, or we need to somehow attach a shim every time a function pointer is passed across the barrier. This is tricky because at the barrier we only have a pointer, not a concrete function we could easily generate a shim for. We might be able to do something with a data table or careful binary layout, but this would be tricky.
  • .generalized.normalized is not actually identical to Rust ABI safety - for example, *const () and *mut () do not generalize to the same type - the qualifiers are different, and the proposed new enum #[repr({u,i}?)] rule is not followed.
    • For internal functions, we could have a new variant that actually transformed those down
    • We would be unable to receive external functions without convincing the C compilers to add this new form of even broader generalization as a variant.

Advantages

  • If implemented correctly, accepts every legal Rust program

Disadvantages

  • Weakens CFI internally to Rust
  • If KCFI is in use, either weakens CFI in any C companion code or adds the kind of complexity KCFI users were running from
  • If CFI is in use, requires upstreaming code to clang in order to take any function pointers from C.

erased fn

The basic concept here is that we remove 2 the ability to do "ABI-compatible" fn casts from Rust. Instead, we also have erased fn, which works exactly like fn, except that it can be legally cast to any ABI compatible erased fn type. An erased fn can be produced from any singleton-kind function. In CFI, this works by tagging any function which ever gets converted to an erased fn with a .rust_erased special variant that normalizes the args and return value appropriately. In KCFI, this works by creating a shim at the cast site from singleton function to erased fn with the same tag that CFI would add to its encoding. At call-sites in either, you check for the .rust_erased special variant. erased fn could not be passed across the language boundary in KCFI.

Advantages

  • Makes it explicit to the reader when functions are being intentionally called in a way that implicitly transmutes their arguments
  • Only weakens Rust CFI where users need the feature

Disadvantages

  • User-facing language change means significant complexity to support a sanitizer

Call-site annotation

We could annotate call sites that believe they are using a potentially erased pointer to indicate that CFI should be broadened here. LLVM CFI would attach .rust_erased to all functions, and we would check that variant at those call-sites. KCFI would need to create a shim for every function, and use some kind of table or binary-layout based mechanism to be able to compute the shim from the function pointer when calling.

Advantages

  • User-level patching is simpler than with the existing #[cfg] + witness solution, and no new concepts like unsafe fn
  • In the worst case, KCFI could treat this as a #[no_sanitize] tag for the call-site.

Disadvantages

  • Still requires code changes to legal Rust to run with a sanitizer
  • Forces KCFI to either be weakened or engage in whole-program-layout operations that the approach was trying to avoid
  • Weakens Rust CFI

Just Stop 🛑

Do we even need this? Is the actual solution just better optimization?

The main situation where I see this used is to avoid closure construction. I'd be interested if someone knows of a case where we have a

struct Foo<'a> {
  arg: *const (),
  cb: unsafe fn(*const (), i32),
  _lifetime: PhantomData<&'a ()>,
}

that would not be equivalently well represented as &'a Fn(i32) other than for performance. My understanding is that the reason for this is that a &'a Fn(i32) will be packaged as:

  1. Pointer to environment
  2. Pointer to the Fn vtable

which means that essentially it has a double-indirection compared to Foo. However, if this optimization is the only reason, perhaps we can just optimize &'a dyn Fn() instead!

We can use dyn*-style representation so that the first argument need not be a pointer to a thing that implements the vtable, but any word-sized value. In that case, if the closure closes over a word-sized value, we just contract it to the inner value itself. This optimization would likely help in many cases where only one value is actually being closed around.

The other optimization we could perform is that if we are creating the vtable for a &dyn Trait, we can elide the drop_in_place and size from the vtable because drop_in_place is not usable on a non-mutable reference. If after this elision there would only be one item left, then we actually replace the vtable with its contents, which are now word-sized - just the method.

By their powers combined, &'a dyn Fn(i32) becomes equivalently represented to, and theoretically as-fast-as, Foo<'a>. Having removed the reason to do this, we could look into moving this into defined-but-erroneous behavior.

If there are important reasons to do this erasure for expressiveness rather than performance, this is obviously out.

Footnotes

  1. Under the assumption that the compiler is not working against us, but an attacker providing an input is. CFI obviously can't save you from the compiler deciding that appropriate behavior when UB is hit is to start an unauthenticated shell server.

  2. I know we can't just instantly change it now that the rule has been published, but it might be possible to migrate at the next edition - we could rewrite old edition fn pointers to all implicitly be erased fn in the IR, and I think it would work except in the case where the function pointer was coming in from a foreign language, in which case it might have the wrong tags. We could work out the details here if people are in favor of this solution.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 7, 2024

The basic concept here is that we remove 2 the ability to do "ABI-compatible" fn casts from Rust.

To be clear, I was suggesting this (in the form of a *const Erased type or so) just for pointer types. I was imagining other mismatches like char vs u32 or i32 vs NonZeroI32 would remain allowed without any annotation.

Do we even need this? Is the actual solution just better optimization?

I don't know what people actually use these "function calls with ABI mismatches" for. There's manual type erasure schemes like the one in fmt (I think that will mostly just be pointers with different pointee types). Beyond this I know people care about having the ABI guaranteed for things like niche-optimized Option/Result (that was rust-lang/rfcs#3391). This is probably mostly for FFI purposes: C declares an int return type where null is the error, and the Rust side wants to use Result<NonZeroI32, ()> and ensure this signature is compatible. I'm not aware of anyone using this for Rust-to-Rust calls -- but for cross-lang CFI, you would likely still care about this.

@Darksonn
Copy link
Contributor

Not that we have to follow C, but as an interesting side-note, I just came across UndefinedBehaviorSanitizer’s unexpected behavior which seems to indicate that C doesn't allow this kind of function pointer mismatch at all.

@VorpalBlade
Copy link

VorpalBlade commented Oct 21, 2024

This is probably mostly for FFI purposes: C declares an int return type where null is the error, and the Rust side wants to use Result<NonZeroI32, ()> and ensure this signature is compatible. I'm not aware of anyone using this for Rust-to-Rust calls -- but for cross-lang CFI, you would likely still care about this.

What would it even mean to do CFI cross language? Types aren't named the same. You might consider a mapping table, but that isn't straight forward. For example: i32 vs int vs int32_t for example. And does C char correspond to i8 or u8? And what about long (i32 or i64)? And that is just with primitives on common desktop platforms! In C/C++ this varies between target triplets.

Then there are the types thst don't have exact equivalents: intptr_t, size_t etc, though these are only typedef on the C side, but usize is a "real" type on the Rust side.

As soon as you leave basic primitives and start passing user defined types it gets even more complex. What is CFI even comparing here?

  • It can't be struct/union names, because that can and does vary between languages (in Rust they are not in the global name space, but defined in some crate for example, certain names might also be reserved keywords in the other language).
  • It cant be the internal shape of a struct (e.g. expand it down to primitives and compare layout compatibility):
    • Rust can't exactly represent a struct with a bit field in it.
    • We don't even know the full contents sometimes thanks to extern types (struct FILE; just being forwarded declared in C)
    • What about repr(transparent)?

To me that seems to rule out both "simple" approaches, and what is left are fuzzy "are these similar enough" comparisons that would be way too expensive to do at runtime for every call.

Newtypes (which are extremely common in Rust and I have started to see more of in modern C++ as well) makes this even more complex. And then there is the whole niche optimised Option/Result as you pointed out as well.

I'm fairly certain that I could (if I wasn't on my phone right now) sit down and come up with a set of FFI signatures pairs that should work to any reasonable programmer but where any one way of implementing CFI would break at least one of them on some architectures.

@bjorn3
Copy link
Member

bjorn3 commented Oct 21, 2024

What would it even mean to do CFI cross language? Types aren't named the same. You might consider a mapping table, but that isn't straight forward. For example: i32 vs int vs int32_t for example. And does C char correspond to i8 or u8? And what about long (i32 or i64)? And that is just with primitives on common desktop platforms! In C/C++ this varies between target triplets.

Clang has an option to normalize CFI types based on their size and signedness, so rust i32, C int and C int32_t all map to the same CFI tag if this option is enabled. And C char would map to i8 or u8 depending on if char is signed or not.

It can't be struct/union names, because that can and does vary between languages (in Rust they are not in the global name space, but defined in some crate for example, certain names might also be reserved keywords in the other language).

The struct name is exactly what is currently used as tag in Rust, just like it is in C. The expectation is that the raw bindings use the same names as C and you optionally write safe(r) wrapper around it with a name following the Rust naming conventions.

Newtypes (which are extremely common in Rust and I have started to see more of in modern C++ as well) makes this even more complex. And then there is the whole niche optimised Option/Result as you pointed out as well.

You don't need to pass newtypes across the C boundary, so that is irrelevant. And I'm not sure C++ CFI is supported by Rust at all currently. Only C CFI do I know for certain is supported.

@VorpalBlade
Copy link

The struct name is exactly what is currently used as tag in Rust, just like it is in C. The expectation is that the raw bindings use the same names as C and you optionally write safe(r) wrapper around it with a name following the Rust naming conventions.

How does that work with crate / module names? Or are those automatically stripped from the name of the type?

Also, what about reserved identifiers? While Rust has raw identifiers, C does not. So for the cbindgen case that could be a problem.

There are also things like replaces in bindgen. Looking through the documentation of cbindgen (which I haven't used myself) it also seem to support renaming types.

For any future C++ cross language CFI, renaming is going to be required even.

@Darksonn
Copy link
Contributor

It's just using the name of the struct. It's not using the mangled name or anything like that. Yes, there will be different types of the same name that collide. It's fine.

As for renames, Rust has an attribute to change the name used by a type for cfi purposes. Bindgen could generate that for renames in the future. But Rust CFI support isn't perfect yet. For now, bindgen can give you failures when C enums are involved, as bindgen may just generate an integer type alias on the Rust side. I'm sure we'll fix it in the future.

@CAD97
Copy link
Contributor

CAD97 commented Oct 21, 2024

UndefinedBehaviorSanitizer’s unexpected behavior which seems to indicate that C doesn't allow this kind of function pointer mismatch at all.

As mentioned above, C (the standard) does allow function ABI punning between certain compatible types, but C-with-sanitizers forbids this in many/most cases.

Newtypes (which are extremely common in Rust and I have started to see more of in modern C++ as well)

By the current C++ standard, a reinterpret_cast between "newtype" wrappers will usually result in UB because of the "TBAA" rules. (It's possible to do soundly if an lvalue with the wrong type is never manifested, but this is difficult to ensure and forbids the use of methods.) It will usually work anyway, thus people doing it anyway, but formally it is not permitted.

Yes, there will be different types of the same name that collide. It's fine.

And to explicitly state why: CFI is ultimately a low-cost sanitizer to turn some UB into a somewhat controlled program termination, and is more concerned with running more correct code than it is catching more incorrect code.

@RalfJung
Copy link
Member Author

Not that we have to follow C, but as an interesting side-note, I just came across UndefinedBehaviorSanitizer’s unexpected behavior which seems to indicate that C doesn't allow this kind of function pointer mismatch at all.

As CAD said, that's likely an overeager sanitizer, not UB in the code.

Though the list here does not seem to say Struct* is compatible with void*, which is surprising.

@Darksonn
Copy link
Contributor

The curl article says this:

It has been pointed out to me that the way the C standard is phrased, this tool seems to be correct.

which seems to agree that Struct* and void* are incompatible.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 22, 2024

Either way I think it is a terrible idea to make this UB. There's no justification for unleashing the Kraken on programs like that. Worst-case we should declare it erroneous behavior. But currently we actually document stably that this is allowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sanitizers Area: Sanitizers for correctness and code quality C-discussion Category: Discussion or questions that doesn't represent real issues. PG-exploit-mitigations Project group: Exploit mitigations T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

9 participants