Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anti-Aliasing Bug (Initialize pre-generated message, then modify it) #274

Closed
RafaelLaya opened this issue Mar 7, 2024 · 23 comments
Closed

Comments

@RafaelLaya
Copy link

bug_for_flatcc.zip

Hi everyone! This is my first bug report contribution into the FlatCC project. Thanks for making such an awesome project! Here are the details:

Files attached

  • build.sh has the build command to repro (define $GCC_PATH)
  • version.txt has GCC version (latest none-eabi) -- I was very close to filing a GCC bug until I turned on -fno-strict-aliasing and realized the issue is with the aliasing in FlatCC, not with the compiler
  • tempfiles.c The source code, after the preprocessor pasted the generated FlatCC code. Note the suffix "v2" is added by me by hand before posting here, not generated
  • out.lst The dissasembly

Explanation of the source code provided in tempfiles.c

The code in question is in the function interface_encoding_populate_general_rsp(). writes a message determined at compile-time into an array in the stack, and then finishes the message and copies into a buffer given by the caller.

If one steps through the code, essentially the following happens:

  • Response_as_root_v2() reads the word at msg_array[0] which should be equal to 12 and returns &msg_array[12]
  • Response_data_v2() reads the word at msg_array[12] which should be equal to 8 and then finds &msg_array[12-8]=&msg_array[4]. Then reads half-word at msg_array[4] and checks whether it is >=8. If less, it asserts

From the initialization of msg_array we can see everything should be initialized correctly, then Response_as_root_v2() and Reponse_data_v2() will work exactly as described above.

Actual Issue

The compiler smartly determines that it needs to look at msg_array[4] and check if it is less than 8. However, it reads msg_array[4] into R5 which has garbage in the stack. Then it initializes msg_array in the stack, and then compares R5 with 7 (branches to assert if less-or-equal). The problem is that by the time the comparison is done, the array is partly initialized and R5 has garbage from the stack before the initialization of the array

a: f8bd 5004 ldrh.w r5, [sp, #4]
e: cc0f ldmia r4!, {r0, r1, r2, r3}
10: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
14: cc0f ldmia r4!, {r0, r1, r2, r3}
16: 2d07 cmp r5, #7

Schema

The relevant portion looks like this: A table that requires an union inside, and the union has one of many tables

table MyTableResponse{
// Some integers
}
union ResponseData {
// A bunch of tables
/// Generic response (success w/ no payload or error).
MyTable: MyTableResponse,
}

table Response {
data: ResponseData (required);
}

Background

I found this by Aardappel which seems to be an user on stackoverflow who is highly involved with the project:
https://stackoverflow.com/questions/24330925/does-flatbuffers-avoid-strict-aliasing-somehow

Then in theory, if you first construct a FlatBuffer and then immediately read it, it could optimize across the writing and reading code, and do "evil" optimisations of the kind Linus was referring to in your link above (pretend the writing never happened).

Workarounds

  • Add a compiler barrier
  • Make things volatile and cast them before passing them into flatcc API's
  • Use lower optimization levels
  • Use -fno-strict-aliasing
  • A bunch of other potential compiler tricks
@RafaelLaya
Copy link
Author

Actually taking again a look at this, this line looks dangerous:

vt__tmp = (flatbuffers_voffset_v2_t*)((uint8_t*)(t__tmp)-__flatbuffers_soffset_read_from_pe_v2(
    t__tmp));
    
  The table is uint8_t* but we cast into a uint16_t* type, so perhaps this is where the compiler assumes it can't alias? 

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 7, 2024

OK, I'd like to look into this issue if there is anything I can do. So I don't want to reject this up front. But I am not convinced that this is a FlatCC issue.

I cannot easily reproduce your project because I do not have the tool chain, nor necessarily the hardware, you are using.
What I need is a schema and C source that can be compiled, and which can trigger warnings about aliasing, possibly by adding more flags to the clang compiler that I use on macOS. Alternatively that you do the same on your platform and list the warnings. If warnings cannot be generated, I'm inclined to assume it is a compiler optimization bug.

Now, given that I don't have enough information, I can speculate:

First, as Wouter AardAppel writes in the link you reference, reading is by definition read only, so aliasing should not result in different data unless you have UB (undefined behaviour) in the implementation.

I am not sure where the problem is exactly, other than it sounds like a vtable lookup issue, but if so, it likely happens many other places for similar reasons, so I'll take this is a symptom, and use the line you posted in your second comment.

In C it is (relatively recently) undefined behaviour to cast a pointer to a type it is not, except if the pointer is 'char *' which is explicitly valid, and necessary, to do pointer math.

Now, this problem is mostly with accessing data the compiler generates, but here we are reading static data from a buffer, so the compiler cannot shuffle things around. Also, we are casting to uint8_t * which for all practical purposes is the same as 'char *' and any compiler that cannot handle that, should be fixed as there is too much such code around.

After doing pointer math, the code is cast to the underlying correct type, and if you do things right on your side, it is also correctly aligned in memory. However, if your buffer is not aligned properly, some architectures might run slower, issue access violations, or do other unexpected stuff.

For this reason, the code you reference is not dangerous, except if accessing memory that is not properly aligned.

And as to GCC, I have already removed support for GCC pedantic warnings because they break portable C code, and if this turns to not be easily managed in FlatCC, it may require further downgrading GCC support, e.g. removing optimizations in recommended builds.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 7, 2024

For reference below is the code that is being discussed.

I link to the "reflection" code which is generated code that is checked into the repo, and the file is same for all flatcc projects of the same version, so this is not related to the reflection schema in particular.

https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/reflection/flatbuffers_common_reader.h

Early in the file we have the macro

#define __flatbuffers_read_vt(ID, offset, t)\
flatbuffers_voffset_t offset = 0;\
{   flatbuffers_voffset_t id__tmp, *vt__tmp;\
    FLATCC_ASSERT(t != 0 && "null pointer table access");\
    id__tmp = ID;\
    vt__tmp = (flatbuffers_voffset_t *)((uint8_t *)(t) -\
        __flatbuffers_soffset_read_from_pe(t));\
    if (__flatbuffers_voffset_read_from_pe(vt__tmp) >= sizeof(vt__tmp[0]) * (id__tmp + 3u)) {\
        offset = __flatbuffers_voffset_read_from_pe(vt__tmp + id__tmp + 2);\
    }\
}

What cannot see from this is arguments being use to call this macro. If there are aliasing issues, this could be relevant.
You could try to make the macro a static inline function which should optimize the same but perhaps better isolate contexts for the compiler.

However, the macro is already being used in such inline functions, it seems overkill to add yet another layer which is why it is not.

Here is an example from the same file:

#define __flatbuffers_define_scalar_field(ID, N, NK, TK, T, V)\
static inline T N ## _ ## NK ## _get(N ## _table_t t__tmp)\
{ __flatbuffers_read_vt(ID, offset__tmp, t__tmp)\
  return offset__tmp ? __flatbuffers_read_scalar_at_byteoffset(TK, t__tmp, offset__tmp) : V;\
}\

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 7, 2024

Here is a weekly build that shows all the GCC versions being tested and pass. But that is for whatever architecture Github Actions use:

https://github.com/dvidelabs/flatcc/actions/runs/8047226724

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 7, 2024

Further: the macro I mentioned calls a function __flatbuffers_soffset_read_from_pe(t) that you example line also calls as it is the same logic.

This function is defined in the flatcc_accessors file and use union cast which is generally believed to be a valid cast, at least respected by all living compilers. It could be argued that memcpy should be used, and I did work an that, but that got rather invasive and not worthwhile. Historically, memcpy would be slow, but lately it is understood to be a cast operator by most compilers.

https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_accessors.h

You could try to play around with different cast implementations, or replace the read function.

The __flatbuffers_soffset_read_from_pe(t) function takes a pointer to an aligned 32-bit memory address and returns a signed 32-bit integer (int32_t). If the platform is big-endian, it also swaps the bytes before returning the result.

pe means protocol endian, meaning it converts between host and the protocol endianness, which is little endian for flatbuffers, but other formats can also be supports.

@RafaelLaya
Copy link
Author

RafaelLaya commented Mar 7, 2024

Thanks! You are right, I could've done a better job producing a seamless repro. Here it is RafaelLaya#1 I forked FlatCC and modified the Monster example to show this. You should be able to cd mymonster && scripts/build.sh && ./build/mymonster and it should work. Just make sure to have arm-none-eabi-gcc available in your system. It will run on your system and work fine, but it will also build for ARMv7 and produce out.lst which is the dissasembly. From the dissasembly, we can clearly see the compiler produces code where we will be reading garbage from the stack and making decisions based on this garbage (which leads to an assertion)

Thanks! Please let me know if you come to any conclusions

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

I had a look at your PR but don't currently have a tool chain to test with. I will see what I can do. Meanwhile, please consider experimenting with replacing functions in the accessors and common files discussed above. This is where a possible FlatCC fix would happen.

That said, it really looks like a compiler bug but I will not rule out anything since these things are notoriously tricky.

I cannot see any scenerio where the compiler should think it is a good idea to dereference memory it is responsible for initializing before it has initialized that memory area, which is what it does when dereferencing the stack pointer after reserving space. Of course someone could argue that the optimizer need not check for that if the code it compiles is correct, but I find that a hard sell, and the code works on numerous other platforms, also with GCC compiler.

In the PR you have a correctly aligned memory buffer that is compiler initialized.
Aside from the unfortunately named builder_t type that should have been buffer_t or something it looks OK in source.

Just to explain what happens in the buffer content and what the C source is supposed to do:
offset 0: 4, 0, 0, 0 : find the root table at offset 4 by adding to the buffer start location (which must be aligned).
Now we have get_as_root.
offset 4: negative 4 byte vtable offset. Subtract from current offset 4: 4 - (230 - 256) = 30.
offset 30: 8, 0, 9, 0, 8, 0, 4, 0 : this is the vtable in 16-bit elements
the first two bytes 8, 0 is the vtable size. The second two bytes 9, 0 is the table size of the referencing table
Table field with ID 0 is then voffset 8, 0 relative to the table at offset 4, so field ID zero starts at buffer offset 4 + 8 = 12.
Table field with ID 1 is then voffset 4, 0 at buffer offset 4 + 4 = 8

For reference: https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#example

The soffset read function discussed above fetches the offset from one field in the vtable, and the compiler is supposed to optimize that read to relative to the vtable start very efficiently. It seems too efficiently in this case.

#define TABLE_PREDETERMINED_MSG  {4, 0, 0, 0, 230, 255, 255, 255, 8, 0, 0, 0, 1, 0, 0, 0, 248, 255, 255, 255, 128, 255, 255, 255, 6, 0, 8, 0, 4, 0, 8, 0, 9, 0, 8, 0, 4, 0, 0, 0}

/* msg_buffer 
typedef struct {
    uint8_t buffer[40] __attribute__ ((aligned (8))); 
} builder_t;

int run_test(void) {
    builder_t msg = {TABLE_PREDETERMINED_MSG};
    ns(Message_table_t) msg_table_root = ns(Message_as_root(msg.buffer));
    ns(CodeTable_table_t) table_ptr = ns(Message_data(msg_table_root));
    (void)table_ptr;
    /* ... */
    return 0;
}

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

OK, I downloaded the eabi-none toolchain and tested, not the PR but the original zip project you provided as I had that at hand already, and it turns out to be rather useful for modifying the flatcc library code for testing. However, I modified the buffer type to be aligned, as you also have done in the PR branch.

I get the assembly you documented, i.e. I can reproduce.

I then modified the function that reads the soffset to find the vtable location to make it use memcpy:

static inline flatbuffers_soffset_v2_t __flatbuffers_soffset_read_from_pe_v2(const void* p) {
#if 1 /* modified version */
  flatbuffers_soffset_v2_t word;
  memcpy(&word, p, sizeof(word));
  return __flatbuffers_soffset_cast_from_pe_v2(word);
#else /* original flatcc library code */
  return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p);
#endif
}

And with the compile flags you provided, that is -O2 enabled, I get the following, which superficially seems to be better. But the memcpy is horribly ineffecient. The code does none of the memcpy is a cast optimizations, which is also one reason I did not make that change to accessors early - too many compilers that might do that slow.

00000000 <interface_encoding_populate_general_rsp>:
   0:	b570      	push	{r4, r5, r6, lr}
   2:	4c15      	ldr	r4, [pc, #84]	@ (58 <interface_encoding_populate_general_rsp+0x58>)
   4:	b08c      	sub	sp, #48	@ 0x30
   6:	4606      	mov	r6, r0
   8:	f10d 0c08 	add.w	ip, sp, #8
   c:	ad05      	add	r5, sp, #20
   e:	cc0f      	ldmia	r4!, {r0, r1, r2, r3}
  10:	e8ac 000f 	stmia.w	ip!, {r0, r1, r2, r3}
  14:	cc0f      	ldmia	r4!, {r0, r1, r2, r3}
  16:	e8ac 000f 	stmia.w	ip!, {r0, r1, r2, r3}
  1a:	e894 0003 	ldmia.w	r4, {r0, r1}
  1e:	2204      	movs	r2, #4
  20:	e88c 0003 	stmia.w	ip, {r0, r1}
  24:	4629      	mov	r1, r5
  26:	eb0d 0002 	add.w	r0, sp, r2
  2a:	f7ff fffe 	bl	0 <memcpy>
  2e:	9b01      	ldr	r3, [sp, #4]
  30:	1aed      	subs	r5, r5, r3
  32:	882b      	ldrh	r3, [r5, #0]
  34:	2b07      	cmp	r3, #7

I would say that this isn't really an acceptable solution.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

If I do not modify the soffset reader, but change the build target:

# original flags
#FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m7 -mthumb -mabi=aapcs -std=c99"

FLAGS="-O2 -Wall -Wextra -ffreestanding -mthumb -mabi=aapcs -std=c99"

then I get the following which looks more sane superficially:

00000000 <interface_encoding_populate_general_rsp>:
   0:	b5f0      	push	{r4, r5, r6, r7, lr}
   2:	b08b      	sub	sp, #44	@ 0x2c
   4:	4669      	mov	r1, sp
   6:	4c0f      	ldr	r4, [pc, #60]	@ (44 <interface_encoding_populate_general_rsp+0x44>)
   8:	000b      	movs	r3, r1
   a:	0022      	movs	r2, r4
   c:	cae0      	ldmia	r2!, {r5, r6, r7}
   e:	c3e0      	stmia	r3!, {r5, r6, r7}
  10:	cae0      	ldmia	r2!, {r5, r6, r7}
  12:	c3e0      	stmia	r3!, {r5, r6, r7}
  14:	cae0      	ldmia	r2!, {r5, r6, r7}
  16:	c3e0      	stmia	r3!, {r5, r6, r7}
  18:	6812      	ldr	r2, [r2, #0]
  1a:	601a      	str	r2, [r3, #0]
  1c:	888b      	ldrh	r3, [r1, #4]
  1e:	2b07      	cmp	r3, #7

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

I also don't get early stack reference with the original build flags, but cortex-m0 or cortext-m1 instead:

FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m0 -mthumb -mabi=aapcs -std=c99"
#FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m1 -mthumb -mabi=aapcs -std=c99"
${GCC_PATH}/arm-none-eabi-gcc $FLAGS -o out.obj -c tempfiles.c

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

Regardless of who is to blame, I needed to warn about this issue in the cross compilation section.
https://github.com/dvidelabs/flatcc?tab=readme-ov-file#cross-compilation

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

I think technically this line violates strict aliasing, but adding attribute((may_alias)) does not seem to help regardless of where I put it.

return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p);

As mentioned, memcpy seems to work but optimizes poorly.
I also tried const char *q = p; return q[0] | (int32_t)(q[1]) << 8 | ...; and this seems to optimize better, but that is a real pain to deal with every time you want to read from a memory buffer.
I also tried using a malloc buffer instead of a stack allocated array and this also seems to work better.
All of this from looking at assembly as I cannot test the target.

If there is no better way to support reading two bytes from memory without conflicting with the C compiler, I'm not sure it is worth supporting.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

Using __builtin_memcpy

static inline flatbuffers_soffset_v2_t __flatbuffers_soffset_read_from_pe_v2(const void* p) {
#if 0
  flatbuffers_soffset_v2_t word;
  memcpy(&word, p, sizeof(word));
  return __flatbuffers_soffset_cast_from_pe_v2(word);
#elif 1
  flatbuffers_soffset_v2_t word;
  __builtin_memcpy(&word, p, sizeof(word));
  return __flatbuffers_soffset_cast_from_pe_v2(word);
#elif 0
  flatbuffers_soffset_v2_t word;
  word = ((int32_t)((const char *)p)[0]) | ((int32_t)((const char *)p)[1]) << 16 |
         ((int32_t)((const char *)p)[2]) << 16 | ((int32_t)((const char *)p)[3]) << 24;
  return word;
#else
  return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p);
#endif
}
00000000 <interface_encoding_populate_general_rsp>:
   0:	b510      	push	{r4, lr}
   2:	4c0a      	ldr	r4, [pc, #40]	@ (2c <interface_encoding_populate_general_rsp+0x2c>)
   4:	b08a      	sub	sp, #40	@ 0x28
   6:	4686      	mov	lr, r0
   8:	46ec      	mov	ip, sp
   a:	cc0f      	ldmia	r4!, {r0, r1, r2, r3}
   c:	e8ac 000f 	stmia.w	ip!, {r0, r1, r2, r3}
  10:	cc0f      	ldmia	r4!, {r0, r1, r2, r3}
  12:	e8ac 000f 	stmia.w	ip!, {r0, r1, r2, r3}
  16:	e894 0003 	ldmia.w	r4, {r0, r1}
  1a:	2228      	movs	r2, #40	@ 0x28
  1c:	e88c 0003 	stmia.w	ip, {r0, r1}
  20:	4669      	mov	r1, sp
  22:	4670      	mov	r0, lr
  24:	f7ff fffe 	bl	0 <memcpy>
  28:	b00a      	add	sp, #40	@ 0x28
  2a:	bd10      	pop	{r4, pc}
  2c:	00000000 	.word	0x00000000

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 8, 2024

I added a new branch fix-accessors with currently two commits
https://github.com/dvidelabs/flatcc/tree/fix-accessors

It adds pmemaccess.h to provide __builtin_memcpy and other methods to read and write memory.
It also updates flatcc_accessors.h to use these instead of *(T *)p casts for read and write.

Note that pmemaccess still uses pointer casts on x86/64 since it has worked so far, and memcpy might not work well on all platforms.

There is a related issue with punaligned.h that reads unaligned memory, mostly for JSON parsing. It already does something similar with pointer access on x86/64 and not on other platforms.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 9, 2024

Just for the record: @RafaelLaya commented directly on the fix-accessors commits.

Also, I'm going to do some further updates to call mem_copy_read_nn and mem_copy_write_nn directly from flatcc_accessors, instead of calling mem_copy_word directly. This will make it possible to use pointer casts like *(T *)p as before, which is probably better on older MSVC compilers and possible other targets. On ARM / gcc the effect will be the same, calling mem_read_16 -> mem_copy_word -> builtin_copy, but I need to work on it a bit more to pass tests.

The behaviour can also be controlled by -DPORTABLE_MEM_PTR_ACCESS=0 (or 1), and also by defining mem_copy_word before header inclusion. This might be relevant on some targets, since this area of operation is extremely performance sensitive, and also code size sensitive.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 9, 2024

I have now update the fix-accessors branch. I needed to add mem_read/write_float_32/64 as well. It passes std CI tests.

If you are happy with the design, can you please confirm that it solves the issue on actual -mcu=cortext-m7 -O2 compiled targets, if possible?

I still think this is a compiler issue, it is not practical to disallow C code with simple pointer memory access, but at least we appear to have a workable solution that is not too ugly.

@RafaelLaya
Copy link
Author

RafaelLaya commented Mar 9, 2024

Sorry I reviewed last night and forgot to respond here. I like that you added customizability of mem_copy_word and yes I'm happy with the design. I'm sure from the assembly it should be fixed now, but I'll try to check on a target just in case as you suggested

I still think this is a compiler issue, it is not practical to disallow C code with simple pointer memory access, but at least we appear to have a workable solution that is not too ugly.

I do agree the compiler is being extremely aggressive here, undesirably so

I didn't feel like I had a strong case to file a gcc bug report, since the first thing they have in bold letter is a warning that if fno-strict-aliasing changes the behavior, then it's most probably on the programmer

Something that I can't seem to ignore is that we are first initializing the buffer and then passing it into the as_Root() function and then as_Data(). So regardless of any aliasing issues inside as_Root() or as_Data(), it should at the very least finish initialization before doing the calls. Wouldn't you agree?

Meaning, I think the compiler has the right to provide UB inside these calls since strict-aliasing is not respected. However, I feel like outside the call, the initialization should've finished before doing the calls that require the buffer as a parameter (regardless of inlining)

@RafaelLaya
Copy link
Author

Update: Tested on an M7 Target. The issue is gone now and didn't observe any regressions :) -- Thanks a lot!

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 10, 2024

Thanks for testing. I will merge this.

I like that you added customizability of mem_copy_word

It is a core feature of flatcc to isolate platform dependencies in the portable library, and to run well on many targets, so it is a natural extension of other similar code, like endian conversion and aligned allocation.

Something that I can't seem to ignore is that we are first initializing the buffer and then passing it into the as_Root() function and then as_Data(). So regardless of any aliasing issues inside as_Root() or as_Data(), it should at the very least finish initialization before doing the calls. Wouldn't you agree?

I agree. If you read earlier comment of mine I write that I cannot see any scenario where the compiler should dereference uninitialized stack content it is responsible for.

Having written some compiler backend logic, I can speculate that the compiler injects its own initializaition into the execution graph, then does liveness checking (dependency tracking), then culls any dependency links that violates strict aliasing, then finds two parallel root executation paths that are equally good: read from stack, or initialize stack.

However, this has less to do with optimization of the final output and more to do with how the compiler goes around achieving this.

@mikkelfj
Copy link
Contributor

Merged. I updated notes in CHANGELOG 0.6.2, README cross compilation section.

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 12, 2024

For the record, the following two commits on master implemented the fix:

pmemaccess.h:
c994c49

flatcc_accessors.h:
cc6e6e5

The old (by Intel) deprecated Intel ICC compiler stopped working after these changes unless -O3 is dropped to -O2. We are not sure why, and might just deprecate it from FlatCC weekly builds, but for the record, the only other change between successful and broken weekly build is an update to pstdbool.h: 81d4ba6

@mikkelfj
Copy link
Contributor

mikkelfj commented Mar 12, 2024

ICC build fixed in commit: 7625a37
It just defaults to not using pointer casts for that compiler. It will use __builtin_memcpy.
It was same issue as this issue, but since we by default chooses non-strict pointer casts on x86/64, and ICC is sensitive to that on -O3, it triggered a segfault. This could just as well have happened earlier on, but was triggered by the very changes that were supposed to fix it, randomly.

A related commit just cleans up bocus ICC warnings: 56d2559

@mikkelfj
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants