bogus pointer (double free?) during the bossfight in the third stage #378

Open
omar-polo opened this issue Mar 20, 2024 · 9 comments

@omar-polo

On OpenBSD, taisei 1.4.1 crashes during the bossfight in the third stage. It's probably a double free, although I can't say for sure.

Here are the logs. In this case I jumped to the third stage via "Stage Practice" in the main menu, but it also crashes when the stage is reached "normally".

% taisei
W: config>config_set: Unknown setting 'gamepad_axis_ud_free_sensitivity'
W: config>config_set: Unknown setting 'gamepad_axis_lr_free_sensitivity'
W: config>config_set: Unknown setting 'fullscreen_desktop_mode'
W: renderer/glcommon/opengl>glcommon_ext_texture_format_rg8_srgb: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_etc1: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_etc1_srgb: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_pvrtc: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_pvrtc2: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_pvrtc_srgb: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_atc: Extension not supported
W: renderer/glcommon/opengl>glcommon_ext_texture_format_fxt1: Extension not supported
taskmgr:global/3   W: resource/texture_loader/basisu>texture_loader_basisu_sanitize_levels: menu/mainmenubg: Mip level 2 dimensions are not multiples of 4 (450x300); number of levels reduced 11 -> 2
taskmgr:global/7   W: resource/texture_loader/basisu>texture_loader_basisu_sanitize_levels: abstract_brown: Mip level 1 dimensions are not multiples of 4 (600x762); number of levels reduced 11 -> 1
taskmgr:global/6   W: resource/texture_loader/basisu>texture_loader_basisu_sanitize_levels: marisa_bombbg: Mip level 2 dimensions are not multiples of 4 (480x262); number of levels reduced 11 -> 2
E: util/kvparser>parse_keyvalue_file_cb: VFS error: Node 'res/shader/global.pp' does not exist
E: resource>load_resource_finish: Failed to load postprocessing pipeline 'global' from 'res/shader/global.pp'
E: util/kvparser>parse_keyvalue_file_cb: VFS error: Node 'res/shader/viewport.pp' does not exist
E: resource>load_resource_finish: Failed to load postprocessing pipeline 'viewport' from 'res/shader/viewport.pp'
taskmgr:global/0   W: resource/texture_loader/basisu>texture_loader_basisu_sanitize_levels: stage3/wspellbg: Mip level 7 dimensions are not multiples of 4 (8x6); number of levels reduced 11 -> 7
taskmgr:global/2   W: resource/texture_loader/basisu>texture_loader_basisu_sanitize_levels: stage3/spellbg2: Mip level 2 dimensions are not multiples of 4 (162x150); number of levels reduced 10 -> 2
attack_task_helper[5bc8] W: common_tasks>common_wander: Clipping fallback  origin = 240.000000+96.160791i  dist = 200.000000  bounds.top_left = 80.000000+80.000000i  bounds.bottom_right = 400.000000+210.000000i
attack_task_helper[5bc8] W: common_tasks>common_wander: Clipping fallback  origin = 257.504882+84.573803i  dist = 200.000000  bounds.top_left = 80.000000+80.000000i  bounds.bottom_right = 400.000000+210.000000i
W: renderer/gl33/common_buffer>gl33_buffer_resize: Resizing buffer 5 (Lasers VB pass 1) from 16384 to 32768
W: renderer/gl33/common_buffer>gl33_buffer_resize: Resizing buffer 5 (Lasers VB pass 1) from 32768 to 65536
<indirect:Stage3PreBossDialog>[139a0] W: dialog/reimu>COTASK_reimu_Stage3PreBossDialog: TITLE(wriggle, "Wriggle Nightbug", "Insect Rights Activist") not yet implemented
taisei(7371) in free(): bogus pointer (double free?) 0x10f72fc8ef0
Abort trap (core dumped)

The backtrace is:

(gdb) bt
#0  thrkill () at /tmp/-:2
#1  0xe0ba53a7c06d8432 in ?? ()
#2  0x0000010f922344b2 in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#3  0x0000010f921ef60e in wrterror (d=0x10f296eb658, msg=0x10f9219d979 "bogus pointer (double free?) %p") at /usr/src/lib/libc/stdlib/malloc.c:378
#4  0x0000010f921f5806 in findpool (p=0x10f72fc8ef0, argpool=0x200282, foundpool=0x10f72fc8dc0, saved_function=0x10f72fc8dc8)
    at /usr/src/lib/libc/stdlib/malloc.c:1594
#5  0x0000010f921f0751 in ofree (argpool=0x10f72fc8e30, p=0x10f72fc8ef0, clear=0, check=0, argsz=0) at /usr/src/lib/libc/stdlib/malloc.c:1608
#6  0x0000010f921f06e3 in _libc_free (ptr=0x10f72fc8ef0) at /usr/src/lib/libc/stdlib/malloc.c:1747
#7  0x0000010cc2ad1b4e in libc_free (ptr=0x0) at ../taisei-1.4.1/src/util/consideredharmful.h:82
#8  mem_free (ptr=0x0) at ../taisei-1.4.1/src/memory/memory.c:35
#9  _dynarray_free_data (darr=0x10f72fc8e38, sizeof_element=<optimized out>) at ../taisei-1.4.1/src/dynarray.c:24
#10 coevent_cancel (evt=0x10f72fc8e38) at ../taisei-1.4.1/src/coroutine/coevent.c:130
#11 0x0000010cc2ad1f58 in _coevent_array_action (events=<optimized out>, num=<optimized out>, func=<optimized out>)
    at ../taisei-1.4.1/src/coroutine/coevent.c:138
#12 cotask_finalize (task=0x10fa89354d0) at ../taisei-1.4.1/src/coroutine/cotask.c:271
#13 0x0000010cc2ad1ebd in cotask_entry (varg=0x0) at ../taisei-1.4.1/src/coroutine/cotask.c:376
#14 0x0000010cc2c1ddd2 in koishi_entry (co=0x10fa89354e0) at ../taisei-1.4.1/subprojects/koishi/src/fcontext/../fiber.h:68
#15 0x0000010cc2c1ddba in co_entry (tf=...) at ../taisei-1.4.1/subprojects/koishi/src/fcontext/fcontext.c:50
#16 0x0000010cc2a8369f in make_fcontext () at ../taisei-1.4.1/subprojects/koishi/src/fcontext/asm/make_x86_64_sysv_elf_gas.S:71
Backtrace stopped: Cannot access memory at address 0x10f72fca000

I'm not sure whether it crashed in 1.4.0 too: before the last package update I had only played the first few stages (lack of time and skill :p). I'm sure it didn't crash with previous versions, though, since I have everything up to the fifth stage available in stage practice.

@Akaricchi
Member

I don't have OpenBSD installed and I couldn't reproduce this on Linux so far. Is the crash consistent, and does it always happen at the same part of the fight? Please test v1.4.0 too if possible, though I'm fairly sure it's affected as well.

@omar-polo
Author

omar-polo commented Mar 21, 2024

It happens exactly at the same time every time. It's right at the start of the "Moonlight Rocket" spell (difficulty easy fwiw), so I managed to reproduce it also in the "spell practice". It doesn't seem to depend on the character used nor on the exact moves I do.

This is exactly where it crashes, if it helps.

[image: screenshot of the point in the fight where it crashes]

@omar-polo
Author

omar-polo commented Mar 21, 2024

I forgot to include some data from the backtrace; here's the evt struct at the moment it crashes:

(gdb) f 10
#10 coevent_cancel (evt=0x7adcc44be38)
    at ../taisei-1.4.1/src/coroutine/coevent.c:130
130                     dynarray_free_data(&evt->subscribers);
(gdb) p *evt
$1 = {subscribers = {{data = 0x7adcc44bef0, num_elements = 0, capacity = 0},
    dyn_array = {data = 0x7adcc44bef0, num_elements = 0, capacity = 0}},
  unique_id = 1662684336, num_signaled = 1965}

(edit: the addresses are different from the OP since it's a different core file. It always crashes here though)

OpenBSD's malloc (Otto's malloc) tends to be quite strict and often unveils use-after-frees, double frees, out-of-bounds writes and other invalid usages that other allocators don't find. That said, it's not infallible, and there is a chance that the invalid usage is somewhere else and this free is only exposing it somehow.

I'll try to bisect this and will update the issue with my findings. It could take a bit though.

@Akaricchi
Member

Akaricchi commented Mar 21, 2024

Could you also try building with meson configure -Db_sanitize=address,undefined? It may give a more detailed error report and/or find an earlier fault.

@omar-polo
Author

I learned a few things doing the bisect.

First, this issue only exists with LTO builds; with -Db_lto=false the current master and previous revisions work fine.

Secondly, it was introduced after 1.3 and before 1.4.0.

Lastly, assuming my git bisect was right (I started with good 1.3.0 and bad master), the first bad commit is 558541e which is quite big.

I can't use -Db_sanitize=address,undefined since it doesn't seem supported by clang on OpenBSD unfortunately:

c++: error: unsupported option '-fsanitize=address' for target 'amd64-unknown-openbsd7.5'

@Akaricchi
Member

We've had some LTO-related miscompilations in Windows builds in the past (though in library code, not taisei itself)… those resolved themselves with a toolchain update, IIRC. That said, it's hard to tell whether the compiler or the code is at fault; it could very well be either. This kind of thing is always a giant pain in the ass to debug. What version of LLVM are you on? 17.0.6 works fine for me on Linux.

> Lastly, assuming my git bisect was right (I started with good 1.3.0 and bad master), the first bad commit is 558541e which is quite big.

The bisect seems correct. If you were to bisect the squashed commits, it would probably lead you to the one that ported that specific spell to the coroutine system. The "problem" function is this since it's the only thing that hosts events on a coroutine stack there, which is where it goes bad according to the stack trace. I suspect that line 100 or line 119 might be the trigger. Try replacing them with something like WAIT(60); and see if it still crashes in the same way. This will definitely break the spell card though.

Another thing to try: compile with:

meson configure -Ddeveloper=true -Dc_args='-DCO_TASK_DEBUG -DEVT_DEBUG'

This will write a lot of debug output to the log, which may or may not help me make sense of what's going on.

@omar-polo
Author

The default clang version on OpenBSD is 16.0.6, but I tried with clang 17.0.6 with the same outcome. Truth be told, I'm not sure whether clang-17 from packages is using ld.lld-17 or base's ld (which is clang lld 16), and I don't know how to check.

I tried replacing the WAIT_EVENT_OR_DIE at lines 100 and 119 with a WAIT(60), in every combination (only one, only the other, both), without success: it still segfaults the same way.

Here's the output of taisei running with the debug output (it's very big though, ~12M).

log.txt

@Akaricchi
Member

Akaricchi commented Sep 29, 2024

Hi, sorry for the super late update.

I've tried to debug this in an OpenBSD VM a couple months ago but got absolutely nowhere. I managed to segfault gdb itself, and got some wonderful stack traces such as this one that I saved:

    frame #5: 0x000008b3c36e61a3 libc.so.99.0`_libc_free(ptr=0x000008b3c535aef0) at malloc.c:1747:2
  * frame #6: 0x000008b10a3419de taisei`coevent_cancel [inlined] libc_free(ptr=0x0000000000000000) at consideredharmful.h:82:36
    frame #7: 0x000008b10a3419d9 taisei`coevent_cancel [inlined] mem_free(ptr=0x0000000000000000) at memory.c:35:2

For reference, this is the definition of libc_free:

INLINE void libc_free(void *ptr) { free(ptr); }

So, yeah, I gave up. I might try it again later, but this one may just be beyond me. The only thing I'm pretty sure of now is that this is not just a memory management error; this is either a miscompilation or a very nasty case of UB.

In the meantime, can you please try building with the latest LLVM to see if maybe that magically fixes the problem?

@omar-polo
Author

@Akaricchi thank you for trying! I've heard that we're close to updating to LLVM 18 in base, so once that lands I'll try again and report back!

(p.s. gdb on OpenBSD is a bit old, it's one of the last non-GPLv3 versions. You may install the gdb package and use egdb, or use lldb which is in base nowadays).
