segmentation faults / memory corruption using zfs git with init_on_alloc=0 init_on_free=0 #16689

Closed
mtippmann opened this issue Oct 25, 2024 · 28 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@mtippmann

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Arch Linux |
| Distribution Version | rolling |
| Kernel Version | 6.11.5-zen1-1-zen |
| Architecture | amd64 |
| OpenZFS Version | 2.3.99.r34.g152ae5c9bc |

~ » cat /proc/cmdline
zfs=zroot/arch rw mitigations=off init_on_alloc=0 init_on_free=0 lsm=landlock,lockdown,yama,integrity,apparmor,bpf pcie_aspm=performance systemd.gpt_auto=0 spl.spl_hostid=0x00bab10c
~ » cat /etc/modprobe.d/zfs.conf | grep -v \#
options zfs zfs_vdev_max_active=1024
options zfs zfs_txg_timeout=5
options zfs zfs_vdev_scrub_min_active=1
options zfs zfs_vdev_scrub_max_active=2
options zfs zfs_vdev_sync_write_min_active=1
options zfs zfs_vdev_sync_write_max_active=128
options zfs zfs_vdev_sync_read_min_active=1
options zfs zfs_vdev_sync_read_max_active=128
options zfs zfs_vdev_async_read_min_active=1
options zfs zfs_vdev_async_read_max_active=128
options zfs zfs_vdev_async_write_min_active=1
options zfs zfs_vdev_async_write_max_active=128
options zfs zfs_vdev_scheduler=none
options zfs zio_taskq_batch_pct=25
options zfs zfs_sync_taskq_batch_pct=25
options zfs zfs_prefetch_disable=1
options zfs zfs_arc_sys_free=2000000000
options zfs zvol_use_blk_mq=1
options zfs zfs_abd_scatter_enabled=0
options zfs compressed_arc_enabled=0
options zfs zfs_arc_shrinker_limit=0
options zfs zfs_bclone_enabled=0
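
To check which of these settings the loaded module actually picked up, the file can be compared against the live values. The following is a minimal sketch, not part of the original report: `parse_modprobe_options` and `live_value` are hypothetical helper names, and it assumes the usual Linux layout in which module parameters are exposed under /sys/module/zfs/parameters/.

```python
from pathlib import Path


def parse_modprobe_options(text, module="zfs"):
    """Return {parameter: value} for 'options <module> key=value' lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options" or parts[1] != module:
            continue
        for assignment in parts[2:]:
            key, sep, value = assignment.partition("=")
            if sep:
                options[key] = value
    return options


def live_value(name, module="zfs", sysfs_root="/sys/module"):
    """Read the value the loaded module reports, or None if unavailable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

Calling `parse_modprobe_options(Path("/etc/modprobe.d/zfs.conf").read_text())` and then `live_value(name)` for each key gives a configured/live pair per option. A parameter that the loaded module does not expose comes back as `None`.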

Describe the problem you're observing

I'm seeing segmentation faults when running zfs git with init_on_alloc=0 init_on_free=0 on the kernel command line (zfs 2.2.6 is fine). Nothing shows up in dmesg. I can trigger it with a docker compose up of a few containers (rails, mysql); after that, most commands fail, and shortly after the first segfault the whole system goes down, including plasmashell and so on.

I need this system for work, so I went back to 2.2.6, where everything is fine and stable. Dropping init_on_alloc=0 init_on_free=0 might help, but I'm not 100% sure. I'm not using zvols.
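
When comparing machines, it helps to confirm what the running kernel was actually booted with. The sketch below (the function name `init_on_settings` is hypothetical) extracts the two parameters from a command line string such as the contents of /proc/cmdline. If a parameter is absent, the kernel falls back to its build-time default, which this sketch reports as `None` rather than guessing.

```python
def init_on_settings(cmdline):
    """Map init_on_alloc/init_on_free to the value given on the command line.

    None means the parameter is absent, so the kernel's build-time
    default applies.
    """
    settings = {"init_on_alloc": None, "init_on_free": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in settings:
            settings[key] = value  # in this sketch a later occurrence overwrites an earlier one
    return settings
```

For example, `init_on_settings(open("/proc/cmdline").read())` reports the values for the booted kernel.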

The system passes a BIOS memory test just fine. Dell Latitude E5470 / i7-6820HQ

Describe how to reproduce the problem

Good question. Maybe it reproduces with the kmod options and the cmdline listed above. For me it's triggered by a docker compose up, so it could be related to overlayfs. At least that's when I noticed it.

I assume it's a problem related to my kmod config or cmdline settings, otherwise it would have already been found. I noticed similar behavior a few weeks ago and tried pinning it down but failed, so I thought I'd put it here.

Include any warning/errors/backtraces from the system logs

There is nothing in dmesg. Below are some journalctl entries about the crashes (it all looks pretty random).

Okt 25 15:41:16  systemd[1]: incus.service: Main process exited, code=dumped, status=11/SEGV
Okt 25 15:41:16  systemd[1]: [email protected]: Deactivated successfully.
Okt 25 15:41:16  systemd-coredump[98503]: [🡕] Process 98494 (incusd) of user 0 dumped core.
                                                           
                                                           Stack trace of thread 98494:
                                                           #0  0x000060139c936214 n/a (incusd + 0x579214)
                                                           #1  0x000060139c90af45 n/a (incusd + 0x54df45)
                                                           #2  0x000060139c8f9aea n/a (incusd + 0x53caea)
                                                           #3  0x000060139c8fa214 n/a (incusd + 0x53d214)
                                                           #4  0x000060139c8f71b6 n/a (incusd + 0x53a1b6)
                                                           #5  0x000060139c931551 n/a (incusd + 0x574551)
                                                           #6  0x000060139c8cc158 n/a (incusd + 0x50f158)
                                                           #7  0x000060139c8e68f3 n/a (incusd + 0x5298f3)
                                                           #8  0x000060139c8e6130 n/a (incusd + 0x529130)
                                                           #9  0x000060139c8e5bdc n/a (incusd + 0x528bdc)
                                                           #10 0x000060139c8e5b3b n/a (incusd + 0x528b3b)
                                                           #11 0x000060139c8d2e12 n/a (incusd + 0x515e12)
                                                           #12 0x000060139c8d2c85 n/a (incusd + 0x515c85)
                                                           #13 0x000060139c8d22b3 n/a (incusd + 0x5152b3)
                                                           #14 0x000060139c8cc785 n/a (incusd + 0x50f785)
                                                           #15 0x000060139c92bf6d n/a (incusd + 0x56ef6d)
                                                           #16 0x000060139c8cca45 n/a (incusd + 0x50fa45)
                                                           #17 0x000060139c8bffbe n/a (incusd + 0x502fbe)
                                                           #18 0x000060139c8bfa1d n/a (incusd + 0x502a1d)
                                                           #19 0x000060139c8fba09 n/a (incusd + 0x53ea09)
                                                           #20 0x000060139c937fe0 n/a (incusd + 0x57afe0)
                                                           #21 0x00007ad45137fecc __libc_start_main_impl (libc.so.6 + 0x25ecc)
                                                           #22 0x000060139c8bbdf5 n/a (incusd + 0x4fedf5)
                                                           ELF object binary architecture: AMD x86-64

Okt 25 15:41:17  systemd[1]: Starting Incus Container Hypervisor...
Okt 25 15:41:17  incusd[98550]: fatal error: arena already initialized
Okt 25 15:41:17  incusd[98550]: runtime stack:
Okt 25 15:41:17  incusd[98550]: runtime.throw({0x5642ef51278f?, 0x0?})
Okt 25 15:41:17  incusd[98550]:         /usr/lib/go/src/runtime/panic.go:1067 +0x4a fp=0x7fff025e56f0 sp=0x7fff025e56c0 pc=0x5642edef356a
Okt 25 15:41:17  incusd[98550]: runtime.(*mheap).sysAlloc(0x5642f0c409e0, 0x0?, 0x5642f0c50be8, 0x1)
Okt 25 15:41:17  incusd[98550]:         /usr/lib/go/src/runtime/malloc.go:768 +0x398 fp=0x7fff025e5790 sp=0x7fff025e56f0 pc=0x5642ede8e158
Okt 25 15:41:17  incusd[98550]: runtime.(*mheap).grow(0x5642f0c409e0, 0x0?)
Okt 25 15:41:20  systemd-coredump[98599]: [🡕] Process 98582 (containerd) of user 0 dumped core.
                                                           
                                                           Stack trace of thread 98582:
                                                           #0  0x0000000000da081d n/a (containerd + 0x9a081d)
                                                           #1  0x0000000000d72d25 runtime.args (containerd + 0x972d25)
                                                           #2  0x0000000000da9a85 runtime.args.abi0 (containerd + 0x9a9a85)
                                                           #3  0x0000000000da0f32 runtime.rt0_go.abi0 (containerd + 0x9a0f32)
                                                           #4  0x00007bb13ce23ecc __libc_start_main_impl (libc.so.6 + 0x25ecc)
                                                           #5  0x0000000000d20455 _start (containerd + 0x920455)
                                                           ELF object binary architecture: AMD x86-64
Okt 25 15:41:20  systemd[1]: containerd.service: Main process exited, code=dumped, status=11/SEGV
Okt 25 15:41:20  systemd[1]: containerd.service: Failed with result 'core-dump'.
@mtippmann mtippmann added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 25, 2024
@snajpa
Contributor

snajpa commented Nov 7, 2024

Can you try to run a debug build (configure with --enable-debug)? I think this is the same problem we're seeing with @TheUbuntuGuy here: vpsfreecz#1. It seems like some kind of race when memory is tight; probably the dbuf_evict thread steps into something it's not supposed to... I have no idea honestly, to me this is pretty difficult to debug. Lots of moving parts in dbufs vs arc vs znode lifetime vs memory reclaim :(

It would be great if you could try. If you see "Kernel panic - not syncing: buffer modified while frozen!", then it's probably the same problem.

FWIW it seems to be related to how ZFS works on 6.10 and newer kernels; older ones don't hit it. This bug is also already present in the OpenZFS 2.2 stable release.

@snajpa
Contributor

snajpa commented Nov 7, 2024

When you say 2.2.6 is fine, are you sure that is also with the 6.11 kernel series?

@snajpa
Contributor

snajpa commented Nov 7, 2024

If it ends up being the same issue, it's also worth noting that we've tried disabling block cloning and direct IO, and running only with sync=disabled; none of that made any difference.

@mtippmann
Author

@snajpa Hi, thanks for your reply. I haven't had time yet to bisect this on another machine, and I'm a little scared to test on my work notebook at the moment before doing backups...

  1. I can try a debug build. Last time there was nothing in dmesg.
  2. It's been happening on kernel 6.11.5 or 6.11.4. I need to test with LTS 6.6 again to pin it down.
  3. For me it's happening with init_on_alloc=0 init_on_free=0, and maybe my custom module flags also play a role, like disabled compression in ARC or abd_scatter set to 0/1. It's not too many options; once I can reproduce it, it should be easier to pin down...

Block cloning was not enabled and is unrelated, I think. I also think it happened before directio hit master.

2.3.0-rc2 also runs fine here, so I'm still not 100% sure if this is a ZFS issue or some general issue.

Lots of words to say: no hard data yet. I need to reproduce it and bisect the commits. I have another machine for that, but it can take a few days :/

@snajpa
Contributor

snajpa commented Nov 11, 2024

I think it's somewhere in the impedance mismatch between the new folio APIs and current ZFS code; I need to dig into it way deeper. It's the VFS which is now allocating pages, and it seems to also be freeing them on other occasions than just migration, dunno. When I sugar the code with printks it won't reproduce :D so I'm stuck with going through crashdumps... Originally I thought this had to be a bug in the DMU, but now I think it'll be about a buf loaned to the ARC which gets freed by the kernel, or something along those lines... tricky

@mtippmann
Author

Okay, some progress in pinning it down: using Arch and zfs git (zfs-kmod-2.3.99-68_g46c4f2ce0b), it doesn't happen on a different machine (i5-8500 CPU, HP EliteDesk 800 G4 SFF) with the same cmdline options and module options.

It does happen on my Dell notebook, but only with init_on_alloc=0 init_on_free=0; without these it's fine.

So using init_on_alloc=0 on that Dell machine breaks something. It looks like it may really be a hardware fault or some Dell-specific issue.

@snajpa
Contributor

snajpa commented Nov 15, 2024

How big is the difference in memory capacity and usage between the two machines? I'm asking because it only happens on memory reclaim.

@mtippmann mtippmann changed the title segmentation faults / memory corruption using zfs git 152ae5c9bc segmentation faults / memory corruption using zfs git 152ae5c9bc [ probaly false unrelated to zfs ] Nov 22, 2024
@mtippmann
Author

mtippmann commented Nov 22, 2024

> How big is the difference in memory capacity and usage between the two machines? I'm asking because it only happens on memory reclaim.

The notebook has 32 GB of memory and the EliteDesk 48 GB. Both run some incus containers and docker, but neither is anywhere near 100% memory usage. However, now it's working fine even with init_on_alloc=0 init_on_free=0, using latest git and regardless of any specific module options.

I'm closing this as a false positive. I don't have the capacity to debug this in detail at the moment, and it looks like it was specific to that machine and never happened elsewhere.

@mtippmann mtippmann changed the title segmentation faults / memory corruption using zfs git 152ae5c9bc [ probaly false unrelated to zfs ] segmentation faults / memory corruption using zfs git 152ae5c9bc [ probaly unrelated to zfs ] Nov 22, 2024
@mtippmann mtippmann changed the title segmentation faults / memory corruption using zfs git 152ae5c9bc [ probaly unrelated to zfs ] segmentation faults / memory corruption using zfs git Nov 22, 2024
@mtippmann
Author

here we go (with debug enabled I think)

Nov 23 00:21:14 kleinerhellraiser kernel: ------------[ cut here ]------------
Nov 23 00:21:14 kleinerhellraiser kernel: WARNING: CPU: 1 PID: 8386 at mm/gup.c:144 try_grab_folio+0x77/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel: Modules linked in: xt_nat xt_tcpudp snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device veth nft_masq xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netli>
Nov 23 00:21:14 kleinerhellraiser kernel:  snd_intel_sdw_acpi bluetooth mc iwlmvm crct10dif_pclmul dell_pc snd_hda_codec tcp_bbr polyval_clmulni platform_profile polyval_generic sch_fq crc16 snd_ctl_led ghash_clmulni_intel ee1004>
Nov 23 00:21:14 kleinerhellraiser kernel:  acpi_thermal_rel dell_smo8800 int340x_thermal_zone mac_hid usbip_host usbip_core pkcs8_key_parser i2c_dev sg crypto_user loop dm_mod nfnetlink ip_tables x_tables crc32_pclmul crc32c_inte>
Nov 23 00:21:14 kleinerhellraiser kernel: CPU: 1 UID: 999 PID: 8386 Comm: mysqld Tainted: P     U     OE      6.11.9-zen1-1-zen #1 1400000003000000474e550014a35950ebcafa42
Nov 23 00:21:14 kleinerhellraiser kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Nov 23 00:21:14 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Nov 23 00:21:14 kleinerhellraiser kernel: RIP: 0010:try_grab_folio+0x77/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel: Code: 8b 00 48 63 d6 be 23 00 00 00 48 c1 e8 36 48 8b 3c c5 e0 fd 87 b8 e8 f8 12 fe ff 31 c9 89 c8 c3 cc cc cc cc f0 01 70 34 eb f1 <0f> 0b b9 f4 ff ff ff 89 c8 c3 cc cc cc cc 48 8b 0f 48>
Nov 23 00:21:14 kleinerhellraiser kernel: RSP: 0018:ffffac3615c87888 EFLAGS: 00010282
Nov 23 00:21:14 kleinerhellraiser kernel: RAX: ffffda09a1438400 RBX: 0000000000210002 RCX: 00000000ffffff01
Nov 23 00:21:14 kleinerhellraiser kernel: RDX: 0000000000210002 RSI: 0000000000000001 RDI: ffffda09a1438400
Nov 23 00:21:14 kleinerhellraiser kernel: RBP: ffff9784b98280b8 R08: ffffda09a1438400 R09: ffff9784accfa108
Nov 23 00:21:14 kleinerhellraiser kernel: R10: ffff9781bbdaef0c R11: 000075d5c4000000 R12: ffff97864ee63d80
Nov 23 00:21:14 kleinerhellraiser kernel: R13: 000075d5c43b0000 R14: ffffda09a1438400 R15: 8000000850e10225
Nov 23 00:21:14 kleinerhellraiser kernel: FS:  000075d5cab63640(0000) GS:ffff978851a80000(0000) knlGS:0000000000000000
Nov 23 00:21:14 kleinerhellraiser kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 00:21:14 kleinerhellraiser kernel: CR2: 000000000373e8a0 CR3: 0000000655e46002 CR4: 00000000003706f0
Nov 23 00:21:14 kleinerhellraiser kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 23 00:21:14 kleinerhellraiser kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 23 00:21:14 kleinerhellraiser kernel: Call Trace:
Nov 23 00:21:14 kleinerhellraiser kernel:  <TASK>
Nov 23 00:21:14 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel:  ? __warn.cold+0x8e/0xf5
Nov 23 00:21:14 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel:  ? report_bug+0xe7/0x210
Nov 23 00:21:14 kleinerhellraiser kernel:  ? handle_bug+0x58/0x90
Nov 23 00:21:14 kleinerhellraiser kernel:  ? exc_invalid_op+0x19/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel:  ? asm_exc_invalid_op+0x1a/0x20
Nov 23 00:21:14 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:21:14 kleinerhellraiser kernel:  follow_page_pte+0x127/0x720
Nov 23 00:21:14 kleinerhellraiser kernel:  follow_page_mask+0x332/0xe30
Nov 23 00:21:14 kleinerhellraiser kernel:  __get_user_pages+0x141/0x8c0
Nov 23 00:21:14 kleinerhellraiser kernel:  __gup_longterm_locked+0xb3/0x9f0
Nov 23 00:21:14 kleinerhellraiser kernel:  ? gup_fast+0x8b/0x1c0
Nov 23 00:21:14 kleinerhellraiser kernel:  get_user_pages_fast+0x137/0x190
Nov 23 00:21:14 kleinerhellraiser kernel:  __iov_iter_get_pages_alloc+0x298/0x670
Nov 23 00:21:14 kleinerhellraiser kernel:  ? spl_kmem_alloc_impl+0x9b/0x170 [spl 1400000003000000474e5500f0ded18920adeb2c]
Nov 23 00:21:14 kleinerhellraiser kernel:  iov_iter_get_pages2+0x1d/0x40
Nov 23 00:21:14 kleinerhellraiser kernel:  zfs_uio_get_dio_pages_alloc+0xe8/0x6f0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:21:14 kleinerhellraiser kernel:  zfs_setup_direct+0xb4/0x140 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:21:14 kleinerhellraiser kernel:  zfs_write+0x255/0xd20 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:21:14 kleinerhellraiser kernel:  ? __mod_memcg_lruvec_state+0xa0/0x150
Nov 23 00:21:14 kleinerhellraiser kernel:  ? __lruvec_stat_mod_folio+0x83/0xd0
Nov 23 00:21:14 kleinerhellraiser kernel:  ? folio_add_file_rmap_ptes+0x3b/0xb0
Nov 23 00:21:14 kleinerhellraiser kernel:  zpl_iter_write+0x129/0x1b0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:21:14 kleinerhellraiser kernel:  vfs_write+0x366/0x4a0
Nov 23 00:21:14 kleinerhellraiser kernel:  __x64_sys_pwrite64+0x98/0xd0
Nov 23 00:21:14 kleinerhellraiser kernel:  do_syscall_64+0x82/0x190
Nov 23 00:21:14 kleinerhellraiser kernel:  ? switch_fpu_return+0x4e/0xd0
Nov 23 00:21:14 kleinerhellraiser kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 23 00:21:14 kleinerhellraiser kernel: RIP: 0033:0x75d5cb9f4bcf
Nov 23 00:21:14 kleinerhellraiser kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 9d a4 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 ed a4>
Nov 23 00:21:14 kleinerhellraiser kernel: RSP: 002b:000075d5cab5e810 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Nov 23 00:21:14 kleinerhellraiser kernel: RAX: ffffffffffffffda RBX: 000075d5c43b0000 RCX: 000075d5cb9f4bcf
Nov 23 00:21:14 kleinerhellraiser kernel: RDX: 0000000000100000 RSI: 000075d5c43b0000 RDI: 0000000000000003
Nov 23 00:21:14 kleinerhellraiser kernel: RBP: 000075d5cab5eaf0 R08: 0000000000000000 R09: 000075d5cab5eb2c
Nov 23 00:21:14 kleinerhellraiser kernel: R10: 0000000000100000 R11: 0000000000000293 R12: 0000000000100000
Nov 23 00:21:14 kleinerhellraiser kernel: R13: 0000000000100000 R14: 00000000376227b0 R15: 0000000000100000
Nov 23 00:21:14 kleinerhellraiser kernel:  </TASK>
Nov 23 00:21:14 kleinerhellraiser kernel: ---[ end trace 0000000000000000 ]---
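
A trace like the one above is easier to compare across runs when reduced to a few fields. The following is a rough sketch with a hypothetical helper, `summarize_warning`, which pulls out the warning location, the process name, and the frames that belong to a named module (such as zfs or spl). It assumes journalctl-style lines as pasted here and skips frames prefixed with "? ", which the kernel prints for addresses it found on the stack but could not confirm as part of the call chain.

```python
import re

WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: (\d+) at (\S+) (\S+)")
COMM_RE = re.compile(r"\bComm: (\S+)")
# A stack frame such as " zfs_write+0x255/0xd20 [zfs 1400...]".
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)[ \]])?"
)


def summarize_warning(lines):
    """Collect PID, source location, symbol, comm and module frames."""
    summary = {"pid": None, "location": None, "symbol": None,
               "comm": None, "module_frames": []}
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            summary["pid"] = int(warning.group(1))
            summary["location"] = warning.group(2)
            summary["symbol"] = warning.group(3)
            continue
        comm = COMM_RE.search(line)
        if comm and summary["comm"] is None:
            summary["comm"] = comm.group(1)
            continue
        frame = FRAME_RE.search(line)
        # Keep only confirmed frames that carry a module tag.
        if frame and not frame.group("guess") and frame.group("module"):
            summary["module_frames"].append(
                (frame.group("module"), frame.group("func")))
    return summary
```

Fed the lines of the trace above, the module frames it keeps are the zfs ones on the write path (zfs_uio_get_dio_pages_alloc, zfs_setup_direct, zfs_write, zpl_iter_write), which makes it quick to see whether a later trace takes the same route.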

@mtippmann mtippmann reopened this Nov 22, 2024
@mtippmann mtippmann changed the title segmentation faults / memory corruption using zfs git segmentation faults / memory corruption using zfs git with zfs_abd_scatter_enabled=0 Nov 22, 2024
@mtippmann
Author

mtippmann commented Nov 22, 2024

it happens with:

zfs_abd_scatter_enabled=0

I can reliably trigger it by doing a docker compose up for a project here.

Nov 23 00:34:26 kleinerhellraiser kernel: ------------[ cut here ]------------
Nov 23 00:34:26 kleinerhellraiser kernel: WARNING: CPU: 3 PID: 17636 at mm/gup.c:144 try_grab_folio+0x77/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel: Modules linked in: xt_nat xt_tcpudp veth nft_masq xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype n>
Nov 23 00:34:26 kleinerhellraiser kernel:  platform_profile ac97_bus snd_pcm_dmaengine vboxnetflt(OE) crct10dif_pclmul vboxnetadp(OE) polyval_clmulni dell_laptop snd_hda_intel polyval_generic libarc4 ghash_clmulni_intel snd_intel_dspcfg dell_wmi vboxdr>
Nov 23 00:34:26 kleinerhellraiser kernel:  int340x_thermal_zone pmt_class dell_smo8800 parport mac_hid usbip_host usbip_core pkcs8_key_parser i2c_dev sg crypto_user dm_mod loop nfnetlink ip_tables x_tables crc32_pclmul crc32c_intel sha512_ssse3 sha256_>
Nov 23 00:34:26 kleinerhellraiser kernel: CPU: 3 UID: 999 PID: 17636 Comm: mysqld Tainted: P     U     OE      6.11.9-zen1-1-zen #1 1400000003000000474e550014a35950ebcafa42
Nov 23 00:34:26 kleinerhellraiser kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Nov 23 00:34:26 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Nov 23 00:34:26 kleinerhellraiser kernel: RIP: 0010:try_grab_folio+0x77/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel: Code: 8b 00 48 63 d6 be 23 00 00 00 48 c1 e8 36 48 8b 3c c5 e0 fd c7 b0 e8 f8 12 fe ff 31 c9 89 c8 c3 cc cc cc cc f0 01 70 34 eb f1 <0f> 0b b9 f4 ff ff ff 89 c8 c3 cc cc cc cc 48 8b 0f 48 c1 e9 33 83
Nov 23 00:34:26 kleinerhellraiser kernel: RSP: 0018:ffffb8b1819b7768 EFLAGS: 00010282
Nov 23 00:34:26 kleinerhellraiser kernel: RAX: ffffe70f21438400 RBX: 0000000000210002 RCX: 00000000ffffff01
Nov 23 00:34:26 kleinerhellraiser kernel: RDX: 0000000000210002 RSI: 0000000000000001 RDI: ffffe70f21438400
Nov 23 00:34:26 kleinerhellraiser kernel: RBP: ffff988978353678 R08: ffffe70f21438400 R09: ffff9888ba4cce08
Nov 23 00:34:26 kleinerhellraiser kernel: R10: ffff9884c73dbd0c R11: 000072c938000000 R12: ffff988979151d80
Nov 23 00:34:26 kleinerhellraiser kernel: R13: 000072c9383b0000 R14: ffffe70f21438400 R15: 8000000850e10225
Nov 23 00:34:26 kleinerhellraiser kernel: FS:  000072c93ffb5640(0000) GS:ffff988b91b80000(0000) knlGS:0000000000000000
Nov 23 00:34:26 kleinerhellraiser kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 00:34:26 kleinerhellraiser kernel: CR2: 0000000001bce910 CR3: 00000006420b6002 CR4: 00000000003706f0
Nov 23 00:34:26 kleinerhellraiser kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 23 00:34:26 kleinerhellraiser kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 23 00:34:26 kleinerhellraiser kernel: Call Trace:
Nov 23 00:34:26 kleinerhellraiser kernel:  <TASK>
Nov 23 00:34:26 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? __warn.cold+0x8e/0xf5
Nov 23 00:34:26 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? report_bug+0xe7/0x210
Nov 23 00:34:26 kleinerhellraiser kernel:  ? handle_bug+0x58/0x90
Nov 23 00:34:26 kleinerhellraiser kernel:  ? exc_invalid_op+0x19/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? asm_exc_invalid_op+0x1a/0x20
Nov 23 00:34:26 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:34:26 kleinerhellraiser kernel:  follow_page_pte+0x127/0x720
Nov 23 00:34:26 kleinerhellraiser kernel:  follow_page_mask+0x332/0xe30
Nov 23 00:34:26 kleinerhellraiser kernel:  __get_user_pages+0x141/0x8c0
Nov 23 00:34:26 kleinerhellraiser kernel:  __gup_longterm_locked+0xb3/0x9f0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? gup_fast+0x8b/0x1c0
Nov 23 00:34:26 kleinerhellraiser kernel:  get_user_pages_fast+0x137/0x190
Nov 23 00:34:26 kleinerhellraiser kernel:  __iov_iter_get_pages_alloc+0x298/0x670
Nov 23 00:34:26 kleinerhellraiser kernel:  ? spl_kmem_alloc_impl+0x9b/0x170 [spl 1400000003000000474e5500f0ded18920adeb2c]
Nov 23 00:34:26 kleinerhellraiser kernel:  iov_iter_get_pages2+0x1d/0x40
Nov 23 00:34:26 kleinerhellraiser kernel:  zfs_uio_get_dio_pages_alloc+0xe8/0x6f0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:34:26 kleinerhellraiser kernel:  zfs_setup_direct+0xb4/0x140 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:34:26 kleinerhellraiser kernel:  zfs_write+0x255/0xd20 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:34:26 kleinerhellraiser kernel:  ? free_unref_page+0x29a/0x320
Nov 23 00:34:26 kleinerhellraiser kernel:  zpl_iter_write+0x129/0x1b0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:34:26 kleinerhellraiser kernel:  vfs_write+0x366/0x4a0
Nov 23 00:34:26 kleinerhellraiser kernel:  __x64_sys_pwrite64+0x98/0xd0
Nov 23 00:34:26 kleinerhellraiser kernel:  do_syscall_64+0x82/0x190
Nov 23 00:34:26 kleinerhellraiser kernel:  ? __x64_sys_pwrite64+0xa8/0xd0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Nov 23 00:34:26 kleinerhellraiser kernel:  ? do_syscall_64+0x8e/0x190
Nov 23 00:34:26 kleinerhellraiser kernel:  ? do_user_addr_fault+0x3bc/0x860
Nov 23 00:34:26 kleinerhellraiser kernel:  ? exc_page_fault+0x81/0x190
Nov 23 00:34:26 kleinerhellraiser kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 23 00:34:26 kleinerhellraiser kernel: RIP: 0033:0x72c940e46bcf
Nov 23 00:34:26 kleinerhellraiser kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 9d a4 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 ed a4 f8 ff 48 8b
Nov 23 00:34:26 kleinerhellraiser kernel: RSP: 002b:000072c93ffb0810 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Nov 23 00:34:26 kleinerhellraiser kernel: RAX: ffffffffffffffda RBX: 000072c9383b0000 RCX: 000072c940e46bcf
Nov 23 00:34:26 kleinerhellraiser kernel: RDX: 0000000000100000 RSI: 000072c9383b0000 RDI: 0000000000000003
Nov 23 00:34:26 kleinerhellraiser kernel: RBP: 000072c93ffb0af0 R08: 0000000000000000 R09: 000072c93ffb0b2c
Nov 23 00:34:26 kleinerhellraiser kernel: R10: 0000000000100000 R11: 0000000000000293 R12: 0000000000100000
Nov 23 00:34:26 kleinerhellraiser kernel: R13: 0000000000100000 R14: 000000003ee897b0 R15: 0000000000100000
Nov 23 00:34:26 kleinerhellraiser kernel:  </TASK>
Nov 23 00:34:26 kleinerhellraiser kernel: ---[ end trace 0000000000000000 ]---
Nov 23 00:34:26 kleinerhellraiser kernel: show_signal: 2 callbacks suppressed
Nov 23 00:34:26 kleinerhellraiser kernel: traps: sed[17664] general protection fault ip:7ef597ce1e4a sp:7ffcd0040790 error:0 in libc.so.6[98e4a,7ef597c6f000+155000]
Nov 23 00:34:26 kleinerhellraiser systemd-coredump[17665]: Process 17664 (sed) of user 0 terminated abnormally with signal 11/SEGV, processing...
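
The traces above enter ZFS through pwrite64 -> zfs_write -> zfs_setup_direct -> zfs_uio_get_dio_pages_alloc from mysqld, with a length of 0x100000 (1 MiB) in RDX and the same value as the offset in R10. A smaller trigger than the full docker compose project might therefore be a single 1 MiB direct write to a file on the affected dataset. This is an untested guess, not a confirmed reproducer: it assumes mysqld reached that path by opening the file with O_DIRECT, and the helper names and the target path are hypothetical.

```python
import mmap
import os

WRITE_SIZE = 1 << 20  # 1 MiB, the pwrite64 length (RDX) seen in the traces


def aligned_buffer(size=WRITE_SIZE, fill=b"\xa5"):
    """Anonymous mmap regions are page-aligned, which O_DIRECT expects."""
    buf = mmap.mmap(-1, size)
    buf.write(fill * size)
    return buf


def direct_pwrite(path, offset=WRITE_SIZE, size=WRITE_SIZE):
    """Issue one O_DIRECT pwrite and return the number of bytes written."""
    buf = aligned_buffer(size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

Running `direct_pwrite("/path/on/zfs/testfile")` in a loop while watching `dmesg -w` would show whether the try_grab_folio warning appears without docker or MySQL involved. On filesystems that do not support O_DIRECT the open fails with an OSError.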

@mtippmann mtippmann changed the title segmentation faults / memory corruption using zfs git with zfs_abd_scatter_enabled=0 segmentation faults / memory corruption using zfs git with init_on_alloc=0 init_on_free=0 Nov 22, 2024
@mtippmann
Author

It's unrelated to zfs_abd_scatter_enabled and happens with zfs_abd_scatter_enabled=1 too. What triggers it here is init_on_alloc=0 init_on_free=0.

zfs-2.3.99-90_g38c0324c0f
zfs-kmod-2.3.99-90_g38c0324c0f

@mtippmann
Author

here is the one with abd_scatter=1

Nov 23 00:42:53 kleinerhellraiser kernel: ------------[ cut here ]------------
Nov 23 00:42:53 kleinerhellraiser kernel: WARNING: CPU: 3 PID: 18171 at mm/gup.c:144 try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device nft_masq xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netli>
Nov 23 00:42:53 kleinerhellraiser kernel:  snd_hda_intel platform_profile sch_fq snd_ctl_led kvm snd_intel_dspcfg vboxnetflt(OE) mac80211 crct10dif_pclmul snd_intel_sdw_acpi vboxnetadp(OE) polyval_clmulni dell_laptop snd_hda_code>
Nov 23 00:42:53 kleinerhellraiser kernel:  acpi_pad acpi_thermal_rel pmt_class rfkill mac_hid usbip_host usbip_core pkcs8_key_parser i2c_dev sg crypto_user dm_mod loop nfnetlink ip_tables x_tables crc32_pclmul crc32c_intel sha512>
Nov 23 00:42:53 kleinerhellraiser kernel: CPU: 3 UID: 999 PID: 18171 Comm: mysqld Tainted: P     U     OE      6.11.9-zen1-1-zen #1 1400000003000000474e550014a35950ebcafa42
Nov 23 00:42:53 kleinerhellraiser kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Nov 23 00:42:53 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Nov 23 00:42:53 kleinerhellraiser kernel: RIP: 0010:try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel: Code: 8b 00 48 63 d6 be 23 00 00 00 48 c1 e8 36 48 8b 3c c5 e0 fd 47 88 e8 f8 12 fe ff 31 c9 89 c8 c3 cc cc cc cc f0 01 70 34 eb f1 <0f> 0b b9 f4 ff ff ff 89 c8 c3 cc cc cc cc 48 8b 0f 48>
Nov 23 00:42:53 kleinerhellraiser dockerd[2032]: time="2024-11-23T00:42:53.643102052+01:00" level=debug msg="sandbox set key processing took 31.976135ms for container 2a441bae8ebc63e803e57c87db0db94ac8143bdb53fcdc8488aba5314a92e6>
Nov 23 00:42:53 kleinerhellraiser dockerd[2032]: time="2024-11-23T00:42:53.690976942+01:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/create
Nov 23 00:42:53 kleinerhellraiser dockerd[2032]: time="2024-11-23T00:42:53.698142132+01:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
Nov 23 00:42:53 kleinerhellraiser dockerd[2032]: time="2024-11-23T00:42:53.702344130+01:00" level=debug msg="Calling GET /v1.47/containers/2a441bae8ebc63e803e57c87db0db94ac8143bdb53fcdc8488aba5314a92e627/json" spanID=2b6173323991>
Nov 23 00:42:53 kleinerhellraiser dockerd[2032]: time="2024-11-23T00:42:53.729458511+01:00" level=debug msg="Name To resolve: web." spanID=5c0ec28d6262a213 traceID=5d58b516222720794a6ef4f0afd046f6
Nov 23 00:42:53 kleinerhellraiser kernel: CPU: 3 UID: 999 PID: 18171 Comm: mysqld Tainted: P     U     OE      6.11.9-zen1-1-zen #1 1400000003000000474e550014a35950ebcafa42
Nov 23 00:42:53 kleinerhellraiser kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Nov 23 00:42:53 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Nov 23 00:42:53 kleinerhellraiser kernel: RIP: 0010:try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel: Code: 8b 00 48 63 d6 be 23 00 00 00 48 c1 e8 36 48 8b 3c c5 e0 fd 47 88 e8 f8 12 fe ff 31 c9 89 c8 c3 cc cc cc cc f0 01 70 34 eb f1 <0f> 0b b9 f4 ff ff ff 89 c8 c3 cc cc cc cc 48 8b 0f 48 c1 e9 33 83
Nov 23 00:42:53 kleinerhellraiser kernel: RSP: 0018:ffffb5f74d7d76c8 EFLAGS: 00010282
Nov 23 00:42:53 kleinerhellraiser kernel: RAX: ffffe8e861438400 RBX: 0000000000210002 RCX: 00000000ffffff01
Nov 23 00:42:53 kleinerhellraiser kernel: RDX: 0000000000210002 RSI: 0000000000000001 RDI: ffffe8e861438400
Nov 23 00:42:53 kleinerhellraiser kernel: RBP: ffff97be51013000 R08: ffffe8e861438400 R09: ffff97be45c8d808
Nov 23 00:42:53 kleinerhellraiser kernel: R10: ffff97ba3544c00c R11: 00007556a0000000 R12: ffff97be505f1d80
Nov 23 00:42:53 kleinerhellraiser kernel: R13: 00007556a03b0000 R14: ffffe8e861438400 R15: 8000000850e10225
Nov 23 00:42:53 kleinerhellraiser kernel: FS:  00007556a888c640(0000) GS:ffff97c0d1b80000(0000) knlGS:0000000000000000
Nov 23 00:42:53 kleinerhellraiser kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 00:42:53 kleinerhellraiser kernel: CR2: 0000000001bce910 CR3: 00000005d12aa006 CR4: 00000000003706f0
Nov 23 00:42:53 kleinerhellraiser kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 23 00:42:53 kleinerhellraiser kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 23 00:42:53 kleinerhellraiser kernel: Call Trace:
Nov 23 00:42:53 kleinerhellraiser kernel:  <TASK>
Nov 23 00:42:53 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? __warn.cold+0x8e/0xf5
Nov 23 00:42:53 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? report_bug+0xe7/0x210
Nov 23 00:42:53 kleinerhellraiser kernel:  ? handle_bug+0x58/0x90
Nov 23 00:42:53 kleinerhellraiser kernel:  ? exc_invalid_op+0x19/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? asm_exc_invalid_op+0x1a/0x20
Nov 23 00:42:53 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel:  follow_page_pte+0x127/0x720
Nov 23 00:42:53 kleinerhellraiser kernel:  follow_page_mask+0x332/0xe30
Nov 23 00:42:53 kleinerhellraiser kernel:  __get_user_pages+0x141/0x8c0
Nov 23 00:42:53 kleinerhellraiser kernel:  __gup_longterm_locked+0xb3/0x9f0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? gup_fast+0x8b/0x1c0
Nov 23 00:42:53 kleinerhellraiser kernel:  get_user_pages_fast+0x137/0x190
Nov 23 00:42:53 kleinerhellraiser kernel:  __iov_iter_get_pages_alloc+0x298/0x670
Nov 23 00:42:53 kleinerhellraiser kernel:  ? spl_kmem_alloc_impl+0x9b/0x170 [spl 1400000003000000474e5500f0ded18920adeb2c]
Nov 23 00:42:53 kleinerhellraiser kernel:  iov_iter_get_pages2+0x1d/0x40
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_uio_get_dio_pages_alloc+0xe8/0x6f0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_setup_direct+0xb4/0x140 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_write+0x255/0xd20 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:42:53 kleinerhellraiser kernel:  ? free_unref_page+0x29a/0x320
Nov 23 00:42:53 kleinerhellraiser kernel:  ? rrw_exit+0x68/0x160 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:42:53 kleinerhellraiser kernel:  zpl_iter_write+0x129/0x1b0 [zfs 1400000003000000474e55003fe13968877c33de]
Nov 23 00:42:53 kleinerhellraiser kernel:  vfs_write+0x366/0x4a0
Nov 23 00:42:53 kleinerhellraiser kernel:  __x64_sys_pwrite64+0x98/0xd0
Nov 23 00:42:53 kleinerhellraiser kernel:  do_syscall_64+0x82/0x190
Nov 23 00:42:53 kleinerhellraiser kernel:  ? __x64_sys_pwrite64+0xa8/0xd0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? do_syscall_64+0x8e/0x190
Nov 23 00:42:53 kleinerhellraiser kernel:  ? handle_mm_fault+0x58a/0x1500
Nov 23 00:42:53 kleinerhellraiser kernel:  ? mt_find+0x1f0/0x4d0
Nov 23 00:42:53 kleinerhellraiser kernel:  ? do_user_addr_fault+0x3bc/0x860
Nov 23 00:42:53 kleinerhellraiser kernel:  ? exc_page_fault+0x81/0x190
Nov 23 00:42:53 kleinerhellraiser kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 23 00:42:53 kleinerhellraiser kernel: RIP: 0033:0x7556a971dbcf
Nov 23 00:42:53 kleinerhellraiser kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 9d a4 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 ed a4 f8 ff 48 8b
Nov 23 00:42:53 kleinerhellraiser kernel: RSP: 002b:00007556a8887810 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Nov 23 00:42:53 kleinerhellraiser kernel: RAX: ffffffffffffffda RBX: 00007556a03b0000 RCX: 00007556a971dbcf
Nov 23 00:42:53 kleinerhellraiser kernel: RDX: 0000000000100000 RSI: 00007556a03b0000 RDI: 0000000000000003
Nov 23 00:42:53 kleinerhellraiser kernel: RBP: 00007556a8887af0 R08: 0000000000000000 R09: 00007556a8887b2c
Nov 23 00:42:53 kleinerhellraiser kernel: R10: 0000000000100000 R11: 0000000000000293 R12: 0000000000100000
Nov 23 00:42:53 kleinerhellraiser kernel: R13: 0000000000100000 R14: 00000000397c27b0 R15: 0000000000100000
Nov 23 00:42:53 kleinerhellraiser kernel:  </TASK>
Nov 23 00:42:53 kleinerhellraiser kernel: ---[ end trace 0000000000000000 ]---

hope this is somehow useful.
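
For reading the trace above: entries prefixed with `?` are addresses the kernel found on the stack but could not verify through the unwinder, so the path that actually executed is easier to see with them filtered out. A minimal sketch that does this, assuming journal-style lines like the ones pasted here (the `reliable_frames` helper and the shortened `sample` are illustrative, not part of any tool):

```python
import re

# Matches the symbol of a call-trace line such as
# "kernel:  zfs_write+0x255/0xd20 [zfs ...]"; group 1 is the optional "? " marker.
FRAME_RE = re.compile(r"kernel:\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(log_text):
    """Return function names of frames not marked with '?', innermost first."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "</TASK>" in line:
            break
        if "<TASK>" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


# A few lines taken from the trace above, with the module build IDs shortened.
sample = """\
Nov 23 00:42:53 kleinerhellraiser kernel:  <TASK>
Nov 23 00:42:53 kleinerhellraiser kernel:  ? try_grab_folio+0x77/0xc0
Nov 23 00:42:53 kleinerhellraiser kernel:  follow_page_pte+0x127/0x720
Nov 23 00:42:53 kleinerhellraiser kernel:  ? spl_kmem_alloc_impl+0x9b/0x170 [spl]
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_uio_get_dio_pages_alloc+0xe8/0x6f0 [zfs]
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_setup_direct+0xb4/0x140 [zfs]
Nov 23 00:42:53 kleinerhellraiser kernel:  zfs_write+0x255/0xd20 [zfs]
Nov 23 00:42:53 kleinerhellraiser kernel:  __x64_sys_pwrite64+0x98/0xd0
Nov 23 00:42:53 kleinerhellraiser kernel:  </TASK>
"""

for name in reliable_frames(sample):
    print(name)
```

Read bottom to top, the remaining frames show a `pwrite64` call entering `zfs_write`, taking the `zfs_setup_direct` path, and reaching the page-pinning code where the warning fired.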

@mtippmann
Author

this looks similar to #16642 - mysql (in docker) is also involved (and fails in a similar way)

#16594 looks like it could also be related.

@snajpa
Contributor

snajpa commented Nov 23, 2024

in that case that's probably already fixed in master, can you verify?

@mtippmann
Author

in that case that's probably already fixed in master, can you verify?

I'm running current master (git as of today)

zfs-2.3.99-90_g38c0324c0f
zfs-kmod-2.3.99-90_g38c0324c0f

all the backtraces above are also using that zfs version.

@snajpa
Contributor

snajpa commented Nov 23, 2024

can you please share the docker-compose.yml so that I can reproduce it locally?

@snajpa
Contributor

snajpa commented Nov 23, 2024

while we're at it, can you please also retry with the most current 6.11 kernel, 6.11.10? I've been hitting some weird stuff with kernels older than 6.11.6...

@mtippmann
Author

docker-compose: I've managed to trigger it using this docker-compose file. The software is from the company so I can't share it, but it looks like mysql is causing this: without the environment variables mysql bails out and everything is fine, with the supplied environment it starts up and then the segfaults happen. For completeness I've kept the other containers:

services:
  db:
    restart: unless-stopped
    image: mysql:8
    volumes:
      - mysql:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=backend
      - MYSQL_ROOT_PASSWORD=foo

    networks:
      - overlay
    ports:
      - '127.0.0.1:3308:3306'

  redis:
    restart: unless-stopped
    image: redis:7-alpine
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
    volumes:
      - redis:/data
    networks:
      - overlay
    command: redis-server --appendonly yes

  mailhog:
    image: mailhog/mailhog
    logging:
      driver: 'none'  # disable saving logs
    ports:
      - 1025:1025 # smtp server
      - 8025:8025 # web ui
    networks:
      - overlay

networks:
  overlay:

volumes:
  mysql:
  redis:

running docker compose up and waiting a while results in crashes and segfaults. after that, anything on the command line results in a segmentation fault, even a simple dmesg, depending on what's going on.

kernel is 6.11.9-zen1-1-zen - I can try testing with a newer kernel later

it's hitting again WARNING: CPU: 2 PID: 6555 at mm/gup.c:144 try_grab_folio+0x77/0xc0

some more data for debugging:

$ zpool get all
NAME   PROPERTY                       VALUE                          SOURCE
zroot  size                           1.81T                          -
zroot  capacity                       72%                            -
zroot  altroot                        -                              default
zroot  health                         ONLINE                         -
zroot  guid                           5896257792148590087            -
zroot  version                        -                              default
zroot  bootfs                         zroot/arch                     local
zroot  delegation                     on                             default
zroot  autoreplace                    off                            default
zroot  cachefile                      -                              default
zroot  failmode                       wait                           default
zroot  listsnapshots                  off                            default
zroot  autoexpand                     off                            default
zroot  dedupratio                     1.00x                          -
zroot  free                           517G                           -
zroot  allocated                      1.31T                          -
zroot  readonly                       off                            -
zroot  ashift                         13                             local
zroot  comment                        -                              default
zroot  expandsize                     -                              -
zroot  freeing                        0                              -
zroot  fragmentation                  30%                            -
zroot  leaked                         0                              -
zroot  multihost                      off                            default
zroot  checkpoint                     -                              -
zroot  load_guid                      16998629611148276359           -
zroot  autotrim                       off                            default
zroot  compatibility                  openzfs-2.1-linux              local
zroot  bcloneused                     0                              -
zroot  bclonesaved                    0                              -
zroot  bcloneratio                    1.00x                          -
zroot  dedup_table_size               0                              -
zroot  dedup_table_quota              auto                           default
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           enabled                        local
zroot  feature@large_dnode            active                         local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             enabled                        local
zroot  feature@project_quota          active                         local
zroot  feature@device_removal         enabled                        local
zroot  feature@obsolete_counts        enabled                        local
zroot  feature@zpool_checkpoint       enabled                        local
zroot  feature@spacemap_v2            active                         local
zroot  feature@allocation_classes     enabled                        local
zroot  feature@resilver_defer         enabled                        local
zroot  feature@bookmark_v2            enabled                        local
zroot  feature@redaction_bookmarks    enabled                        local
zroot  feature@redacted_datasets      enabled                        local
zroot  feature@bookmark_written       enabled                        local
zroot  feature@log_spacemap           active                         local
zroot  feature@livelist               enabled                        local
zroot  feature@device_rebuild         enabled                        local
zroot  feature@zstd_compress          enabled                        local
zroot  feature@draid                  enabled                        local
zroot  feature@zilsaxattr             disabled                       local
zroot  feature@head_errlog            disabled                       local
zroot  feature@blake3                 disabled                       local
zroot  feature@block_cloning          disabled                       local
zroot  feature@vdev_zaps_v2           disabled                       local
zroot  feature@redaction_list_spill   disabled                       local
zroot  feature@raidz_expansion        disabled                       local
zroot  feature@fast_dedup             disabled                       local
zroot  feature@longname               disabled                       local
zroot  feature@large_microzap         disabled                       local
$ zfs get all zroot/docker
NAME          PROPERTY              VALUE                    SOURCE
zroot/docker  type                  filesystem               -
zroot/docker  creation              Do Nov 16 11:18 2023     -
zroot/docker  used                  52.3G                    -
zroot/docker  available             459G                     -
zroot/docker  referenced            52.3G                    -
zroot/docker  compressratio         1.66x                    -
zroot/docker  mounted               yes                      -
zroot/docker  quota                 none                     default
zroot/docker  reservation           none                     default
zroot/docker  recordsize            128K                     default
zroot/docker  mountpoint            /var/lib/docker          local
zroot/docker  sharenfs              off                      default
zroot/docker  checksum              on                       default
zroot/docker  compression           on                       default
zroot/docker  atime                 on                       default
zroot/docker  devices               on                       default
zroot/docker  exec                  on                       default
zroot/docker  setuid                on                       default
zroot/docker  readonly              off                      default
zroot/docker  zoned                 off                      default
zroot/docker  snapdir               hidden                   default
zroot/docker  aclmode               discard                  default
zroot/docker  aclinherit            restricted               default
zroot/docker  createtxg             25616                    -
zroot/docker  canmount              on                       local
zroot/docker  xattr                 on                       inherited from zroot
zroot/docker  copies                1                        default
zroot/docker  version               5                        -
zroot/docker  utf8only              off                      -
zroot/docker  normalization         none                     -
zroot/docker  casesensitivity       sensitive                -
zroot/docker  vscan                 off                      default
zroot/docker  nbmand                off                      default
zroot/docker  sharesmb              off                      default
zroot/docker  refquota              none                     default
zroot/docker  refreservation        none                     default
zroot/docker  guid                  10715424641590777763     -
zroot/docker  primarycache          all                      inherited from zroot
zroot/docker  secondarycache        all                      default
zroot/docker  usedbysnapshots       0B                       -
zroot/docker  usedbydataset         52.3G                    -
zroot/docker  usedbychildren        0B                       -
zroot/docker  usedbyrefreservation  0B                       -
zroot/docker  logbias               latency                  default
zroot/docker  objsetid              23086                    -
zroot/docker  dedup                 off                      default
zroot/docker  mlslabel              none                     default
zroot/docker  sync                  standard                 default
zroot/docker  dnodesize             legacy                   default
zroot/docker  refcompressratio      1.66x                    -
zroot/docker  written               52.3G                    -
zroot/docker  logicalused           68.2G                    -
zroot/docker  logicalreferenced     68.2G                    -
zroot/docker  volmode               default                  default
zroot/docker  filesystem_limit      none                     default
zroot/docker  snapshot_limit        none                     default
zroot/docker  filesystem_count      none                     default
zroot/docker  snapshot_count        none                     default
zroot/docker  snapdev               hidden                   default
zroot/docker  acltype               off                      default
zroot/docker  context               none                     default
zroot/docker  fscontext             none                     default
zroot/docker  defcontext            none                     default
zroot/docker  rootcontext           none                     default
zroot/docker  relatime              on                       default
zroot/docker  redundant_metadata    all                      default
zroot/docker  overlay               on                       default
zroot/docker  encryption            off                      default
zroot/docker  keylocation           none                     default
zroot/docker  keyformat             none                     default
zroot/docker  pbkdf2iters           0                        default
zroot/docker  special_small_blocks  0                        default
zroot/docker  snapshots_changed     Mi Mai 22 15:05:15 2024  -
zroot/docker  prefetch              all                      default
zroot/docker  direct                standard                 default
zroot/docker  longname              off                      default

@snajpa
Contributor

snajpa commented Nov 23, 2024

awesome, thank you! one more question, how much memory does the machine where you're running the compose have?

@mtippmann
Author

$ cat /proc/cmdline
zfs=zroot/arch rw mitigations=off init_on_alloc=0 init_on_free=0 lsm=landlock,lockdown,yama,integrity,apparmor,bpf pcie_aspm=performance systemd.gpt_auto=0 spl.spl_hostid=0x00bab10c

machine has 32gb of memory. It's a kde plasma desktop, but it happens reliably here on starting mysql regardless of usage. i've seen everything from nearly every command segfaulting (basically the desktop dying after this) to some things still working but failing quickly. Logs are also full of processes crashing and spewing backtraces.
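
For anyone trying to reproduce this on another machine, it may help to confirm that the running kernel really has the same `init_on_alloc` / `init_on_free` settings. A small sketch, assuming a simple whitespace-separated command line without quoted values (the `parse_cmdline` helper is hypothetical; on a live system the text would come from `/proc/cmdline`):

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# The command line quoted above.
cmdline = (
    "zfs=zroot/arch rw mitigations=off init_on_alloc=0 init_on_free=0 "
    "lsm=landlock,lockdown,yama,integrity,apparmor,bpf pcie_aspm=performance "
    "systemd.gpt_auto=0 spl.spl_hostid=0x00bab10c"
)

params = parse_cmdline(cmdline)
for key in ("init_on_alloc", "init_on_free"):
    print(key, "=", params.get(key, "<not set on command line>"))
```

A parameter that is missing from the command line falls back to the kernel's build-time default, which this sketch does not try to detect.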

@mtippmann
Author

mtippmann commented Nov 23, 2024

on the other machine (elitedesk) mysql also fails but no segfaults:

db-1       | 2024-11-23 15:03:15+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.4.3-1.el9 started.
db-1       | 2024-11-23 15:03:15+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
db-1       | 2024-11-23 15:03:15+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.4.3-1.el9 started.
db-1       | 2024-11-23 15:03:16+00:00 [Note] [Entrypoint]: Initializing database files
db-1       | 2024-11-23T15:03:16.253134Z 0 [System] [MY-015017] [Server] MySQL Server Initialization - start.
db-1       | 2024-11-23T15:03:16.256437Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.4.3) initializing of server in progress as process 80
db-1       | 2024-11-23T15:03:16.271532Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
db-1       | 2024-11-23T15:03:16.302013Z 1 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
db-1       | 2024-11-23T15:03:16.302090Z 1 [ERROR] [MY-012639] [InnoDB] Write to file ./ibdata1 failed at offset 0, 1048576 bytes should have been written, only 0 were written. Operating system error number 12. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
db-1       | 2024-11-23T15:03:16.302121Z 1 [ERROR] [MY-012640] [InnoDB] Error number 12 means 'Cannot allocate memory'
db-1       | 2024-11-23T15:03:16.302149Z 1 [ERROR] [MY-012267] [InnoDB] Could not set the file size of './ibdata1'. Probably out of disk space
db-1       | 2024-11-23T15:03:16.302167Z 1 [ERROR] [MY-012929] [InnoDB] InnoDB Database creation was aborted with error Generic error. You may need to delete the ibdata1 file before trying to start up again.
db-1       | 2024-11-23T15:03:16.800924Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
db-1       | 2024-11-23T15:03:16.800969Z 0 [ERROR] [MY-013236] [Server] The designated data directory /var/lib/mysql/ is unusable. You can remove all files that the server added to it.
db-1       | 2024-11-23T15:03:16.800983Z 0 [ERROR] [MY-010119] [Server] Aborting
db-1       | 2024-11-23T15:03:16.802524Z 0 [System] [MY-015018] [Server] MySQL Server Initialization - end.
db-1 exited with code 0

the disk has 1.4tb of free space and the machine has 48gb of memory. sorry for the initial confusion. it must be somehow related to mysql doing something that zfs doesn't like.
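
A note on the error number: errno 12 on Linux is `ENOMEM`, which matches MySQL's "Cannot allocate memory" message even though neither memory nor disk space is short. The `Code:` line of the `try_grab_folio` warning posted earlier also contains the bytes `b9 f4 ff ff ff` directly after the trapping `0f 0b`. On x86-64 that byte sequence encodes a move of a 32-bit immediate into `ecx`, and the immediate decodes to -12. Whether that is the value that reaches userspace here is an assumption rather than something confirmed in this thread. A short sketch of the decoding (the `decode_imm32` helper is illustrative):

```python
import errno
import os


def decode_imm32(hex_bytes):
    """Decode a little-endian signed 32-bit immediate given as a hex string."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little", signed=True)


# Error number reported by InnoDB in the log above.
print(errno.errorcode[12], "-", os.strerror(12))

# Immediate operand taken from the "Code:" line of the try_grab_folio warning.
value = decode_imm32("f4ffffff")
print(value, errno.errorcode[-value])
```

If the assumption holds, the failed 1048576-byte write on this machine and the warning on the other machine would be two views of the same failure in the page-pinning path.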

@snajpa
Contributor

snajpa commented Nov 23, 2024

OK, reproduced, thank you! will ping you when I have a patch to test

@snajpa
Contributor

snajpa commented Nov 25, 2024

got sidetracked by some interesting networking issues, will be back at this in a few days

@mtippmann
Author

got sidetracked by some interesting networking issues, will be back at this in a few days

no worries, and thanks for looking into it. I'm wondering if it's related to direct io, and whether this is an issue that could hit the 2.3.0 release?

@snajpa
Contributor

snajpa commented Nov 25, 2024

AFAIK you're right, my bet is also on DIO, but it might also be the GPL-only change of zero page or a change related to HAVE_IOV_ITER_GET_PAGES2, or it can also be something entirely different :D looks like I resolved the networking issue in our kernel, so I can get back to this
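
To exercise the suspected direct I/O path without the whole compose stack, something like the following could be tried. It is only a sketch and not the reproducer used in this thread: it mimics the 1 MiB `pwrite` at offset 0 seen in the trace (`RDX: 0000000000100000`) and in the MySQL log, on the assumption that the write was issued with `O_DIRECT` from a page-aligned buffer. The `direct_pwrite` helper is hypothetical, and it may or may not trigger the problem on an affected system.

```python
import errno
import mmap
import os


def direct_pwrite(path, size=1 << 20):
    """Write `size` zero bytes at offset 0, preferring O_DIRECT.

    Returns (bytes_written, used_direct). Falls back to buffered I/O when the
    filesystem rejects O_DIRECT with EINVAL; other errors, such as the ENOMEM
    seen in this thread, propagate to the caller.
    """
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned and zero-filled
    try:
        flags = os.O_WRONLY | os.O_CREAT
        o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        attempts = ([o_direct] if o_direct else []) + [0]
        for extra in attempts:
            try:
                fd = os.open(path, flags | extra, 0o600)
            except OSError as exc:
                if extra and exc.errno == errno.EINVAL:
                    continue  # O_DIRECT refused at open time, retry buffered
                raise
            try:
                return os.pwrite(fd, buf, 0), bool(extra)
            except OSError as exc:
                if extra and exc.errno == errno.EINVAL:
                    continue  # O_DIRECT refused at write time, retry buffered
                raise
            finally:
                os.close(fd)
    finally:
        buf.close()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        written, used_direct = direct_pwrite(os.path.join(tmp, "ibdata1.test"))
        print("wrote", written, "bytes, O_DIRECT used:", used_direct)
```

To test a specific dataset, point `path` at a file under its mountpoint (for example somewhere below `/var/lib/docker` here) instead of a temporary directory.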

@snajpa
Contributor

snajpa commented Nov 27, 2024

@mtippmann could you please try #16812?

@mtippmann
Author

@mtippmann could you please try #16812?

looks good using zfs-kmod-2.3.99-94_g0ffa6f3464 (with patch) and 6.12.1-arch1-1

mysql starts up without complaints. I still have to test the lts and -zen kernels, but so far the problem has not appeared.

@mtippmann
Author

6.6.63-1-lts and 6.12.1.zen1-1 are also fine with init_on_alloc=0

thank you so much!

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Dec 3, 2024
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Snajdr <[email protected]>
Closes openzfs#16812 
Closes openzfs#16689
Closes openzfs#16642