ZTS: OOM in raidz_002_pos #16566

Closed
tonyhutter opened this issue Sep 24, 2024 · 2 comments · Fixed by #16664
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tonyhutter
Contributor

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Fedora |
| Distribution Version | 40 |
| Kernel Version | 6.10 |
| Architecture | x86_64 |
| OpenZFS Version | |

Describe the problem you're observing

Using the new GitHub runners, we're seeing an occasional OOM in functional/raidz/raidz_002_pos. The OOM killer is killing the raidz_test program:

Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_002_pos (run as root) [03:30] [FAIL]
08:41:42.14 /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_002_pos.ksh[49]: log_must[70]: log_pos: line 265: 918355: Killed
08:41:42.14 20/176... 40/165... 60/165... 80/165... 100/165... 120/165... ERROR: raidz_test -S -e -t 300 exited 265
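
For anyone reading the log: the exit status 265 is consistent with raidz_test dying from a signal. ksh93 reports a signal death as 256 plus the signal number, and 265 - 256 = 9 is SIGKILL, which is what the OOM killer sends. A minimal Python sketch of that decoding, assuming the ksh93 convention applies here (the helper name `decode_ksh_status` is made up for illustration and is not part of ZTS):

```python
import signal


def decode_ksh_status(status):
    """Return the signal name for a ksh93-style exit status, or None.

    Assumption: the shell encodes "killed by signal N" as 256 + N,
    as ksh93 does. Statuses of 256 or below are treated as normal exits.
    """
    if status > 256:
        return signal.Signals(status - 256).name
    return None


# The status reported in the failing test log.
print(decode_ksh_status(265))
```

A status of 0 or any ordinary non-zero exit code returns None, so the helper only flags signal deaths.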

raidz_test had allocated 5.3GB of RAM:

Out of memory: Killed process 918355 (raidz_test) total-vm:13275572kB, anon-rss:5306400kB, file-rss:56kB, shmem-rss:0kB, UID:0 pgtables:24564kB oom_score_adj:0
 [ 7605.935208] systemd-userdbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
  [ 7605.938835] CPU: 1 PID: 708 Comm: systemd-userdbd Tainted: P           OE      6.10.10-200.fc40.x86_64 #1
  [ 7605.941634] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 7605.944347] Call Trace:
  [ 7605.945197]  <TASK>
  [ 7605.945978]  dump_stack_lvl+0x5d/0x80
  [ 7605.947381]  dump_header+0x44/0x18d
  [ 7605.948634]  oom_kill_process.cold+0xa/0xaa
  [ 7605.949964]  out_of_memory+0x219/0x4b0
  [ 7605.951262]  __alloc_pages_slowpath.constprop.0+0xb4e/0xe00
  [ 7605.953023]  __alloc_pages_noprof+0x31f/0x350
  [ 7605.954412]  alloc_pages_mpol_noprof+0xd7/0x1e0
  [ 7605.955867]  ? __filemap_get_folio+0x37/0x2e0
  [ 7605.957254]  vma_alloc_folio_noprof+0x63/0xc0
  [ 7605.958667]  ? __swap_duplicate+0xdb/0x190
  [ 7605.960007]  do_swap_page+0x4a9/0xd60
  [ 7605.961215]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 7605.962768]  ? __handle_mm_fault+0x829/0x1080
  [ 7605.964150]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 7605.965656]  ? __pte_offset_map+0x1b/0x180
  [ 7605.966971]  __handle_mm_fault+0x829/0x1080
  [ 7605.968335]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 7605.969820]  ? mt_find+0x21c/0x580
  [ 7605.971016]  handle_mm_fault+0xf0/0x300
  [ 7605.972239]  do_user_addr_fault+0x15d/0x620
  [ 7605.973660]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 7605.975112]  ? asm_exc_page_fault+0x26/0x30
  [ 7605.976458]  exc_page_fault+0x7e/0x180
  [ 7605.977673]  asm_exc_page_fault+0x26/0x30
  [ 7605.978963] RIP: 0010:__get_user_8+0x11/0x20
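
To compare memory figures across runs, the "Out of memory: Killed process" line can be parsed mechanically. The sketch below is a rough Python helper, assuming the field layout of the one line quoted above; the names `OOM_RE`, `parse_oom_line` and `kib_to_gib` are illustrative only. Note that the kernel's "kB" here means 1024-byte units, so 5306400kB is about 5.06 GiB (the 5.3GB figure reads the same number as decimal kilobytes).

```python
import re

# Pattern assumed from the single OOM line quoted in this issue;
# other kernel versions may format the line differently.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_line(line):
    """Extract pid, command name and memory counters (in kB) from an OOM line."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "comm": match.group("comm"),
        "total_vm_kb": int(match.group("total_vm")),
        "anon_rss_kb": int(match.group("anon_rss")),
    }


def kib_to_gib(kib):
    """Convert 1024-byte units to GiB."""
    return kib / (1024 * 1024)


SAMPLE = (
    "Out of memory: Killed process 918355 (raidz_test) "
    "total-vm:13275572kB, anon-rss:5306400kB, file-rss:56kB, "
    "shmem-rss:0kB, UID:0 pgtables:24564kB oom_score_adj:0"
)

info = parse_oom_line(SAMPLE)
print(info["comm"], round(kib_to_gib(info["anon_rss_kb"]), 2), "GiB anon-rss")
```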

Full examples:
https://github.com/openzfs/zfs/actions/runs/10978174081/job/30481019124
https://github.com/openzfs/zfs/actions/runs/10998799735/job/30537538603

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

@tonyhutter tonyhutter added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 24, 2024
tonyhutter added a commit to tonyhutter/zfs that referenced this issue Oct 3, 2024
raidz_002_pos can take over 5GB of RAM and will sometimes OOM.
Enable 16GB of swap space to help mitigate this.

Fixes: openzfs#16566
Signed-off-by: Tony Hutter <[email protected]>
@mcmilk
Contributor

mcmilk commented Oct 6, 2024

The raidz_002_pos failure also happens on FreeBSD and Ubuntu.
Even on VMs with 12 GB of RAM the problem sometimes happens :/

@amotin
Member

amotin commented Oct 17, 2024

I've just manually run ./raidz_test -S -e -t 300 on FreeBSD and observed it gradually consuming >106GB of RAM before completing successfully. I bet something is leaking inside the loops, but I haven't figured out what yet.
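
One way to repeat this kind of measurement without watching top is to let the OS report the child's peak resident set size once it exits. The following is a small Python sketch, not part of ZTS; `run_and_report_peak_rss` is a hypothetical helper name. It relies on `resource.getrusage(RUSAGE_CHILDREN)`, whose `ru_maxrss` is in kilobytes on Linux (other platforms may use different units) and covers the largest waited-for child of the calling process so far.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(argv):
    """Run argv to completion and return (exit status, peak child RSS).

    The RSS value is the maximum over all children this process has
    waited for, so run one command per invocation for a clean reading.
    """
    proc = subprocess.run(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


# Harmless demonstration with the Python interpreter itself.
# For the case discussed here the argument list would be
# ["./raidz_test", "-S", "-e", "-t", "300"].
status, peak = run_and_report_peak_rss([sys.executable, "-c", "pass"])
print("exit status:", status, "peak RSS:", peak)
```

This only gives the peak after the process ends; showing the gradual growth described above would need periodic sampling instead.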

amotin added a commit to amotin/zfs that referenced this issue Oct 18, 2024
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Fixes openzfs#16492
Fixes openzfs#16566
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Oct 21, 2024
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by:  Rob Norris <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16492
Closes openzfs#16566
Closes openzfs#16664
3 participants