Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binder benchmark causes kernel BUG on Raspberry Pi #365

Closed
TheSven73 opened this issue Jun 8, 2021 · 28 comments · Fixed by #375
Closed

binder benchmark causes kernel BUG on Raspberry Pi #365

TheSven73 opened this issue Jun 8, 2021 · 28 comments · Fixed by #375
Labels
• bug Related to runtime bugs, panics, warnings...

Comments

@TheSven73
Copy link
Collaborator

@wedsonaf
(this Issue doesn't really belong here, but I can't open an Issue on wedson's fork)

  1. follow rust/kernel: remove config #ifdef in Error.rs file #346 (comment)
  2. build for Raspberry Pi: $ make bcm2835_rust_defconfig then switch on binder in defconfig
  3. Boot kernel on Raspberry Pi Zero
  4. ./ping_server /dev/rust_binder & works great!
  5. ./ping_client /dev/rust_binder oopses:
root@raspberrypi:/home/pi# ./ping_client /dev/rust_binder
[  149.443939] ------------[ cut here ]------------
[  149.446521] kernel BUG at rust/helpers.c:14!
[  149.446812] Internal error: Oops - BUG: 0 [#1] ARM
[  149.447165] Modules linked in:
[  149.447946] CPU: 0 PID: 359 Comm: ping_client Tainted: G        W         5.12.0-rc4+ #8
[  149.448448] Hardware name: BCM2835
[  149.448735] PC is at rust_helper_BUG+0x10/0x14
[  149.449441] LR is at rust_begin_unwind+0x4/0x8
[  149.449634] pc : [<c04631b0>]    lr : [<c0462888>]    psr: 00000013
[  149.449919] sp : c404dcc8  ip : c0b80cec  fp : c404dcc8
[  149.450129] r10: c437c0c0  r9 : 00000000  r8 : 49034000
[  149.450341] r7 : c30bed2c  r6 : c30bed00  r5 : c404de58  r4 : c30bed08
[  149.450542] r3 : 00000000  r2 : c404dcf8  r1 : c0b80cec  r0 : c404dcd0
[  149.450853] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  149.451220] Control: 10c53c7d  Table: 04308059  DAC: 00000051
[  149.451421] Process ping_client (pid: 359, stack limit = 0x(ptrval))
[  149.451793] Stack: (0xc404dcc8 to 0xc404e000)
[  149.452278] dcc0:                   c404ddc0 c0462888 c0b20754 c0b22a98 c404dce0 c0b80cec
[  149.452808] dce0: c404dcf8 00000001 00000000 00000000 c0b20754 00000000 c0b7fc00 00000021
[  149.453659] dd00: 00000000 c30bed00 00000000 c404de58 c404dd90 c404ddb8 c437c0d0 c437c0c0
[  149.454648] dd20: c404ddc0 c0749790 c07461ac c404de28 c404dd50 c0233b20 0000000c c404ddc4
[  149.455561] dd40: c404ddd0 c404ddcc c404de84 c404de38 c404de30 c404de14 c404de04 c404de5c
[  149.456404] dd60: c404de60 c404dd90 c404dda8 00000001 0000004c c437c0c0 bea503f8 00000000
[  149.457521] dd80: bea503c8 00000030 0000004c 00000000 00000000 00000000 00021048 00000000
[  149.458592] dda0: 00000800 00000000 00000000 00000000 bea503f8 00000000 00021050 00000044
[  149.459703] ddc0: 00000024 c404de28 00000001 c073b3f8 c437cec0 c404de28 c401f960 00000000
[  149.460790] dde0: 00000001 c073c1fc 00000000 c074b2e0 00000000 c30bed08 00000001 c074b2e0
[  149.461852] de00: 00000002 c30bed00 c437cec0 00000001 00000001 c126662c c437cd80 00000001
[  149.462407] de20: c437cd80 00000001 c437cec0 00000000 00000000 c0248394 c401f960 c401f960
[  149.462842] de40: c401f960 c401f960 c16ee780 c401f840 c404de60 c02489a8 00000000 00000030
[  149.463289] de60: 0000004c 00000000 00000000 00000000 00021048 00000000 00000800 00000000
[  149.463822] de80: 00000000 00000000 bea503f8 00000000 c405bb40 00000000 f9cf40ab fffffff4
[  149.464278] dea0: 00000000 c401f840 00000000 c404c000 c401f854 c405bb40 c21b3000 00000001
[  149.464708] dec0: 00000071 00000001 ffffffea c0306201 00000001 c30bed00 c405bb40 c405bb40
[  149.465170] dee0: c404dfa0 c0733f80 00000030 c30bed00 c404df24 00000030 c405bb40 c30bed00
[  149.465742] df00: c404df10 c0306201 bea503c8 c0738104 c0306201 bea503c8 00000000 bea503c8
[  149.466167] df20: 00000030 c405bb40 00000003 c16ee690 c404df60 c028d42c 00000006 00001000
[  149.466691] df40: 00000001 c404df44 c404df44 00000000 f9cf40ab c405bb40 00000000 00000000
[  149.467298] df60: 00000002 c0248668 c405bb40 b6fcc000 c404dfa0 c0248668 00000002 00000000
[  149.467844] df80: f9cf40ab bea503f8 00000000 0001059d 00000036 c0100264 c404c000 00000036
[  149.468396] dfa0: 00000000 c0100060 bea503f8 00000000 00000003 c0306201 bea503c8 bea503c8
[  149.468864] dfc0: bea503f8 00000000 0001059d 00000036 00021048 00000000 b6fd0000 00000000
[  149.469355] dfe0: b6ef6510 bea503b4 0001086b b6ef651c 60000010 00000003 00000000 00000000
[  149.470240] [<c04631b0>] (rust_helper_BUG) from [<c0462888>] (rust_begin_unwind+0x4/0x8)
[  149.470959] Code: e92d4800 e1a0b00d e92d4000 e8bd4000 (e7f001f2) 
[  149.471647] ---[ end trace faab23bf7569842d ]---
@TheSven73 TheSven73 added the • bug Related to runtime bugs, panics, warnings... label Jun 8, 2021
@wedsonaf
Copy link

wedsonaf commented Jun 8, 2021

Thanks Sven!

Could you try it on the C version of binder?

(Either way we should not allow a user space process to affect the kernel this way, but as a data point, it would be good to know if the ping client/server work on the C version of binder; I believe this is the first time they run on a 32-bit arch.)

@TheSven73
Copy link
Collaborator Author

First time I ran the client, I get "illegal instruction"?

root@raspberrypi:/home/pi# ./ping_client /dev/binder 
[1]+  Illegal instruction     ./ping_server /dev/binder
Illegal instruction

Afterwards, I do get some kind of failure:

root@raspberrypi:/home/pi# ./ping_client /dev/binder 
[  125.409426] binder: 331:331 BC_FREE_BUFFER u4040630000000000 no match
[  125.411598] binder: 331:331 unknown command 0
[  125.411964] binder: 331:331 ioctl c0306201 bea2e3e8 returned -22
ioctl: Invalid argument

Maybe I need to switch on selftests in defconfig?

@TheSven73
Copy link
Collaborator Author

When selftest is activated in defconfig, C binder oopses when the server is started:

root@raspberrypi:/home/pi# ./ping_server /dev/binder &
[1] 329
root@raspberrypi:/home/pi# [  112.214879] binder_alloc_selftest: STARTED
[  112.250097] binder_alloc: 329: binder_alloc_buf size 2560 failed, no address space
[  112.251679] binder_alloc: allocated: 2048 (num: 4 largest: 512), free: 2048 (num: 1 largest: 2048)
[  112.252006] binder_alloc_selftest: alloc sizes: [512][512][512][512][2560]
[  112.252621] binder_alloc_selftest: free seq: [0][1][2][3][4]
[  112.253182] 8<--- cut here ---
[  112.253369] Unable to handle kernel paging request at virtual address fffffff8
[  112.253643] pgd = (ptrval)
[  112.253815] [fffffff8] *pgd=2bfde861, *pte=00000000, *ppte=00000000
[  112.254342] Internal error: Oops: 37 [#1] ARM
[  112.254619] Modules linked in:
[  112.254893] CPU: 0 PID: 329 Comm: ping_server Tainted: G        W         5.12.0-rc4+ #3
[  112.255170] Hardware name: BCM2835
[  112.255317] PC is at binder_alloc_free_buf+0x18/0x5c
[  112.255779] LR is at binder_selftest_free_buf+0x3c/0xdc
[  112.255970] pc : [<c0742eec>]    lr : [<c0744ed4>]    psr: 20000013
[  112.256150] sp : c4049a30  ip : c1028500  fp : c4049a38
[  112.256309] r10: 00001200  r9 : c4049c34  r8 : 00001200
[  112.256505] r7 : c4049a74  r6 : c237cf2c  r5 : ffffffe4  r4 : c237cf2c
[  112.256710] r3 : 00000000  r2 : c2198000  r1 : ffffffe4  r0 : c237cf2c
[  112.256968] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  112.257149] Control: 10c53c7d  Table: 04064059  DAC: 00000051
[  112.257362] Process ping_server (pid: 329, stack limit = 0x(ptrval))
[  112.257553] Stack: (0xc4049a30 to 0xc404a000)
[  112.257808] 9a20:                                     c4049c48 fffffffc c4049a60 c0744ed4
[  112.258305] 9a40: c4049a74 c237cf2c c4049c48 c4049a74 00000005 c4049c34 00000004 00001200
[  112.258696] 9a60: c4049aa8 c0744bb8 00001200 c0321970 00000000 c3252d40 c3252c40 c3252a00
[  112.259045] 9a80: c3252c80 ffffffe4 9665fd41 c237cf2c c4049c48 00000005 00000004 c4049c34
[  112.259404] 9aa0: 00000004 00001200 c4049af0 c0744d20 00001200 00000000 00000001 c3252cc0
[  112.259951] 9ac0: 00000000 c1571000 00000000 00000001 9665fd41 c237cf2c c4049c48 00000004
[  112.260545] 9ae0: 00000003 c4049c34 00000003 00001200 c4049b38 c0744d20 00001200 00000000
[  112.261195] 9b00: 00000000 c1710500 c4049b60 c04a3854 c4049b8c c1798000 9665fd41 c237cf2c
[  112.261906] 9b20: c4049c48 00000003 00000002 c4049c34 00000002 00001200 c4049b80 c0744d20
[  112.262669] 9b40: 00001200 c17a3a30 9665fd41 c1168dec c1169698 c02562d0 c1168dec 00000801
[  112.263405] 9b60: 9665fd41 c237cf2c c4049c48 00000002 00000001 c4049c34 00000001 00001200
[  112.264162] 9b80: c4049bc8 c0744d20 00001200 00000000 00000001 00000000 00000801 c1168dec
[  112.264819] 9ba0: 00000000 00002800 9665fd41 c237cf2c c4049c48 00000001 00000000 c4049c34
[  112.265427] 9bc0: 00000000 00001200 c4049c10 c0744d20 00001200 60000013 c4049bf4 00000000
[  112.265971] 9be0: c4049c28 c01581e0 00000000 c4049bf4 9665fd41 fffffffb c4049c48 00000005
[  112.266673] 9c00: 00000005 fffff000 00001200 c237cf2c c4049c80 c0744ab8 00001200 00000000
[  112.267228] 9c20: 00000800 00000200 00000200 00000200 00000200 00000200 00000200 00000200
[  112.267903] 9c40: 00000200 00000a00 00000000 00000001 00000002 00000003 00000004 c0292f58
[  112.268462] 9c60: 9665fd41 00000002 c4049eb8 00000004 00000005 fffff000 00001200 c237cf2c
[  112.269068] 9c80: c4049cf0 c0744b18 0003475f 00000000 0000000a 00000006 00000001 00000000
[  112.269702] 9ca0: 34383739 c02c3231 22921900 c0165cc0 c1010488 00000000 00000192 00000000
[  112.270280] 9cc0: c4049d4c c1004248 c4049d00 c0165cc0 9665fd41 00000000 c4049eb8 00000003
[  112.270843] 9ce0: 00000004 fffff000 00000800 c237cf2c c4049d60 c0744b18 c4049d40 00000000
[  112.271442] 9d00: c4049d78 c0165728 00000000 c4049d60 c1010484 00000400 c4049d50 c101049c
[  112.272028] 9d20: c4049d30 00000000 c4049d80 00000000 00000000 00000192 c1028500 00000000
[  112.272541] 9d40: 9665fd41 00000000 c4049eb8 00000002 00000003 fffff000 00000600 c237cf2c
[  112.273079] 9d60: c4049dd0 c0744b18 c1004248 c11db340 c1010484 c11db348 c4049da0 c01654f8
[  112.273700] 9d80: 00000192 00000000 60000093 9665fd41 c4049db4 00000000 00000000 c0162600
[  112.274339] 9da0: c4049e38 c0162600 00000000 00000000 9665fd41 00000000 c4049eb8 00000001
[  112.274872] 9dc0: 00000002 fffff000 00000400 c237cf2c c4049e40 c0744b18 00000000 00000000
[  112.275434] 9de0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  112.276043] 9e00: 00000000 00000000 00000000 00000000 00000000 00000000 9665fd41 0000001e
[  112.276645] 9e20: 9665fd41 00000000 c4049eb8 00000000 00000001 fffff000 00000200 c237cf2c
[  112.277166] 9e40: c4049eb0 c0744b18 c16ef780 c4057cc0 00000000 c4049eac c0d1e027 c123ec28
[  112.277744] 9e60: 40046207 00000000 40046207 c237ce00 c4049e88 c0162764 c4049eac c4057780
[  112.278301] 9e80: c0d1e027 c1004248 c4049ea4 c09f1b2c 9665fd41 c237cf2c c1004248 c123ec28
[  112.278892] 9ea0: 40046207 00000000 40046207 c237ce00 c4049ee0 c07449b4 00000200 00000400
[  112.279393] 9ec0: 00000600 00000800 00001200 9665fd41 c4044c00 c16ef690 00000000 c237ce00
[  112.279920] 9ee0: c4049f30 c0734974 00000000 c4049f44 9665fd41 00000002 00001000 00000001
[  112.280352] 9f00: c4044c00 c3287c40 c4044c00 fffffffc 9665fd41 00000003 c16ef690 c4049f60
[  112.280934] 9f20: 40046207 00000000 c4044c00 c4044c00 c4049fa0 c028d42c 00000006 00001000
[  112.281449] 9f40: 00000001 c4049f44 c4049f44 00000000 9665fd41 c4044c00 00000000 00000000
[  112.281905] 9f60: 00000002 c0248668 c4044c00 b6f79000 c4049fa0 c0248668 00000002 00000000
[  112.282357] 9f80: 9665fd41 00000000 000108e9 0001050d 00000036 c0100264 c4048000 00000036
[  112.282979] 9fa0: 00000000 c0100060 00000000 000108e9 00000003 40046207 00000000 b6f79000
[  112.283553] 9fc0: 00000000 000108e9 0001050d 00000036 00000000 00000000 b6f7d000 00000000
[  112.284028] 9fe0: b6ea3510 bece13b4 00010781 b6ea351c 60000010 00000003 00000000 00000000
[  112.284890] [<c0742eec>] (binder_alloc_free_buf) from [<c0744ed4>] (binder_selftest_free_buf+0x3c/0xdc)
[  112.285501] [<c0744ed4>] (binder_selftest_free_buf) from [<c0744bb8>] (binder_selftest_free_seq+0x6c/0x228)
[  112.285852] [<c0744bb8>] (binder_selftest_free_seq) from [<c0744d20>] (binder_selftest_free_seq+0x1d4/0x228)
[  112.286182] [<c0744d20>] (binder_selftest_free_seq) from [<c0744d20>] (binder_selftest_free_seq+0x1d4/0x228)
[  112.286523] [<c0744d20>] (binder_selftest_free_seq) from [<c0744d20>] (binder_selftest_free_seq+0x1d4/0x228)
[  112.286870] [<c0744d20>] (binder_selftest_free_seq) from [<c0744d20>] (binder_selftest_free_seq+0x1d4/0x228)
[  112.287216] [<c0744d20>] (binder_selftest_free_seq) from [<c0744d20>] (binder_selftest_free_seq+0x1d4/0x228)
[  112.287530] [<c0744d20>] (binder_selftest_free_seq) from [<c0744ab8>] (binder_selftest_alloc_offset+0xb8/0x14c)
[  112.287893] [<c0744ab8>] (binder_selftest_alloc_offset) from [<c0744b18>] (binder_selftest_alloc_offset+0x118/0x14c)
[  112.288300] [<c0744b18>] (binder_selftest_alloc_offset) from [<c0744b18>] (binder_selftest_alloc_offset+0x118/0x14c)
[  112.288774] [<c0744b18>] (binder_selftest_alloc_offset) from [<c0744b18>] (binder_selftest_alloc_offset+0x118/0x14c)
[  112.289190] [<c0744b18>] (binder_selftest_alloc_offset) from [<c0744b18>] (binder_selftest_alloc_offset+0x118/0x14c)
[  112.289685] [<c0744b18>] (binder_selftest_alloc_offset) from [<c0744b18>] (binder_selftest_alloc_offset+0x118/0x14c)
[  112.290284] [<c0744b18>] (binder_selftest_alloc_offset) from [<c07449b4>] (binder_selftest_alloc+0x8c/0xd8)
[  112.290716] [<c07449b4>] (binder_selftest_alloc) from [<c0734974>] (binder_ioctl+0x38/0x864)
[  112.291053] [<c0734974>] (binder_ioctl) from [<c028d42c>] (sys_ioctl+0x908/0xa50)
[  112.291488] [<c028d42c>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
[  112.291873] Exception stack(0xc4049fa8 to 0xc4049ff0)
[  112.292289] 9fa0:                   00000000 000108e9 00000003 40046207 00000000 b6f79000
[  112.292848] 9fc0: 00000000 000108e9 0001050d 00000036 00000000 00000000 b6f7d000 00000000
[  112.293246] 9fe0: b6ea3510 bece13b4 00010781 b6ea351c
[  112.293766] Code: e1a05001 e1a04000 e92d4000 e8bd4000 (e5d50014) 
[  112.294795] ---[ end trace 86083b9ff3cb8f93 ]---

@TheSven73
Copy link
Collaborator Author

I apologize for running this on 32-bit architecture (which is not that important in the grand scheme of things). However, the real-world devices I have and am most familiar with, are 32-bit arm.

@ojeda
Copy link
Member

ojeda commented Jun 8, 2021

Finding and reporting bugs is never something to apologize for! :)

@TheSven73
Copy link
Collaborator Author

TheSven73 commented Jun 8, 2021

Yes, but I think this BUG is probably a distraction from getting this benchmark to run.
I think the benchmark is much more important than a 32-bit binder bug.

(edited: I am now going to try on a 64-bit arm Raspberry Pi 3)

@TheSven73
Copy link
Collaborator Author

I tried on a 64-bit arm Raspberry Pi 3:

pi@raspberrypi:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
pi@raspberrypi:~$ ./ping_server /dev/binder &
pi@raspberrypi:~$ ./ping_client /dev/binder 
ioctl: Invalid argument

I get the same result with /dev/rust_binder. Not sure what's going on.

@TheSven73
Copy link
Collaborator Author

If it helps, here are the results of the C binder selftest on arm64:

[   87.339991] binder_alloc_selftest: STARTED
[   87.362712] binder_alloc: 386: binder_alloc_buf size 2560 failed, no address space
[   87.362796] binder_alloc: allocated: 2048 (num: 4 largest: 512), free: 2048 (num: 1 largest: 2048)
[   87.362825] binder_alloc_selftest: alloc sizes: [512][512][512][512][2560]
[   87.363866] binder_alloc_selftest: free seq: [0][1][2][3][4]
[   87.364361] Unable to handle kernel NULL pointer dereference at virtual address 000000000000000c
[   87.365204] Mem abort info:
[   87.365275]   ESR = 0x96000007
[   87.365373]   EC = 0x25: DABT (current EL), IL = 32 bits
[   87.365416]   SET = 0, FnV = 0
[   87.365443]   EA = 0, S1PTW = 0
[   87.365481] Data abort info:
[   87.365509]   ISV = 0, ISS = 0x00000007
[   87.365552]   CM = 0, WnR = 0
[   87.365636] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000005d9a000
[   87.365698] [000000000000000c] pgd=00000000057cb003, p4d=00000000057cb003, pud=0000000005da9003, pmd=0000000005dac003, pte=0000000000000000
[   87.365991] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   87.366179] Modules linked in:
[   87.366406] CPU: 1 PID: 386 Comm: ping_server Tainted: G        W         5.12.0-rc4-698413-g76001a4dd2a8-dirty #3
[   87.366581] Hardware name: Raspberry Pi 3 Model A+ (DT)
[   87.366742] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   87.366813] pc : binder_alloc_free_buf+0x18/0xf8
[   87.367194] lr : binder_selftest_free_buf+0x80/0x12c
[   87.367332] sp : ffff800011f734d0
[   87.367383] x29: ffff800011f734d0 x28: ffff0000056b0e00 
[   87.367614] x27: 0000000000000000 x26: 0000000000000004 
[   87.367772] x25: 0000000000000004 x24: 0000000000000004 
[   87.367860] x23: ffff800011f73580 x22: 0000000000001200 
[   87.367920] x21: ffff000005d4dd78 x20: ffff800011f738f8 
[   87.367987] x19: ffff800011f73920 x18: ffffffffffffffff 
[   87.368113] x17: 0000000000000000 x16: 0000000000000000 
[   87.368190] x15: 0000000000000004 x14: 0000000000000fff 
[   87.368252] x13: fffffc0000000000 x12: 0000000000000000 
[   87.368303] x11: ffff000005d4ddc0 x10: 0000000000000000 
[   87.368370] x9 : 0000000000000000 x8 : 0000000000000004 
[   87.368439] x7 : 3a71657320656572 x6 : ffff800011ba3f29 
[   87.368513] x5 : 0000000000000001 x4 : ffff000037dbfea0 
[   87.368600] x3 : 0000000000581401 x2 : 0000000000000001 
[   87.368679] x1 : ffffffffffffffe4 x0 : ffff000005d4dd78 
[   87.368846] Call trace:
[   87.368999]  binder_alloc_free_buf+0x18/0xf8
[   87.369114]  binder_selftest_free_buf+0x80/0x12c
[   87.369191]  binder_selftest_free_seq+0x70/0x498
[   87.369276]  binder_selftest_free_seq+0x3cc/0x498
[   87.369317]  binder_selftest_free_seq+0x370/0x498
[   87.369356]  binder_selftest_free_seq+0x314/0x498
[   87.369409]  binder_selftest_free_seq+0x2b8/0x498
[   87.369450]  binder_selftest_free_seq+0x25c/0x498
[   87.369505]  binder_selftest_alloc_offset+0xa0/0x18c
[   87.369557]  binder_selftest_alloc_offset+0x120/0x18c
[   87.369608]  binder_selftest_alloc_offset+0xec/0x18c
[   87.369663]  binder_selftest_alloc_offset+0xec/0x18c
[   87.369712]  binder_selftest_alloc_offset+0xec/0x18c
[   87.369775]  binder_selftest_alloc_offset+0xec/0x18c
[   87.369830]  binder_selftest_alloc+0x9c/0xd8
[   87.369900]  binder_ioctl+0x48/0xbd8
[   87.369952]  compat_ptr_ioctl+0x20/0x34
[   87.369992]  __arm64_compat_sys_ioctl+0x104/0x158
[   87.370047]  el0_svc_common+0x90/0x110
[   87.370135]  do_el0_svc_compat+0x1c/0x40
[   87.370195]  el0_svc_compat+0x28/0x88
[   87.370232]  el0_sync_compat_handler+0x78/0xb4
[   87.370278]  el0_sync_compat+0x174/0x180
[   87.370553] Code: a9015ff8 a90257f6 a9034ff4 910003fd (b9402828) 
[   87.371002] ---[ end trace 02e55523d13b1f45 ]---

@nbdd0121
Copy link
Member

nbdd0121 commented Jun 8, 2021

Cause of BUG:

panicked at 'attempt to subtract with overflow', drivers/android/process.rs:548:22

EDIT: replacing the subtraction with wrapping_sub does not work.

@wedsonaf
Copy link

wedsonaf commented Jun 8, 2021

If it helps, here are the results of the C binder selftest on arm64:

C binder has same self tests that run in the kernel and they seem to be failing. I think it's better to disable them for now.

@wedsonaf
Copy link

wedsonaf commented Jun 8, 2021

I tried on a 64-bit arm Raspberry Pi 3:

pi@raspberrypi:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
pi@raspberrypi:~$ ./ping_server /dev/binder &
pi@raspberrypi:~$ ./ping_client /dev/binder 
ioctl: Invalid argument

I get the same result with /dev/rust_binder. Not sure what's going on.

I have seen this before, in my case it had to do with different alignment when compiling structs in kernel and user spaces (due to different compiler settings/configs). Since the size of structs is encoded as part of the ioctl number, it would result in ioctl numbers that are unknown to the kernel.

Let me take a look.

@TheSven73
Copy link
Collaborator Author

I can confirm that nothing changes on arm or arm64 after applying #366 :(

@wedsonaf
Copy link

wedsonaf commented Jun 8, 2021

Commit wedsonaf@5baa6d7 fixes an underflow bug.(I haven't yet spent any time trying to understand why it doesn't trigger on x86-64.)

With it I can run the benchmark on aarch64 qemu (emulated), on both versions of Binder.

https://github.com/wedsonaf/linux/tree/ping has other commits: static building (to run on a stripped-down busybox build) and specifying the number of iterations (given it's emulated, 1M iterations was taking too long).

@TheSven73 would be able to try this? If it still fails (and it seems like it should, given that you were getting a different error), would you be able to run your image on qemu to see if it works there? I'd like to understand if real hardware is behaving differently from qemu and exposing another bug.

@TheSven73
Copy link
Collaborator Author

TheSven73 commented Jun 9, 2021

@wedsonaf it's working now for arm 32 and 64 bit.

The 32-bit "Illegal Instruction" was an unrelated issue - the defaults of my arm toolchain are too "advanced" for the Rasperry Pi Zero's arm1176 cpu. Once I tweaked that, the ping test came to life.

On arm64, I was getting ioctl: invalid argument because I was targeting 32-bit arm executables for Raspbian (Raspbian Buster userspace is 32-bit only). After your -static patch, I could easily target aarch64. Then the ping test came to life there as well.

So all good 💯 thanks so much for the awesome fix 👍

I've probably ran out of time for the benchmarks, will defer that until tomorrow.

(ETA: it's probably a bug that a 32-bit userspace binder client can't interface with a 64-bit binder kernel, but I'm guessing no-one at Google is using, or planning to use that combo. so all good.)

@wedsonaf
Copy link

wedsonaf commented Jun 9, 2021

@wedsonaf it's working now for arm 32 and 64 bit.

The 32-bit "Illegal Instruction" was an unrelated issue - the defaults of my arm toolchain are too "advanced" for the Rasperry Pi Zero's arm1176 cpu. Once I tweaked that, the ping test came to life.

On arm64, I was getting ioctl: invalid argument because I was targeting 32-bit arm executables for Raspbian (Raspbian Buster userspace is 32-bit only). After your -static patch, I could easily target aarch64. Then the ping test came to life there as well.

So all good 💯 thanks so much for the awesome fix 👍

Thanks for the update, Sven! Great to hear they seem to be working now.

I've probably ran out of time for the benchmarks, will defer that until tomorrow.

No rush.

(ETA: it's probably a bug that a 32-bit userspace binder client can't interface with a 64-bit binder kernel, but I'm guessing no-one at Google is using, or planning to use that combo. so all good.)

I do care :)

I think it's related to the alignment/size of structs used in ioctls. Even on 32-bit builds, Binder expects "pointers" (binder_uintptr_t) to be encoded with 64 bits. I will try 32-bit binaries on 64-bit kernels, probably tomorrow.

@nbdd0121 whenever you find the time, I'd appreciate if you could try this latest version too.

(Given the variety of archs you all have, I'm eager to finish my functional test conversion so that we get more coverage :))

@nbdd0121
Copy link
Member

nbdd0121 commented Jun 9, 2021

It works now.

@wedsonaf
Copy link

wedsonaf commented Jun 9, 2021

It works now.

Thanks for trying it out! Much appreciated.

@TheSven73
Copy link
Collaborator Author

So now the $1e6 question - what are we going to use the benchmark for ?? :)

On my end, I will benchmark arm 32-bit - and check if the mutex #ifdef or NonNull patches make any difference to the binder ping test. Unless someone has a better suggestion.

@wedsonaf
Copy link

FYI, the reason I wasn't seeing this on x86-64 was that I didn't have CONFIG_RUST_OVERFLOW_CHECKS set. Probably because I created by .config before the default change in f6fc80f.

After I enabled it, I could reproduce the issue even on x86-64.

@TheSven73
Copy link
Collaborator Author

TheSven73 commented Jun 10, 2021

Perhaps this provides an excellent opportunity to consider the question of what would happen if we are in mainline, a Rust driver has an inadvertent overflow, and mushroom-clouds the whole kernel. Especially in light of "call BUG() when not absolutely necessary, and the fiery wrath of Linus Torvalds himself will descend on you" :)

(ETA of course that should go into #283, and is a discussion for another day)

@bjorn3
Copy link
Member

bjorn3 commented Jun 10, 2021

In production I would assume that CONFIG_RUST_OVERFLOW_CHECKS is not set unless you really need to play safe and also for example enable the kernel.panic_on_oops option. If CONFIG_RUST_OVERFLOW_CHECKS is not set, Rust behaves the same as C except that signed overflow is not UB, but wraps just like unsigned overflow.

@TheSven73
Copy link
Collaborator Author

Kees Cook indicated that the production default should be saturation followed by a call to WARN(). Except if overridden explicitly.

@alex
Copy link
Member

alex commented Jun 10, 2021 via email

@ojeda
Copy link
Member

ojeda commented Jun 10, 2021

FYI, the reason I wasn't seeing this on x86-64 was that I didn't have CONFIG_RUST_OVERFLOW_CHECKS set. Probably because I created by .config before the default change in f6fc80f.

After I enabled it, I could reproduce the issue even on x86-64.

I was wondering about that -- good to know it works as expected!

@bjorn3
Copy link
Member

bjorn3 commented Jun 10, 2021

Saturation is not an option, since that's not what Rust itself -- the
choices are two's complement overflow or a panic.

I think an option to call a certain function on overflow and then return the two's complement overflow anyway would make sense to have in rust. This option will likely need to be unsafe though as the function may be called in the middle of unsafe code that temporarily broke a safety invariant. If the overflow function behaves nicely this option is indistinguishable from wrapping around on overflow from the perspective of the code that caused the overflow and as such shouldn't be a breaking change.

@ojeda
Copy link
Member

ojeda commented Jun 10, 2021

Yeah, it should be doable even if Rust does not provide it right now. The option could even be two orthogonal ones:

  • Whether to call a given function on any overflow.
  • What the result in case of overflow should be.

Or even let the function called return the value to use on overflow.

@bjorn3
Copy link
Member

bjorn3 commented Jun 10, 2021

Or even let the function called return the value to use on overflow.

That would break existing code that assumes that overflows either panic or wrap as specified by RFC 560.

@ojeda
Copy link
Member

ojeda commented Jun 10, 2021

Yeah, it would break existing code in the sense that one cannot blindly use the new mode, but it would not break existing projects, i.e. the ones that already use the existing modes.

ojeda pushed a commit that referenced this issue Apr 29, 2024
The delete set command does not rely on the transaction object for
element removal, therefore, a combination of delete element + delete set
from the abort path could result in restoring twice the refcount of the
mapping.

Check for inactive element in the next generation for the delete element
command in the abort path, skip restoring state if next generation bit
has been already cleared. This is similar to the activate logic using
the set walk iterator.

[ 6170.286929] ------------[ cut here ]------------
[ 6170.286939] WARNING: CPU: 6 PID: 790302 at net/netfilter/nf_tables_api.c:2086 nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287071] Modules linked in: [...]
[ 6170.287633] CPU: 6 PID: 790302 Comm: kworker/6:2 Not tainted 6.9.0-rc3+ #365
[ 6170.287768] RIP: 0010:nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287886] Code: df 48 8d 7d 58 e8 69 2e 3b df 48 8b 7d 58 e8 80 1b 37 df 48 8d 7d 68 e8 57 2e 3b df 48 8b 7d 68 e8 6e 1b 37 df 48 89 ef eb c4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f
[ 6170.287895] RSP: 0018:ffff888134b8fd08 EFLAGS: 00010202
[ 6170.287904] RAX: 0000000000000001 RBX: ffff888125bffb28 RCX: dffffc0000000000
[ 6170.287912] RDX: 0000000000000003 RSI: ffffffffa20298ab RDI: ffff88811ebe4750
[ 6170.287919] RBP: ffff88811ebe4700 R08: ffff88838e812650 R09: fffffbfff0623a55
[ 6170.287926] R10: ffffffff8311d2af R11: 0000000000000001 R12: ffff888125bffb10
[ 6170.287933] R13: ffff888125bffb10 R14: dead000000000122 R15: dead000000000100
[ 6170.287940] FS:  0000000000000000(0000) GS:ffff888390b00000(0000) knlGS:0000000000000000
[ 6170.287948] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6170.287955] CR2: 00007fd31fc00710 CR3: 0000000133f60004 CR4: 00000000001706f0
[ 6170.287962] Call Trace:
[ 6170.287967]  <TASK>
[ 6170.287973]  ? __warn+0x9f/0x1a0
[ 6170.287986]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092]  ? report_bug+0x1b1/0x1e0
[ 6170.287986]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092]  ? report_bug+0x1b1/0x1e0
[ 6170.288104]  ? handle_bug+0x3c/0x70
[ 6170.288112]  ? exc_invalid_op+0x17/0x40
[ 6170.288120]  ? asm_exc_invalid_op+0x1a/0x20
[ 6170.288132]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288243]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288366]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288483]  nf_tables_trans_destroy_work+0x588/0x590 [nf_tables]

Fixes: 5910544 ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
ojeda pushed a commit that referenced this issue Apr 29, 2024
…ent path

Check for table dormant flag otherwise netdev release event path tries
to unregister an already unregistered hook.

[524854.857999] ------------[ cut here ]------------
[524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365
[524854.858869] Workqueue: netns cleanup_net
[524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260
[524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41
[524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246
[524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a
[524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438
[524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34
[524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005
[524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00
[524854.858971] FS:  0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[524854.858982] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0
[524854.859000] Call Trace:
[524854.859006]  <TASK>
[524854.859013]  ? __warn+0x9f/0x1a0
[524854.859027]  ? __nf_unregister_net_hook+0x21a/0x260
[524854.859044]  ? report_bug+0x1b1/0x1e0
[524854.859060]  ? handle_bug+0x3c/0x70
[524854.859071]  ? exc_invalid_op+0x17/0x40
[524854.859083]  ? asm_exc_invalid_op+0x1a/0x20
[524854.859100]  ? __nf_unregister_net_hook+0x6a/0x260
[524854.859116]  ? __nf_unregister_net_hook+0x21a/0x260
[524854.859135]  nf_tables_netdev_event+0x337/0x390 [nf_tables]
[524854.859304]  ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859461]  ? packet_notifier+0xb3/0x360
[524854.859476]  ? _raw_spin_unlock_irqrestore+0x11/0x40
[524854.859489]  ? dcbnl_netdevice_event+0x35/0x140
[524854.859507]  ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859661]  notifier_call_chain+0x7d/0x140
[524854.859677]  unregister_netdevice_many_notify+0x5e1/0xae0

Fixes: d54725c ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
• bug Related to runtime bugs, panics, warnings...
Development

Successfully merging a pull request may close this issue.

6 participants