Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net_buf reference count not protected #32564

Closed
chyu313 opened this issue Feb 23, 2021 · 16 comments
Closed

net_buf reference count not protected #32564

chyu313 opened this issue Feb 23, 2021 · 16 comments
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug Stale

Comments

@chyu313
Copy link
Contributor

chyu313 commented Feb 23, 2021

Describe the bug
I tested heavy TCP traffic load by 32 connections sending data to 32 tcp echo server threads on MIMXRT1060_EVK board, and at the same time using 'ping -f' to make zephyr busy handling ICMP packet. After one day, system crashed as following:
image

I tested many times, and there was a time the system halted due to assertion in k_spin_unlock() saying "Not my spinlock %p" in which the spin lock pointer pointed to _net_buf_pool_area. I didn't capture the error message at that time, but I tried to use atomic operation to protect reference count of net_buf by referencing 5ef825f committed by @jukkar . I changed 'uint8_t ref' in 'struct net_buf' to 'atomic_t atomic_ref', and simply used atomic_inc(&buf->atomic_ref) to replace original 'buf->ref++', atomic_dec(&buf->atomic_ref) to replace original 'buf->ref--', atomic_get(&frag->atomic_ref) to replace original code reading reference count.

Though not knowing why this modification can solve my problem, but after protecting net_buf reference count using atomic operation, my test has run for 3 days and keeps running.

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK
  • Commit: Zephyr v2.5.0
@chyu313 chyu313 added the bug The issue is a bug, or the PR is fixing a bug label Feb 23, 2021
@jukkar
Copy link
Member

jukkar commented Feb 23, 2021

I changed 'uint8_t ref' in 'struct net_buf' to 'atomic_t atomic_ref', and simply used atomic_inc(&buf->atomic_ref) to replace original 'buf->ref++', atomic_dec(&buf->atomic_ref) to replace original 'buf->ref--', atomic_get(&frag->atomic_ref) to replace original code reading reference count.

This change would be welcomed and it most probably fixes your issue. Could you create a PR for this change?

@chyu313
Copy link
Contributor Author

chyu313 commented Feb 23, 2021

I changed 'uint8_t ref' in 'struct net_buf' to 'atomic_t atomic_ref', and simply used atomic_inc(&buf->atomic_ref) to replace original 'buf->ref++', atomic_dec(&buf->atomic_ref) to replace original 'buf->ref--', atomic_get(&frag->atomic_ref) to replace original code reading reference count.

This change would be welcomed and it most probably fixes your issue. Could you create a PR for this change?

OK, but as a newbie, I am reading Contribution Guidelines and then will create a PR.

@jhedberg
Copy link
Member

I think the main reason it's a uint8_t is that it creates a smaller struct size. We should consider solving the same by adding irq_lock/unlock or similar to the appropriate places in the implementation.

@mniestroj
Copy link
Member

@jhedberg we would need spinlock in order to support SMP. Either global for all net_bufs or one per net_buf, which would consume more space than atomic.

Just to confirm, was using net_buf supposed to be thread-safe? There are not so many places where net_buf_ref() is actually used...

I am uncertain that using atomic reference counting is actually solving described bug. TCP application should operate on net_pkts underneath. net_pkts have their own reference counting and net_buf is only unrefed once after net_pkt reference count reaches zero. In other words, I don't see who is calling net_buf_unref second time, that results in race condition.

@chyu313
Copy link
Contributor Author

chyu313 commented Feb 24, 2021

Unfortunately my system crashed on the 4th day of test with messy text dumped on debug console:
image

I wrote a modified net_pkt_print() to monitor the usage of net_buf

void _net_pkt_print(void)
{
	printk("[%u] TX %4u(%p) RX %4u(%p) TXBUF %4u RXBUF %4u\r",
	        k_uptime_get_32(),
		k_mem_slab_num_free_get(&tx_pkts), tx_pkts.free_list,
		k_mem_slab_num_free_get(&rx_pkts), rx_pkts.free_list,
		atomic_get(&tx_bufs.avail_count),
		atomic_get(&rx_bufs.avail_count));
}

In my test described above, net_pkt count was set to 1024 and net_buf count was set to 4096 which is a large value. To verify whether there is something wrong with net_buf allocation/free, I reduced net_buf count on purpose:

CONFIG_NET_PKT_RX_COUNT=1024
CONFIG_NET_PKT_TX_COUNT=1024
CONFIG_NET_BUF_RX_COUNT=1024
CONFIG_NET_BUF_TX_COUNT=1024

The net_buf count was 1024 when system started:
[17909] TX 978(0x80107b30) RX 1024(0x800f7318) TXBUF 1024 RXBUF 1024

Then with 32 TCP connections sending data to Zephyr, Data buffer (%zd) allocation failed was frequently dumped. After several times of TCP connecting/disconnecting, transmit net_pkt/net_buf count cannot restore to its original value:
[333486] TX 933(0x80104f08) RX 1024(0x800f28d8) TXBUF 831 RXBUF 1024

I think I should first solve the problem of lost net_pkt/net_buf.

@jhedberg
Copy link
Member

@jhedberg we would need spinlock in order to support SMP. Either global for all net_bufs or one per net_buf, which would consume more space than atomic.

@mniestroj ok, in that case we're probably better off with moving to atomic_t. I think we should at the same time consider changing the behaviour of net_buf_ref() so that it returns NULL if you give it a 0 reference count, similar to what we did with bt_conn:

struct bt_conn *bt_conn_ref(struct bt_conn *conn)
{
atomic_val_t old;
/* Reference counter must be checked to avoid incrementing ref from
* zero, then we should return NULL instead.
* Loop on clear-and-set in case someone has modified the reference
* count since the read, and start over again when that happens.
*/
do {
old = atomic_get(&conn->ref);
if (!old) {
return NULL;
}
} while (!atomic_cas(&conn->ref, old, old + 1));
BT_DBG("handle %u ref %u -> %u", conn->handle, old, old + 1);
return conn;
}
void bt_conn_unref(struct bt_conn *conn)
{
atomic_val_t old = atomic_dec(&conn->ref);
BT_DBG("handle %u ref %u -> %u", conn->handle, old,
atomic_get(&conn->ref));
__ASSERT(old > 0, "Conn reference counter is 0");
if (IS_ENABLED(CONFIG_BT_PERIPHERAL) &&
atomic_get(&conn->ref) == 0) {
bt_le_adv_resume();
}
}

We could even consider creating a k_ref abstraction around atomic_t to make refcount implementation simpler (and harder to do wrong). Linux has something like this in the form of struct kref and refcount_t

@mniestroj
Copy link
Member

k_ref abstraction seems a good idea. Recently I needed to add refcounting to drivers/wifi/esp/ to manage sockets lifetime, which could benefit from k_ref as well.

I think we should at the same time consider changing the behaviour of net_buf_ref() so that it returns NULL if you give it a 0 reference count

I am not sure this brings much benefit to net_buf_ref(). If you had a net_buf and somebody already decreased refcount to 0, then this net_buf landed in net_buf_pool. Assuming that it is still in the pool, adding check on 0 refcount in net_buf_ref() makes sense. But it is possible that such net_buf was already allocated once again, so effectively net_buf_ref() caller references new net_buf object (which is placed in the same memory, but still it is something different than desired). This means that we should not call net_buf_ref() on something that might be already dereferenced. In other words, caller is responsible for making sure net_buf_ref() is called only on buffers with refcount > 0.

@nashif nashif added the priority: medium Medium impact/importance bug label Mar 2, 2021
@galak
Copy link
Collaborator

galak commented Apr 13, 2021

Have you tried just make this a uint32_t and testing that. That might convey if its a memory or atomic issue.

@chyu313
Copy link
Contributor Author

chyu313 commented Apr 15, 2021

Have you tried just make this a uint32_t and testing that. That might convey if its a memory or atomic issue.

Changing type of net_buf ref form uint8_t to atomic_t finally did not prevent my system from crashing, so I did not try uint32_t.

The zephyr master branch solves some race condition problems after v2.5 release, so I changed to use master branch code. The system crashed with ARM hard fault as following:

E: ***** USAGE FAULT *****
E:   Illegal use of the EPSR
E: r0/a1:  0x1f9650b4  r1/a2:  0x00000000  r2/a3:  0x00000000
E: r3/a4:  0x00000000 r12/ip:  0xa0000000 r14/lr:  0x80006917
E:  xpsr:  0x00000000
E: r4/v1:  0x8011105c  r5/v2:  0x00000000  r6/v3:  0x00002710
E: r7/v4:  0x80110728  r8/v5:  0x00000500  r9/v6:  0x00000000
E: r10/v7: 0x00000006  r11/v8: 0x80021798    psp:  0x800c8740
E: EXC_RETURN: 0xfffffffd
E: Faulting instruction address (r15/pc): 0x8002164c
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x800233a0 (tcps_25)
E: Halting system

The faulting instruction address locates to a non-text area, it seems to be a stack problem, but I have set all threads' stack size to 8192, also with CONFIG_HW_STACK_PROTECTION=y. It is hard to see what happened before this ARM hard fault. What I am doing now is to use ARM MPU (Memory Protection Unit) to set non-text area to non-executable, so I can know at what point the PC register is changing to an illegal address.

@kevin0402
Copy link

kevin0402 commented Apr 23, 2021

@andrewboie @jukkar @chyu313

Looks like this issue relate to #27032 and might be solved in dde03c6

@jukkar
Copy link
Member

jukkar commented Apr 23, 2021

Looks like this issue relate to #27032 and might be solved in dde03c6

Then that is just a coincidence/side effect as that commit does not protect any access to net_buf.

@chyu313
Copy link
Contributor Author

chyu313 commented Apr 23, 2021

@andrewboie @jukkar @chyu313

Looks like this issue relate to #27032 and might be solved in dde03c6

It seems there are also more protections needed in kernel.

I tested dde03c6 (without modification to protect net_buf) and MPU fault occurred:

E: ***** MPU FAULT *****
E:   Data Access Violation
E:   MMFAR Address: 0x0
E: r0/a1:  0x80026260  r1/a2:  0x00000000  r2/a3:  0x00000000
E: r3/a4:  0x00000000 r12/ip:  0x80020850 r14/lr:  0x8001b2cf
E:  xpsr:  0x210000a8
E: r4/v1:  0x80026260  r5/v2:  0x00000000  r6/v3:  0x00000000
E: r7/v4:  0x00000010  r8/v5:  0x00000010  r9/v6:  0x80020850
E: r10/v7: 0x00000000  r11/v8: 0x00000000    psp:  0x8028df80
E: EXC_RETURN: 0xfffffff1
E: Faulting instruction address (r15/pc): 0x8001b2bc
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Fault during interrupt handling

E: Current thread: 0x80026260 (rx_q[0])
E: Halting system

To avoid NULL pointer access, I use MPU to set ARM address 0x0 (located in ITCM which is not used in my program) as no-access attribute, so that NULL pointer access to 0 will cause MPU FAULT.

The faulting instruction address 0x8001b2bc is located in sys_dlist_remove() which is called by unpend_thread_no_timeout(). The NULL pointer in sys_dlist_remove() is 'prev' which is compiled into r2 (value is 0 when MPU fault).

The comment of sys_dlist_remove() says: "This and other sys_dlist_*() functions are not thread safe."

8001b2b8 <sys_dlist_remove>:
	sys_dnode_t *const next = node->next;
8001b2b8:	e9d0 3200 	ldrd	r3, r2, [r0]
	prev->next = next;
8001b2bc:	6013      	str	r3, [r2, #0]
	next->prev = prev;
8001b2be:	605a      	str	r2, [r3, #4]
	node->next = NULL;
8001b2c0:	2300      	movs	r3, #0
	node->prev = NULL;
8001b2c2:	e9c0 3300 	strd	r3, r3, [r0]
	sys_dnode_init(node);
}
8001b2c6:	4770      	bx	lr

8001b2c8 <unpend_thread_no_timeout>:
{
8001b2c8:	b508      	push	{r3, lr}
	sys_dlist_remove(&thread->base.qnode_dlist);
8001b2ca:	f7ff fff5 	bl	8001b2b8 <sys_dlist_remove>
	thread->base.thread_state &= ~_THREAD_PENDING;
8001b2ce:	7b43      	ldrb	r3, [r0, #13]
8001b2d0:	f023 0302 	bic.w	r3, r3, #2
8001b2d4:	7343      	strb	r3, [r0, #13]
	thread->base.pended_on = NULL;
8001b2d6:	2300      	movs	r3, #0
8001b2d8:	6083      	str	r3, [r0, #8]
}
8001b2da:	bd08      	pop	{r3, pc}

@carlescufi
Copy link
Member

@chyu313 can you confirm this is still an issue in current main?

@chyu313
Copy link
Contributor Author

chyu313 commented May 27, 2021

@chyu313 can you confirm this is still an issue in current main?

Yes, it still an issue. I keep pulling Zephyr code and testing it. Fault still occurs.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Jul 27, 2021
@chyu313 chyu313 closed this as completed Aug 8, 2021
@M1cha
Copy link
Contributor

M1cha commented Nov 3, 2023

I created a PR that implements k_ref: #64798

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug Stale
Projects
None yet
Development

No branches or pull requests

10 participants