
Poor TCP performance #23302

Closed · xhpohanka opened this issue Mar 5, 2020 · 77 comments

Labels
area: Networking · bug (The issue is a bug, or the PR is fixing a bug) · priority: low (Low impact/importance bug)

Comments

@xhpohanka (Contributor)

I continue playing with the Zephyr network stack on STM32, and I have unfortunately found another issue. With the nucleo_f429zi board and the big_http_download sample I got a very slow download speed. This pushed me to check the network performance with zperf.

For UDP transfers I got around 10 Mbps, but for TCP the result was only 10 kbps, which is really bad.

I checked whether some older versions of Zephyr behave better - fortunately, v2.0.0 also gave me around 10 Mbps for TCP in zperf. By bisecting I found that this issue starts with d88f25b.

I hoped that reverting it would also fix the slow big_http_download, but surprisingly the download speed is still suspiciously low. I will continue to investigate this tomorrow.

I do not know if these issues are specific to the STM32 platform. I have tried the mentioned nucleo_f429zi and a custom board with an STM32F750, which has a slightly different Ethernet peripheral and whose driver is also written using the HAL. Both behave in the same way.

The issues I have met so far with the Zephyr networking stack make me question whether it is mature enough for production.

xhpohanka added the bug label (The issue is a bug, or the PR is fixing a bug) on Mar 5, 2020
@jukkar (Member) commented Mar 5, 2020

Indeed, there is a regression in TCP throughput. The problem with TCP has been that we currently have no proper tests that would catch these kinds of regressions. This is one of the reasons we have been building a new version of the TCP stack that supports proper testing and can be integrated into sanitycheck. TCP2 has been cooking for quite a long time and we are slowly getting to the point where it is useful, but we are not there yet.
Obviously, 10 kb/sec is not good and this needs to be fixed.

tbursztyka removed their assignment on Mar 6, 2020
@jukkar (Member) commented Mar 6, 2020

I just compiled samples/net/sockets/dumb_http_server_mt for the Atmel sam-e70 board and got the following numbers:

ab -n100 http://192.0.2.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.0.2.1 (be patient).....done


Server Software:        
Server Hostname:        192.0.2.1
Server Port:            8080

Document Path:          /
Document Length:        2084 bytes

Concurrency Level:      1
Time taken for tests:   0.381 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      214000 bytes
HTML transferred:       208400 bytes
Requests per second:    262.67 [#/sec] (mean)
Time per request:       3.807 [ms] (mean)
Time per request:       3.807 [ms] (mean, across all concurrent requests)
Transfer rate:          548.95 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    3  20.7      1     208
Waiting:        1    3  20.7      1     208
Total:          1    4  20.6      2     208

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      3
  99%    208
 100%    208 (longest request)

So definitely more than 10 kb/sec. Then, using wrk:

./wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.74ms   62.79ms 616.12ms   97.41%
    Req/Sec     1.74      3.83    20.00     89.80%
  Latency Distribution
     50%    1.29ms
     75%    1.35ms
     90%    4.46ms
     99%  207.79ms
  116 requests in 20.10s, 242.42KB read
  Socket errors: connect 0, read 134, write 0, timeout 0
Requests/sec:      5.77
Transfer/sec:     12.06KB

Now the transfer rate is very poor.

I then increased CONFIG_NET_TCP_BACKLOG_SIZE to 2 and got considerably better results:

./wrk -d 10 -t 2 -c 100 --latency http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.61ms  421.92us   9.90ms   95.03%
    Req/Sec   375.32    153.56   610.00     72.80%
  Latency Distribution
     50%    1.41ms
     75%    1.88ms
     90%    1.98ms
     99%    2.17ms
  4768 requests in 10.01s, 9.73MB read
  Socket errors: connect 0, read 4774, write 0, timeout 0
Requests/sec:    476.10
Transfer/sec:      0.97MB

@xhpohanka (Contributor, Author)

I see this behavior on the STM32F429ZI:

CONFIG_NET_TCP_BACKLOG_SIZE=1

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.51ms    1.79ms   8.77ms   89.86%
    Req/Sec     1.86      3.43    10.00     86.96%
  Latency Distribution
     50%    1.93ms
     75%    1.95ms
     90%    6.70ms
     99%    8.77ms
  69 requests in 20.10s, 144.20KB read
  Socket errors: connect 0, read 69, write 0, timeout 0
Requests/sec:      3.43
Transfer/sec:      7.17KB

CONFIG_NET_TCP_BACKLOG_SIZE=2

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   196.18ms   36.93ms 407.71ms   96.32%
    Req/Sec     8.92      4.18    35.00     85.63%
  Latency Distribution
     50%  202.17ms
     75%  202.23ms
     90%  202.32ms
     99%  209.31ms
  896 requests in 20.09s, 1.84MB read
  Socket errors: connect 0, read 898, write 0, timeout 0
Requests/sec:     44.60
Transfer/sec:     93.92KB

I do not understand why the latency is so high.

The previous result was for v2.2.0-rc3; for v2.1.0-rc3 it is even worse...
CONFIG_NET_TCP_BACKLOG_SIZE=2

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://10.42.0.192:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   168.84ms   74.40ms 220.12ms   82.07%
    Req/Sec     5.78     10.02   100.00     98.40%
  Latency Distribution
     50%  202.34ms
     75%  202.38ms
     90%  202.58ms
     99%  218.44ms
  145 requests in 20.03s, 305.86KB read
  Socket errors: connect 0, read 789, write 0, timeout 0
Requests/sec:      7.24
Transfer/sec:     15.27KB

nashif added the priority: medium label (Medium impact/importance bug) on Mar 6, 2020
@xhpohanka (Contributor, Author) commented Mar 6, 2020

It also seems to me that the issue with the big_http_download sample is a bit different. The download is very slow for me regardless of CONFIG_NET_TCP_BACKLOG_SIZE or of reverting commit d88f25b. Downloading a 52 kB file takes 8.5 seconds. You can check my Wireshark log at https://drive.google.com/open?id=163-v3MlK3Hgc4F47X05GSSJRo3EFpX-s; it is full of [TCP Window Full] packets.

I also checked zperf again on v2.2.0-rc3, and this is what I get with the upstream code:

$ iperf -c 192.0.2.1
------------------------------------------------------------
Client connecting to 192.0.2.1, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 192.0.2.2 port 51802 connected with 192.0.2.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.2 sec  13.6 KBytes  10.9 Kbits/sec

and this is with d88f25b reverted; the backlog size has no impact here:

$ iperf -c 192.0.2.1
------------------------------------------------------------
Client connecting to 192.0.2.1, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 192.0.2.2 port 51778 connected with 192.0.2.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  16.1 MBytes  13.4 Mbits/sec

I'm very interested in getting this to work better. I can do more tests and even development if I get some guidance...

@jukkar (Member) commented Mar 8, 2020

Some fixes for the memory leaks are in #23334.
Issue #23246 is related to this one. The low performance numbers are probably caused by lots of packets being dropped. Anyway, the memory leaks need to be fixed first, and then we can see how the performance numbers behave.

@pfalcon (Contributor) commented Mar 10, 2020

For UDP transfers I got around 10 Mbps, but for TCP the result was only 10 kbps, which is really bad.

Indeed, there is a regression in TCP throughput.

I personally never saw different TCP speed figures from Zephyr with frdm_k64f or any of the QEMU networking drivers. (Well, perhaps I saw 15 KBytes/s, but you get the point.) I actually wanted to add dumping of the download speed to big_http_download, but decided against it ;-).

@pfalcon (Contributor) commented Apr 2, 2020

I'm currently working on switching big_http_download over to localhost downloads (yes, to put it into CI and not rely on external hosts, which would add extra disturbance to the testing).

And I decided to share what I see with an asciinema recording: https://asciinema.org/a/BXbcuYTQsrsPxDmdCnbfUz7Qz (as I'm not sure how soon they garbage-collect uploads, I'm also attaching it here:
big_http_download-localhost.zip ).

So, as you can see, it starts slow. Then at ~70 KB (~00:19 on the cast) it finally starts to work "as it should", then at ~140 KB (~00:21) it breaks again. You've already read my explanation of what happens (based on hours of peering into Wireshark output, though months ago): it starts mis-synchronized with Linux TCP, sending rexmits and running into a large backoff delay, then manages to synchronize with Linux (this is literally the first time I have seen that), then loses the sync again and goes back into the swamp for the rest of the download.

@pfalcon (Contributor) commented Apr 2, 2020

(Which makes me think I should maybe add a calculation of the "momentary" speed over, e.g., the last second or two, and print each value on a separate line, not with "\r".)

@jukkar (Member) commented May 25, 2020

I tested this with nucleo_f767zi and see wildly different numbers depending on the tool used. ApacheBench gave around 470 kb/sec, which is quite reasonable. The wrk tool, on the other hand, pushes so much traffic to Zephyr that it easily runs out of memory and starts to drop packets, which then affects the performance numbers a lot. I am lowering the priority of this one until we figure out what the correct numbers are.

jukkar added the priority: low label and removed the priority: medium label on May 25, 2020
@github-actions (bot)

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this issue will automatically be closed in 14 days. Note that you can always re-open a closed issue at any time.

github-actions bot added the Stale label on Jul 25, 2020
github-actions bot closed this as completed on Aug 8, 2020
dleach02 reopened this on Sep 2, 2020
@pfalcon (Contributor) commented Sep 3, 2020

Recently, Zephyr switched to the newer "TCP2" implementation by default. I'm triaging it for the 2.4 release (#27876, #27982), and would also like to record some quick performance data:

dumb_http_server sample, qemu_x86, standard SLIP-based QEMU networking setup, ab -n1000 http://192.0.2.1:8080/: 60.78 req/s.

For reference, with TCP1 (CONFIG_NET_TCP1=y), it's 4.18 req/s.

So, there's a noticeable improvement in connection latency.
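(For anyone reproducing the comparison: the TCP1 reference above was built with the legacy stack selected via the Kconfig symbol already mentioned, i.e. a prj.conf overlay along these lines; whether the symbol is still available depends on the Zephyr version in use.)

CONFIG_NET_TCP1=y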

@pfalcon (Contributor) commented Sep 3, 2020

But it is not so bright with data transfer throughput. The sample is big_http_download, the rest is as above.

First, the reference with CONFIG_NET_TCP1=y: 3108 b/s. That's pretty slow; I remember speeds around 10 KB/s. Dunno what's up now, my internet connection is not ideal either.

So, with the default TCP2, it's 6154 b/s (mind #27982: the transfer doesn't complete successfully, so an implied end time was used to calculate the speed).

@pfalcon (Contributor) commented Sep 3, 2020

For reference, trying @nzmichaelh's hack from #26330 (comment) (setting NET_TCP_BUF_MAX_LEN to (4*1280)) didn't make a statistically significant difference for me (on qemu_x86 with SLIP, again).

@hakehuang (Collaborator)

@rlubos so do we have a golden platform for zperf?

@rlubos (Contributor) commented Mar 15, 2022

@rlubos so do we have a golden platform for zperf?

I'm not aware of any "golden" platform for tests; I think we should aim to get decent performance on any (non-emulated) Ethernet board. I've ordered a few boards myself to perform some measurements.

@hakehuang (Collaborator)

I'm not aware of any "golden" platform for tests; I think we should aim to get decent performance on any (non-emulated) Ethernet board. I've ordered a few boards myself to perform some measurements.

Without a golden platform, we will have to take driver impacts into account, which may not be a good thing.

@pfalcon (Contributor) commented Mar 16, 2022

Without a golden platform, we will have to take driver impacts into account, which may not be a good thing.

At Linaro, we standardized on frdm_k64f as the default networking platform from Zephyr's start. My records of Zephyr's networking testing against it (and QEMU, which is everyone's favorite platform) are at https://docs.google.com/spreadsheets/d/1_8CsACPEXqrMIbxBKxPAds091tNAwnwdWkMKr3994QY/edit#gid=0 (it's pretty spotty, as maintaining such a spreadsheet is my personal initiative, done on a best-effort basis). When I have a chance, I'll test current Zephyr using the test cases in the spreadsheet (I'm working on other projects now).

@mdkf commented Mar 21, 2022

Why is there a sleep within a send loop (https://github.com/mdkf/ZephyrTCPSlow/blob/main/src/socketTest.c#L159)? A 2 ms sleep between each datagram sent doesn't sound like the best idea for a throughput-measuring test.

I inserted the sleep when testing the F746ZG. It died otherwise. It was not included in the other measurements.

Additionally, a 90-byte payload size per datagram seems pretty small for throughput measurement; did you send such small packets with Mbed as well?

I used the same test in Mbed and FreeRTOS. I also tried with a 1400-byte packet. It did not improve the performance in Zephyr. After the first 90-byte packet was sent, Wireshark shows the subsequent packets sent were closer to 1480 bytes. Basically, the 90-byte packets accumulated and were sent together.

@AndreyDodonov-EH (Contributor)

@hakehuang What test code did you use on the Zephyr side, zperf? Note, there's currently a bug, fixed in #43379; without it you won't get decent results (the recv window gets filled and the communication stalls).

Also, please note that increasing the window size means that you also need to increase the RX pkt/buf count in the system, otherwise the effort is futile (the net driver will start dropping packets, forcing retransmissions from the server, which has a large negative impact on throughput). Increasing the TX pkt/buf count a bit also makes sense, as we need to acknowledge each TCP packet received (I wonder if there's room for improvement here; in theory it should be enough to acknowledge once with a larger ACK value).

@rlubos
Great that you mentioned the ACK issue.
Yes, there is room for improvement in terms of ACKing multiple packets with a single ACK; there is even a (somewhat stale) issue for that: #30366
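To make the buffer scaling concrete, something like the following minimal prj.conf sketch is what is being discussed, using only the Kconfig symbols already named in this thread; the counts are illustrative placeholders, not tuned recommendations:

# Illustrative values only: scale RX packets/buffers so a larger RX window can actually be filled
CONFIG_NET_PKT_RX_COUNT=40
CONFIG_NET_BUF_RX_COUNT=40
# Some TX headroom as well, since every received TCP segment currently gets acknowledged
CONFIG_NET_PKT_TX_COUNT=20
CONFIG_NET_BUF_TX_COUNT=20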

@AndreyDodonov-EH (Contributor)

@jukkar
Probably the wrong thread to ask, but is there a reason behind the magic constant 3?
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/ip/tcp.c#L1801

I ask because I observed that the maximum window size is quite critical.
To avoid data buffer allocation errors, I actually had to change it to 4, or define a custom CONFIG_NET_TCP_MAX_SEND_WINDOW_SIZE.

Sorry if I'm missing something here.
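For reference, the CONFIG_NET_TCP_MAX_SEND_WINDOW_SIZE workaround can be expressed as a plain prj.conf overlay; the value below is only an example picked for illustration, not a recommendation (changing the divisor itself requires editing tcp.c):

# Example value only: overrides the send window otherwise derived from the buffer pool and the magic divisor
CONFIG_NET_TCP_MAX_SEND_WINDOW_SIZE=8192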

AndreyDodonov-EH referenced this issue in endresshauser-lp/sdk-zephyr Apr 22, 2022
@jukkar (Member) commented Apr 26, 2022

Probably the wrong thread to ask, but is there a reason behind the magic constant 3?

No specific reason, just a somewhat reasonable value when the code was written. If you find value 4 more suitable, please send a PR that changes it.

@AndreyDodonov-EH (Contributor)

Probably the wrong thread to ask, but is there a reason behind the magic constant 3?

No specific reason, just a somewhat reasonable value when the code was written. If you find value 4 more suitable, please send a PR that changes it.

I don't think it makes sense to open a PR with just another magic constant; it should at the very least be a Kconfig option.

It worked for me, yes, but I'd like to understand the meaning behind it. Ideally this coefficient should be calculated.

@ssharks (Collaborator) commented May 3, 2022

With the latest patches in, in the field I see the cloud application dropping larger transfers because the transfer rate drops too low. This happens when fewer than 240 bytes are transferred in the last 5 seconds.

Looking at the test results, I can see something interesting happening.

Based on the qemu_cortex_a9 target (no different for qemu_x86), transferring 60 kByte:

With preemptive scheduling:

Without packet loss:
===================================================================
START - test_v4_send_recv_large
 PASS - test_v4_send_recv_large in 19.84 seconds

With packet loss:

===================================================================
START - test_v4_send_recv_large
 PASS - test_v4_send_recv_large in 25.102 seconds
===================================================================

With cooperative scheduling

Without packet loss:

===================================================================
START - test_v4_send_recv_large
 PASS - test_v4_send_recv_large in 10.751 seconds

With packet loss:

===================================================================
START - test_v4_send_recv_large
 PASS - test_v4_send_recv_large in 22.12 seconds
===================================================================

In the case of no packet loss, I would expect the elapsed time to be less than a second, as no timeouts need to occur. It is also interesting to see cooperative scheduling being almost 2 times faster. In the case of packet loss, timeouts may need to occur for retransmission, so it could take longer. Nevertheless, there is little difference in runtime between the cases with and without packet loss.

@rlubos mentioned that increasing the buffers

CONFIG_NET_BUF_RX_COUNT=64
CONFIG_NET_BUF_TX_COUNT=64

and putting a k_yield() after the send call helps significantly to accelerate the test. Nevertheless, in a zero-delay, no-packet-loss test case, it should also be possible to get fast throughput with a smaller number of buffers.

@ssharks (Collaborator) commented May 5, 2022

I attempted to dive a little deeper into this in issue #45367.

@rlubos (Contributor) commented May 31, 2022

I've finally managed to do some throughput tests on actual hardware. I had mimxrt1020_evk and nucleo_h723zg on the table.

TL;DR The results for nucleo_h723zg are good, but for mimxrt1020_evk they're rather poor.

For testing, I've used iperf on the Linux host side and the zperf sample on the Zephyr side. Let's focus on nucleo_h723zg first. As a reference, I've used the UDP throughput, since in that case we avoid protocol-specific constraints (like the TX/RX window size with TCP).
Initial results for zperf running in the default configuration are not bad, but not great either. When analyzing the eth_stm32_hal.c driver I noticed, though, that on the TX path the driver blocks during transmission, effectively negating any positive performance effect of using DMA. As a result, the total transmission time of a single frame consists not only of the time needed to actually transmit the frame, but also of the time needed to process UDP/IP. Since in the default configuration Zephyr does the L4/L3/L2 and driver processing in a single thread, all of the processing times add up, affecting the final throughput.

I've managed to increase the throughput by enabling the TX queue (CONFIG_NET_TC_TX_COUNT=1). As a result, a packet, instead of being passed to L2 directly, is queued, and the actual L2 and driver processing is done in a separate thread. This allows for increased throughput, because while the driver blocks during transmission, the other thread, which does the L4/L3 processing, is able to proceed with the next frame. I think this should be the default configuration in the zperf sample.

Another small throughput improvement can be achieved by setting the net buffer size to the actual network MTU (CONFIG_NET_BUF_DATA_SIZE=1500). In this case, L3/L4 processing takes less time, as the packet consists of a single buffer instead of a chain of buffers that the net stack needs to process. This also increases the default TCP TX/RX window size, which improves TCP throughput in both directions.

Finally, TCP throughput can be further increased by maximizing the TCP window sizes. I've achieved that by increasing the net_pkt/net_buf count and relying on the default window size set by Zephyr.

The overall results are presented in the table below (the measurements were taken on the receiving node, i.e. iperf for upload, zperf for download):

| Configuration | TCP RX/TX window | UDP upload | TCP upload | UDP download | TCP download |
|---|---|---|---|---|---|
| Default | 1194 | 51.2 Mbits/sec | 670 Kbits/sec | 88.12 Mbits/sec | 7.71 Mbits/sec |
| CONFIG_NET_TC_TX_COUNT=1 | 1194 | 73.6 Mbits/sec | 670 Kbits/sec | 88.03 Mbits/sec | 7.73 Mbits/sec |
| CONFIG_NET_TC_TX_COUNT=1, CONFIG_NET_BUF_DATA_SIZE=1500 | 14000 | 78.3 Mbits/sec | 69.8 Mbits/sec | 88.01 Mbits/sec | 75.03 Mbits/sec |
| CONFIG_NET_TC_TX_COUNT=1, CONFIG_NET_BUF_DATA_SIZE=1500, CONFIG_NET_PKT_RX/TX_COUNT=80, CONFIG_NET_BUF_RX/TX_COUNT=80 | 40000 | 77.9 Mbits/sec | 75.0 Mbits/sec | 88.12 Mbits/sec | 79.56 Mbits/sec |
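For clarity, the last table row corresponds roughly to the following prj.conf fragment (the RX/TX shorthand expanded into the individual Kconfig symbols; treat this as a sketch of the configuration used rather than a verified drop-in overlay):

CONFIG_NET_TC_TX_COUNT=1
CONFIG_NET_BUF_DATA_SIZE=1500
CONFIG_NET_PKT_RX_COUNT=80
CONFIG_NET_PKT_TX_COUNT=80
CONFIG_NET_BUF_RX_COUNT=80
CONFIG_NET_BUF_TX_COUNT=80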

As a side note, I was able to improve the UDP TX throughput even further by modifying eth_stm32_hal.c to block not after handing the packet to the HAL, but before (i.e. to block if the previous transfer hasn't finished yet). This allowed reaching ~87 Mbits/sec; however, I'm not confident enough to push those changes upstream, as there are other aspects to consider (for instance, PTP is processed after the packet is transmitted, and I'm not sure that change wouldn't break it). I'll leave it to the driver maintainers to decide whether to improve this or not.

Now when it comes to mimxrt1020_evk, the results are presented below:

| Configuration | TCP RX/TX window | UDP upload | TCP upload | UDP download | TCP download |
|---|---|---|---|---|---|
| CONFIG_NET_TC_TX_COUNT=1, CONFIG_NET_BUF_DATA_SIZE=1500, CONFIG_NET_PKT_RX/TX_COUNT=80, CONFIG_NET_BUF_RX/TX_COUNT=80 | 40000 | 17.2 Mbits/sec | 7.88 Mbits/sec | 16.23 Mbits/sec | 456 Kbits/sec |

I've investigated this platform a bit, and the conclusion for the poor performance is as follows:

  • (minor) The eth_mcux.c driver does the same thing as eth_stm32_hal.c, i.e. it blocks during transfer. In this case, however, there is an additional thread within the driver involved in unblocking, which adds extra overhead due to scheduling.
  • (major) When measuring the time needed to process individual frames, I noticed that this platform is much slower than nucleo_h723zg (it took ~4 times longer to do the L4/L3 processing). This is a bit surprising to me, as both platforms appear to be running a Cortex-M7 with similar CPU speeds (500 MHz vs 550 MHz). @dleach02, do you perhaps know what the reason for this could be?
  • (major) When downloading at full speed, the driver reports lots of errors (<err> eth_mcux: ENET_GetRxFrameSize return: 4001). I don't know the reason, but it could be a side effect of the previous point.

To summarize, I think the results achieved on nucleo_h723zg prove that it is possible to achieve competitive throughput with Zephyr, given proper configuration and a well-written Ethernet driver. Ideally it would be good to test other platforms as well, but due to the limited availability of development kits in general I couldn't get some obvious choices like the super-popular frdm_k64f. I therefore suggest closing this general issue, as it might be misleading given the above results, and opening board/driver-specific issues instead.

@rlubos (Contributor) commented May 31, 2022

As for zperf, the sample uses the net_context API directly, which gave me a bit of a headache due to some issues with TCP handling in the sample (the TCP context was freed too early because the sample did not add an extra ref to the net_context, and it does not take EAGAIN/ENOBUFS returned by the TCP layer into consideration). I'm thinking, however, that instead of fixing those issues it would be worthwhile to rewrite the sample to use the socket API instead, which is a more realistic scenario for actual apps. I plan to work on this in the near future.

@ssharks (Collaborator) commented May 31, 2022

Very interesting results. This clearly shows that in a happy-flow situation the performance can be pretty decent. You are using a point-to-point wired link, I assume.
The polling implementation definitely helps to improve throughput quite a bit.

You increased the window by increasing CONFIG_NET_BUF_DATA_SIZE to 1500 bytes over the default 128; do you know if this has the same effect as increasing CONFIG_NET_BUF_RX/TX_COUNT by a factor of 12? Apart from maybe some processing overhead I would expect it to have the same effect, except that small packets will consume considerably less space.

On a wireless network (cellular or WiFi) that introduces some packet loss, with high latency (to the other side of the world), things will start to look quite different. First of all, there is no congestion avoidance, so the fairness to other network traffic is pretty bad. Secondly, if one packet is lost along the way, the stack will start retransmitting the complete transmit buffer. A fast retransmit triggered by a triple duplicate ACK would help here.

@rlubos (Contributor) commented Jun 1, 2022

You are using a point-to-point wired link, I assume.

Yes, the whole point of this experiment was to see how the actual throughput compares to the theoretical maximum over 100 Mbit Ethernet, and it seems we're pretty close to the limit.

You increased the window by increasing CONFIG_NET_BUF_DATA_SIZE to 1500 bytes over the default 128; do you know if this has the same effect as increasing CONFIG_NET_BUF_RX/TX_COUNT by a factor of 12? Apart from maybe some processing overhead I would expect it to have the same effect, except that small packets will consume considerably less space.

Yes, the default window size is calculated based on the buffer size and buffer count, i.e. the overall size of all of the buffers, so you could reach the same effect by increasing the buffer count. The sole reason to increase the buffer size here was to reduce the processing time of an individual frame. I would say, however, that this is only recommended if you really need to maximize your throughput; usually it's better to increase the buffer count, as you don't waste space on small packets.
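(As a back-of-the-envelope illustration, and only my reading of it rather than a verified formula: 80 buffers of 1500 bytes give a 120000-byte pool, and dividing by the constant 3 discussed earlier in this thread yields 40000, which matches the window size in the last table row above.)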

On a wireless network (cellular or WiFi) that introduces some packet loss, with high latency (to the other side of the world), things will start to look quite different. First of all, there is no congestion avoidance, so the fairness to other network traffic is pretty bad. Secondly, if one packet is lost along the way, the stack will start retransmitting the complete transmit buffer. A fast retransmit triggered by a triple duplicate ACK would help here.

Well, yes, it is expected that throughput will be worse on lossy networks. If there are mechanisms specified for TCP that could help improve performance in such cases, we should consider implementing them. I think, though, that those should be considered enhancements, not reported as "bugs" like this issue is.

@carlescufi (Member)

On a wireless network (cellular or WiFi) that introduces some packet loss, with high latency (to the other side of the world), things will start to look quite different. First of all, there is no congestion avoidance, so the fairness to other network traffic is pretty bad. Secondly, if one packet is lost along the way, the stack will start retransmitting the complete transmit buffer. A fast retransmit triggered by a triple duplicate ACK would help here.

@rlubos and @ssharks can we create an enhancement issue for this?

@ssharks (Collaborator) commented Jun 23, 2022

@rlubos: Could you redo the upload tests with the small window from #23302 (comment), with the fix from #46584 in? I believe the figures will look very different.

@xhpohanka: PR #46584 was recently merged and I think it solves the issue you described. Are you in a position to check whether your problem has been fixed? If so, this issue can be closed. In fact, issue #45844 looks pretty similar to your description.

@xhpohanka (Contributor, Author)

Hello @ssharks,
I have not done zperf testing for a long time, but I have checked the recent updates to the TCP stack, including #46584. In our application the performance really improved a lot. From my POV this issue can be closed :)

@rlubos (Contributor) commented Jun 24, 2022

@ssharks Hmm, but the Silly Window fix shouldn't affect the upload, as it's related to the RX window size? Did you mean download?

Anyway, I've run the test again: no difference on the upload side, and the download throughput improved slightly (in the low-window scenario) to 8.68 Mbps. When I tested the solution, the most significant performance boost happened in the case where we reported a zero window to the peer, as that no longer takes place with #46584. That didn't happen, though, in the initial test I performed here.

@rlubos (Contributor) commented Jun 24, 2022

Hello @ssharks,
I have not done zperf testing for a long time, but I have checked the recent updates to the TCP stack, including #46584. In our application the performance really improved a lot. From my POV this issue can be closed :)

I suggest we thereby close this long-open issue.
