[Bug]: WiFi / Network connectivity issues with 2.5.13+ #5458
Comments
Are you connected to the public mqtt server? Do you have device logs?
No device logs. Using a public MQTT server (used by tens of others), but not "the" public MQTT server. MQTT server resides on my local network (connecting via private address).
This seems extremely relevant. I'm only using pre-built/published builds, so I'll probably wait for a release to test. I have a static IP set as of 8pm on the problematic node and will report back the results. Based on #5387 I would assume that it's still going to occur with a static IP. Rather than try to set something up to collect logs, I'll probably wait for the next alpha to see if it solves the problem.
I am assuming the same re: MQTT::onReceive. I'm using public DNS and a public IP and relying on a NAT mirroring rule to route it back inside, so the RFC1918 issue shouldn't affect me, but it's still great info to know.
Potentially also relevant: we were power saving even when wifi was connected. That's fixed now: #5443
Saw that, but it's not related. The static IP has kept it from falling off completely overnight, but it's disconnecting and reconnecting to WiFi over and over and seems a little wonky overall (pretty sure it's not beaconing device metrics to MQTT consistently): 2 minutes of no ping responses, responds for 5-7 seconds, then back to no response. I'll wait for a release that incorporates #5387 before I attempt to troubleshoot further.
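One way to quantify the "2 minutes of no ping responses, responds for 5-7 seconds" pattern is to log ping results with timestamps and collapse them into outage windows. The following is only an illustrative sketch. It assumes you already have (seconds, answered) pairs from some ping loop, and the function name `outage_windows` is hypothetical, not part of any Meshtastic tooling.

```python
from typing import List, Optional, Tuple

# One ping result: (seconds since monitoring started, True if the node answered).
Sample = Tuple[float, bool]


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse ping results into (start, end) outage windows.

    start is the time of the first unanswered ping in a run; end is the time
    of the next answered ping, or of the last sample if the node never came
    back before the log ended.
    """
    ordered = sorted(samples)
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for t, ok in ordered:
        if not ok and start is None:
            start = t  # an outage begins
        elif ok and start is not None:
            windows.append((start, t))  # the node answered again
            start = None
    if start is not None:
        windows.append((start, ordered[-1][0]))  # still down at end of log
    return windows


if __name__ == "__main__":
    # Made-up log: one ping every 5 s, down 0-120 s, up briefly, down 130-250 s.
    log = [(float(t), not (0 <= t < 120 or 130 <= t < 250)) for t in range(0, 260, 5)]
    for start, end in outage_windows(log):
        print(f"outage from t={start:.0f}s to t={end:.0f}s ({end - start:.0f}s)")
```

You can compare window lengths before and after a firmware change, or with MQTT disabled. That gives more concrete data to post than "it drops off".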
I'm also experiencing the issue on 2.5.14.f2ee0df and the latest beta. Waiting for the mentioned fixes.
I've built the files from the master repo and can confirm that the issue seems fixed; my T-Beam has now been online for >48 hours without any issues.
Hmm, the same issue sometimes occurs while connecting to the node via the web interface and TCP. The device rebooted itself after a few minutes. No logs for now, unfortunately, due to the remote location. I have ~70 nodes in my NodeDB, if that matters. Edit: it looks like all is fine if I connect to the node shortly after the reboot.
Meshtastic Firmware 2.5.15.79da236 Alpha was released with fixes for #5387.
Fixed by #5387
Had to reopen: Firmware 2.5.15.79da236 does not fix this issue. My 2 Supremes are disconnecting after a couple of hours.
Same problem here with a Heltec V3...
Also seeing the same behavior with my Station G2s on 2.5.15. I will try to pull logs later today.
Thanks for the reply. I'm seeing this on two fairly complex networks w/Ubiquiti as well as a very simple cable modem + netgear nighthawk consumer router setup. Reportedly the issue does not occur if MQTT is disabled - do you have your node connected via mqtt? I'm currently timing the issue, re-checking the behavior post-patch, and capturing logs. After that I'll try turning MQTT off and see if it makes any difference. It would be nice to narrow it down. I highly doubt it's actually something to do with the code around networking.
Yes, it is connected via MQTT to my local MQTT server, not the public MQTT server. It uses a DNS name that resolves internally to a local IP.
It has nothing to do with UniFi or anything like that. No special setup here: one node is connected to my FRITZ!Box, and the other is connected at my workplace to an Extreme enterprise AP. LongFast is uplink only, and 3 channels with moderate traffic are set up with uplink/downlink. If I disable MQTT, the node stays online.
If your mqtt is on the default topic that is likely too much traffic for the node to handle.
Mine is connecting to a local mosquitto instance (not bridged) and behaves the same way.
Similar. One node is connecting to self-hosted mosquitto on LAN, the other is connecting to the same through the internet (NAT). Decently low traffic overall (way less than the official/dev server)
The attached log from mosquitto shows how unstable the connection is. Then I've triggered a reboot via LoRa and it's all gone. I think this looks a bit weird:
I have a Wemos D1 Mini placed in the exact same location, with ESPHome firmware and it's super stable despite having a ~10 dBm weaker WiFi signal.
Is the mqtt JSON functionality being used? Who is hosting the private brokers?
Not in any of my use cases. I'm hosting the mqtt server locally (ubuntu vm, mosquitto). Seeing it w/node on LAN as well as one that is connecting through NAT/internet.
Currently I'm testing the connection through a WiFi repeater and with different network settings on both ends; maybe it doesn't like my access point and/or vice versa. I'll let you know the results in the next couple of days (also testing the #5490 case).
I just wanted to provide an update as it's been a couple of days. During my initial tests, both nodes died at 36 hours. Currently, both have been up for 65 hours. The only thing that is different is that I'm using PRTG to ping and scrape json data (http) every 60 seconds. I'm going to turn off the ping/json traffic, reboot both nodes, and see if I can reproduce the original issue. I'll circle back here in ~48hr.
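A lightweight stand-in for PRTG-style checks is a TCP connect probe against the node's web server that logs only state changes. This is a sketch under stated assumptions. The host address, port 80, and the helper names `tcp_reachable` and `watch` are placeholders, not part of Meshtastic. A successful TCP connect also does not prove that MQTT traffic is flowing. As the observation above suggests, periodic polling may itself mask the problem, so compare runs with and without it.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple

# Placeholder values: substitute your node's address. Port 80 is assumed
# because the node's web client is served over HTTP.
NODE_HOST = "192.168.1.50"  # hypothetical static IP of the node
NODE_PORT = 80
INTERVAL_S = 60.0  # same 60 s cadence as the PRTG sensors described above


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def watch(probe: Callable[[], bool], polls: int, interval_s: float,
          sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, bool]]:
    """Run probe() `polls` times; return (poll_index, ok) at each state change."""
    changes: List[Tuple[int, bool]] = []
    last: Optional[bool] = None
    for i in range(polls):
        ok = probe()
        if ok != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} poll {i}: {'UP' if ok else 'DOWN'}")
            changes.append((i, ok))
            last = ok
        if i < polls - 1:
            sleep(interval_s)  # wait before the next probe
    return changes
```

For example, `watch(lambda: tcp_reachable(NODE_HOST, NODE_PORT), polls=48 * 60, interval_s=INTERVAL_S)` polls for roughly 48 hours and prints only the UP/DOWN transitions.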
I've been seeing this since updating the firmware from 2.3.10 to 2.5.11 on my 4 TLORA_V2_1_1P6 devices. I can't reliably connect from my phone via WiFi to the fixed IPs of the nodes. Sometimes switching between nodes and back connects, but often not. Sometimes switching between WiFi SSIDs (all on the same network) fixes it, but often not.
Thank you all for your patience while I attempted to reproduce this. The original test nodes have been up for 4 days solid, and the remote node (on a completely different "consumer" network setup) has been up and going for a little over 2 days now. Subjectively, I feel that this issue can be closed out as resolved. I was able to reproduce this easily on 2.4.13/2.4.14. With 2.4.15 I had an MQTT/network drop-off initially (with two nodes simultaneously), but I haven't been able to reproduce the issue since. T1000-E also stopped randomly rebooting itself with the fixes in 2.5.15. It would be good to note on the GitHub releases page that this MQTT/network issue exists in 2.3.10 - 2.3.14 (apparently).
Upgrade to 2.5.15.
Really?? We are on .15. This problem is not solved. Please reopen, this really sucks.
The issue still exists. I've managed to make it more stable by adding an esp_wifi_repeater, but it's a workaround. My current theory is that something is clogging up (a buffer? a memory leak?) when TCP segments have to be retransmitted, because the same thing occurs while using the web client.
Same as here -> #5549
What is really bad is that a couple of people in this issue have this problem, then one person says "it is fixed for me" and, zap, it's closed.
The original reporter of this issue said that their issue was fixed for them by upgrading. You have inserted yourself into this issue without any background context to whether or not you are even experiencing the same scenario, on the same hardware, or any reproduction steps. I think you should rework your approach here to be less combative and demanding.
Just as a final update, both of the original nodes that had issues have been up for almost 8 days solid now. I've started upgrading remote nodes (~100mi away) with 2.5.15/.16 and have not had any issues with them either. Mixture of ESP32 and NRF hardware. I'm also tracking free heap on one of the two original problematic nodes and haven't seen anything that screams memory leak either. Thanks for the assistance/patience on this one.
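To back up an observation like "haven't seen anything that screams memory leak" with a number, one option is to fit a straight line to (time, free heap) samples and check the slope. This is a minimal sketch. How the samples are collected is left open, and `heap_slope` is a hypothetical helper that uses ordinary least squares, not an existing Meshtastic tool.

```python
from typing import Sequence


def heap_slope(times_s: Sequence[float], free_heap: Sequence[float]) -> float:
    """Ordinary least-squares slope of free heap over time, in bytes per hour.

    times_s are sample times in seconds; free_heap are free-heap readings in
    bytes taken at those times.
    """
    n = len(times_s)
    if n != len(free_heap) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_s) / n
    mean_h = sum(free_heap) / n
    # Covariance of (time, heap) and variance of time, both unnormalized.
    cov = sum((t - mean_t) * (h - mean_h) for t, h in zip(times_s, free_heap))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var * 3600.0  # bytes per second -> bytes per hour
```

A slope that stays clearly negative over several days points toward a leak. Noise around zero does not rule one out, but it makes a steady leak less likely.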
I also see this on a Station G2. I have the power supply set to cycle every x hours as a workaround.
Category
WiFi
Hardware
T-Beam, Heltec V3, Station G2
Firmware Version
2.5.13
Description
After a period of time (6-18 hrs, no longer than 24 hrs), WiFi-enabled ESP32-based devices are losing network connectivity (not physical connectivity; they just no longer pass traffic). I have tested this with 2.5.13 and 2.5.14, but not older 2.5.x builds. I'm going to set up a Heltec V3 for testing so I can capture console traffic. I'll post that here when I'm able to do so.
When the issue occurs:
I am seeing this on:
Relevant log output
No response