Messages often come through garbled as "�떗" since recent update #3452
Is LongFast connected to MQTT? The debug log shows the payload in bytes, so this is how it came in from the radio. Can you get the serial logs from the Heltec as this comes through?
I don't have any public MQTT enabled. I'm afraid my node is sealed up on a mast above the building currently, and I won't have physical access to it again for another week or so, but I'll link other users who have been experiencing this to this issue; hopefully one of them can collect the serial data when one of these messages comes through. Thanks!
I can confirm this issue is affecting me as well; serial debug log from the radio is below. The message field appears as ????. The issue seems to affect messages at random: out of 5 messages from the same endpoint, 1-2 will show corrupted.
I noticed after flashing 2.3.0, with the Heltec V3 still connected, that the terminal halted with some words corrupted by what I think were those or similar characters. It did this consistently, in the same way and at the same point, each time I flashed 2.3.0. Flashing 2.2.24 did not do this, and the terminal would continue to display updating text after rebooting. Perhaps this is related. I cannot test for the exact symbols seen right now, unfortunately. Edit: the exact text is below. Possibly not related, but mentioned in case it is useful and triggers something.
I realised I could pull my mobile node out to capture this for you @andrekir. I hope this snapshot is enough; you can see the garbled message received and also retransmitted:
I'll just add my 2p here: I don't think it's Android that is the problem. It's a firmware issue, and we brought it up in the firmware Discord yesterday. I'd also add that perhaps the hopStart data should be added to the debugging output.
I've had the same results as above. Downgrading my firmware to 2.2.24 but keeping the app at 2.3 prevents me from seeing any garbled messages, and I also receive periodic telemetry more regularly than before. On 2.3 firmware I would go hours or days between telemetry updates from remote nodes.
@Obinice Do you know if the message originated from the Android app as well? So far I couldn't reproduce the issue sending between 2.2.24 and 2.3 devices, nor from 2.2.24 --> 2.3.0 --> 2.2.24 or 2.2.24 --> 2.3.0 --> 2.3.0. This is when sending from either iOS or the CLI (I currently don't have an Android device with me). I've tried different kinds of messages, like DMs, traceroutes and position requests.
It will be shown, but only if all nodes in the path are on 2.3.0.
But what if a node in the path has a different firmware version? What happens in those cases? Could it be that a repeating node is introducing some garbage, chopping off some bits, or causing some other quirk?
Not on line 271 though, which is clearly where the vast majority of debugging output seen above is originating.
Over LoRa, three previously unused bits in the header are now used for hopStart. Internally, when receiving, the firmware will parse these bits and put them in a protobuf field. Nodes on older firmware won't use those bits and do not use or know about the hop_start field.
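To make the mechanism concrete, here is a minimal sketch (not the firmware's actual code) of how three spare bits of a one-byte flags field can carry hop_start; the mask and shift values are assumptions for illustration.

```cpp
#include <cstdint>

// Illustrative only: assumes hop_start occupies the top three bits of the
// one-byte header flags field (values 0-7); these constants are assumptions,
// not the firmware's actual definitions.
constexpr uint8_t HOP_START_MASK = 0xE0;
constexpr uint8_t HOP_START_SHIFT = 5;

// Sender side: write hopStart into the three spare bits of the flags byte.
inline uint8_t packHopStart(uint8_t flags, uint8_t hopStart)
{
    return static_cast<uint8_t>((flags & ~HOP_START_MASK) |
                                ((hopStart << HOP_START_SHIFT) & HOP_START_MASK));
}

// Receiver side: older firmware simply never reads these bits.
inline uint8_t unpackHopStart(uint8_t flags)
{
    return (flags & HOP_START_MASK) >> HOP_START_SHIFT;
}
```

Because the bits were previously unused, the header format itself stays backward compatible; the question in this thread is whether something downstream mishandles packets where they are set.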
@asjmcguire Agreed, at the time I suspected the app, but now I'm not so sure. If I knew how to move this to the appropriate location I would 😅
@GUVWAF The messages I received via the test I debug-logged were sent via the Android client. The information I was provided by the sender was "v2.3.0 app, v2.3.1 alpha node, and v2.2.24 node"; however, for the others that I've seen in the wild I'm afraid I have no clue, sorry. I'm running firmware 2.2.24 on my base node (which will have been the one relaying the message to my mobile node, which I used for the debug log). My mobile node, which I only used to receive those test messages for debugging yesterday, is on 2.2.23.

In terms of being the recipient, I've since tried a few different versions of the Android client, the Play Store beta as well as the Play Store and F-Droid stable releases, but the problem is present regardless.

I've also been doing some private MQTT logging from the base node directly to mosquitto on a local server (only in the past day, so it can't have been affecting this issue), and the messages are logged there as follows: {"channel":0,"from":3865901120,"id":266294490,"payload":{"text":"\u0000\u0000\u0000\u0000\u0000\u0000"},"rssi":-99,"sender":"!e2e37908","snr":3.25,"timestamp":1710664304,"to":4294967295,"type":"text"} (The above is just a random example of one of these garbled messages, received this morning.)

On occasion, the garbled message received is very slightly different: it contains a quotation mark at the end. I know MQTT JSON isn't the best format to show this, but here's an example of one I received yesterday (the visible output in the client looked much the same): {"channel":0,"from":3664101024,"id":2347968366,"payload":{"text":"\u0000\u0000\u0000\u0000\u0000\u0000\""},"rssi":-97,"sender":"!e2e37908","snr":-0.5,"timestamp":1710599894,"to":4294967295,"type":"text"} This is the only difference in the pattern that I've seen.
@b8b8 Unfortunately I'm on 2.2.24 myself, and am seeing the issue on app 2.3 as well as the previous stable version. I'm also seeing it on my 2.2.23 mobile node - it seems likely that the corruption is being generated somewhere on the mesh, perhaps at the origin, and is simply being passed along.
There are only 2 real changes in the 2.3.0 firmware that have changed anything to do with message sending, right @GUVWAF? One is the introduction of the hop_start header, and the other is the dynamic changing of the hop limit based on the calculation of how many hops away the response needs to be sent. All the other fixes in this firmware have been about GPS precision, general GPS fixes and power saving. So in my opinion it has to be something introduced with the two changes that deal with forming the actual mesh packet.
Yes, it almost certainly has to do with the addition of hop_start. It's weird that only the payload seems affected; from/to, hopLimit, SNR, etc. all look OK. I don't have time currently to test more thoroughly, but I can look into it this week, then also with the Android app. It would be good to know if it is already garbled in the logs when the packet arrives from the phone at the original sender.
Somebody would have reported it for iOS if it were in the transport.
The problem is (in the UK at least) that many people haven't updated to 2.3.0 because it's labelled alpha and thus not stable, so there aren't enough 2.3.0 nodes "in the wild" to get better debugging. When the guys who were trying to track this down yesterday are back around, we will get them to log sending as well as receiving and see if we can find a common factor.
I'm still unable to reproduce. |
On the traceroute matter, I am using LoRa32 1.6 boards and T-Beam 1.1 boards.
Same happened here at a friend's node: |
Here, on a mesh of nearly one hundred nodes, the switch to 2.3.0 has been hell. Thanks
MQTT nodes using 2.3.0 will not be shown for nodes with older firmware and vice versa; this is intended and listed in the release notes. It would be good to collect data about nodes that send out garbled text, especially when they already show this in the serial logs after receiving it from the phone. It would be good to know what the content of the actual message was and which hardware and client (app version) they are using.
Wow, 100 nodes... I have about 10 nodes max. At first, when I saw the failed traceroutes, I thought the radio was out of range, but RSSI was well within good range. I've rolled back to 2.2.23, as 2.2.24 seems to have GPS issues as well: what used to be 75 meters in SmartPosition I now have to set to <15 meters to get the same behaviour, otherwise it takes about 1 km before moving nodes are reported to another client...
Maybe this is a clue. I get a lot of these messages in the log on my inside node when traffic is forwarded by my outside node on the roof running 2.3.1 firmware (Heltec V3).
It seems like the packets arrive, but more often than not the payload is bad. I guess in the case of text messages the payload is passed on directly, showing the garbled contents. In my case the inside node doesn't hear distant nodes, so more or less all traffic passes via the roof node, which makes the problem very apparent. Enabling MQTT on the roof node to a local broker shows the packets arriving just fine. I believe it's just the packet forwarding that somehow corrupts the payload.
@dantuf Do you have the logs just before that? I want to know if the payload length makes sense. Also, which firmware version is your inside node using and what hardware is it? And do you know the firmware and hardware type of the device that originally sent this packet?
Okay, here we go. The inside node showing this log is a Heltec V3 running 2.3.1.
0x75e19a60 = TLORA_T3_S3. Not sure about the firmware on these nodes since they are not mine.
Thanks for that. A payload length of 6 bytes is way too short for NodeInfo; not sure why this is incorrect while the rest of the info from the packet looks OK.
It's a payload len of 6 further up too, in the logs @Obinice posted.
Which leaves me scratching my head as to how the compressed length is longer than the uncompressed length...
Indeed.
This can happen with the library it's using, but in that case it won't use the compressed version, as the next line shows. Actually, compression is not working currently, but that will be fixed in 3.0 and is unrelated to this issue.
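For readers following the log output, here is a rough sketch of the guard being described: if compressing doesn't actually shrink the text, the uncompressed bytes are used instead, so a logged "compressed length" larger than the payload is not itself a fault. The `unishox2_compress_simple` call is an assumption about the compressor involved; treat the whole snippet as illustrative rather than the firmware's real code path.

```cpp
#include <cstdint>
#include <cstring>

// Assumed Unishox2 entry point; the real firmware may wrap this differently.
extern "C" int unishox2_compress_simple(const char *in, int len, char *out);

// Copy either the compressed or the original text into `out`, whichever is
// smaller, and report which one was used.
size_t maybeCompress(const char *text, size_t len, uint8_t *out, size_t outCap,
                     bool &usedCompression)
{
    char scratch[256]; // generous for short chat messages (sketch only)
    int clen = (len < sizeof(scratch) / 2)
                   ? unishox2_compress_simple(text, static_cast<int>(len), scratch)
                   : -1;

    if (clen > 0 && static_cast<size_t>(clen) < len && static_cast<size_t>(clen) <= outCap) {
        memcpy(out, scratch, static_cast<size_t>(clen)); // compression helped
        usedCompression = true;
        return static_cast<size_t>(clen);
    }

    // Fall back to the raw text: this is the "won't use the compressed
    // version" branch mentioned above.
    size_t n = (len < outCap) ? len : outCap;
    memcpy(out, text, n);
    usedCompression = false;
    return n;
}
```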
This issue has been mentioned on Meshtastic. There might be relevant details there: https://meshtastic.discourse.group/t/2-3-1-traceroute-and-remote-radio-config-broken/11275/4 |
As an additional element (I don't know if it's useful): most of the nodes here have channels 1, 2 and 3 configured with a PSK.
So I downgraded all my nodes to 2.2.24 and I'm still having the same issue: one node on web and the other on Android 2.2.24, all with direct traceroutes.
Can you tell whether any other nodes with 2.3.x were between yours?
They all traceroute direct, but I can't tell if there was a repeater in between, as I don't believe these are shown in the traceroute as a hop.
Yes, if there were 2.3.x nodes in between, they might not show up in the traceroute if they have decoding errors. Just a heads-up: I asked someone who could reproduce it with two nodes (one 2.2.24 and one 2.3.0) to try this branch, which is a modified version of 2.3.0 without the suspicious change, and that seems to resolve it. So we know that the addition of hop_start is indeed what triggers it.
Ace, if I can be of any more help just let me know. I'm also happy to test out the modified version of 2.3.0 if you need more feedback.
I hope we don't have to lose the HopStart and HopsAway functionality, but while I was taking a look at the revert, I noticed that via_mqtt has been deleted from NodeDB. Is this intentional? @GUVWAF
@asjmcguire Ah, I see. No, that wasn't intentional. If we revert it we'll ensure it's done properly; this was just for testing.
I'm not familiar with this codebase, but the following stood out to me:
Looking at...
@erayd Thanks for thinking along. firmware/src/mesh/RadioInterface.cpp, line 572 (commit 794e99c).
Using: firmware/src/mesh/RadioInterface.h, lines 16-17 (commit 794e99c).
And here it is parsed from the header flags and stored in the struct: firmware/src/mesh/RadioLibInterface.cpp, line 364 (commit 794e99c).
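Since the embedded snippets from those references don't render here, below is a rough approximation of what that receive path amounts to; the constant and field names are assumptions based on the discussion, not verified copies of the referenced code.

```cpp
#include <cstdint>

// The single flags byte in the LoRa header is split into hop_limit (low bits)
// and hop_start (high bits), and both are stored on the decoded packet.
struct PacketHeader {
    uint8_t flags; // other header fields (to/from/id) omitted for brevity
};

struct MeshPacketFields {
    uint8_t hop_limit;
    uint8_t hop_start;
};

constexpr uint8_t FLAGS_HOP_LIMIT_MASK = 0x07;  // assumed: lowest three bits
constexpr uint8_t FLAGS_HOP_START_MASK = 0xE0;  // assumed: highest three bits
constexpr uint8_t FLAGS_HOP_START_SHIFT = 5;

void parseHopFields(const PacketHeader &h, MeshPacketFields &p)
{
    p.hop_limit = h.flags & FLAGS_HOP_LIMIT_MASK;
    p.hop_start = (h.flags & FLAGS_HOP_START_MASK) >> FLAGS_HOP_START_SHIFT;
}
```

A header-only parse like this never touches the payload bytes, which fits the observation that only the payload ends up garbled.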
The strange thing is that the decoding errors only happen for the payload.
Thanks @GUVWAF. Seems it wasn't what I was wondering about; that looks sane enough to me 🙂 Will keep looking...
Potentially what is needed is for a dev to add more verbose logging to 2.2.24 that spits out exactly (raw) what is received over the radio before any processing is done to it, so you can verify that what is received is what should be received, plus some logging to show what is being put out on the radio (again raw). Then send a message from a 2.3.x node to another 2.3.x node, but with the destination node switched off, ensuring that the 2.2.24 node HAS to rebroadcast. Then you will at least know if what is arriving at the 2.2.24 node is intact; that way we can at least rule out 2.3.x malforming something. That allows us to concentrate on whether it is getting malformed by 2.3.x or whether 2.2.24 is responsible for doing something strange with it, and whether that happens during the decoding phase or when rebuilding the packet for retransmission.
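A minimal sketch of the kind of logging proposed here: hex-dump the packet exactly as it came off the radio, before any decoding, so the same packet can be compared on the sender, the rebroadcaster and the final receiver. The buffer/length names are placeholders, and LOG_DEBUG is stubbed with printf; the firmware's real logging macro would be used instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for the firmware's debug logging macro.
#define LOG_DEBUG(...) printf(__VA_ARGS__)

// Dump raw received bytes as hex so a corrupted payload can be spotted at the
// exact point in the chain where it first appears.
void logRawRadioBytes(const uint8_t *buf, size_t len)
{
    std::string dump;
    char byteHex[4];
    for (size_t i = 0; i < len; ++i) {
        snprintf(byteHex, sizeof(byteHex), "%02x ", buf[i]);
        dump += byteHex;
    }
    LOG_DEBUG("Raw radio bytes (%u): %s\n", static_cast<unsigned>(len), dump.c_str());
}
```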
Yes, I'll carry on here rather than the main forum, as this is a very specific and seemingly sporadic issue. I did update the app and could trace between the two nodes as well as use the admin channel; otherwise the main one would not do a direct trace when right next to both, while from a distance it showed direct. I downgraded one node and can trace with the app store version of the app. Of note, remote admin channel radio config also failed with the non-alpha app. On hardware, both are RAK 19007 boards; one is the older micro USB, the other is USB-C. I have not seen a garbled message, but have not tried many. I can connect both nodes to gather console output from a PC if needed, but it sounds like what others have found is minimal, since the output may need other debug flags enabled to see more raw data? The only other thing I have going is multiple channels, with LongFast as a later one in the list; if adding channels is part of the issue, perhaps that is a factor, I do not know. If there is something I can do to test further, I would be glad to try today. I do not know if I can force it through another hop with 2.2.x or whatever is in the area, unless there's some way to force it to use a neighboring node.
I don't know if this is relevant, but reading the release notes I noticed that the handling of certificates was updated. So, if there is a difference in certificates and/or the management of them, could this possibly explain the "garble" that is seen in some cases? It is also of interest that in some cases it is not known whether or not a message is going to the same "channel" (if unknown), correct? In the case of an encrypted message, it may be passing through a "last known" node on what it believes should be "primary", but landing on a node's 3rd channel, for example. I am still trying to understand that behavior from a post on the discourse board: https://meshtastic.discourse.group/t/question-about-private-messages-with-multiple-channels/11092
This may all be unrelated, but it seemed like a possibility after looking through the above, so I thought it worth mentioning.
Certificates don't have anything to do with LoRa, and that change happened in 2.3.1. The issue manifests in 2.3.0 as well.
And it apparently goes away when hopStart is removed from the mesh packet, which is depressing, because Hops Away is the most exciting feature in a long while.
I agree. This is why we're digging in to see if it's something we can pinpoint with the decoding / encoding and not just yanking the change yet. |
That's indeed a good idea. Since I still cannot reproduce it (I tried other devices as well), I created branches for both 2.2.24 and 2.3.0 with debug messages containing the raw bytes. Hopefully the artifacts with compiled binaries will be available when the CI finishes here for 2.2.24 and here for 2.3.0; otherwise you'll need to compile it yourself if you want to test.
Good news: I could reproduce it and found that the root cause is the NeighborInfo module, in combination with the hop_start change. It is fixed by #3474. For now, if you use 2.3.0 or 2.3.1, please don't enable the NeighborInfo module. Many thanks to everyone for providing logs or helping in other ways!
Very good news indeed! Nice work! I can confirm your fix appears to solve the problem. |
That'll explain why I never saw the problem: I turned the NeighbourInfo module off as soon as I updated to the first 2.3.0 alpha, on the basis that any packet with a HopsAway of 0 is clearly a neighbour, so the NeighbourInfo module is essentially redundant, right?
Yes and no. This can only be guaranteed if all nodes are using >=2.3.0. Furthermore, if you also enable the NeighborInfo module on nodes that you don't connect to with your phone, you receive their NeighborInfo packets, so you know which nodes are their neighbors. When collecting these packets, e.g. via MQTT, you can draw a graph of how all nodes are connected.
Since updating to Android client 2.3.0, messages on LongFast sometimes come through garbled. In one particularly bad case, 8 out of 9 messages received over the course of an hour were garbled. The displayed text is always the same, regardless of the original message content or the sender.
��떗
This was also a few days after applying firmware update 2.2.24 to my Heltec V3, so it is possible the issue resides there; however, no iOS users I've spoken with so far have noted any issues since the firmware update, and in the days between updating the firmware and updating the Android client this issue did not crop up. So my first thought is that it could be a client issue (but this needs more investigation).
Attached are three screenshots from the Android app: one showing LongFast with two garbled messages, and two from the debug screen showing the details of those messages.
A few other users have discussed the issue in the Meshtastic Discord; they might be able to provide more information. If there's any more info I can provide, just let me know. Thank you for all your hard work, it's greatly appreciated <3