Messages often come through garbled as "�떗" since recent update #3452

Obinice · 2024-03-15T07:58:54Z

Since updating to Android client 2.3.0, messages on LongFast sometimes come through garbled. In one particularly bad case, 8 out of 9 messages received over the course of an hour were garbled. The displayed text is always the same, regardless of the original message content, or the sender.

��떗

This was also a few days after applying firmware update 2.2.24 to my Heltec V3, so it is possible the issue resides there, however no iOS users I've spoken with thus far have noted any issues since the firmware update, and in the days between updating the firmware and the Android client this issue did not crop up. So, my first thought is that it could be a client issue (but this needs more investigation).

Attached are three screenshots from the Android app, one showing LongFast with two garbled messages, and the other two screenshots are from the debug screen showing the details of those two messages.

A few other users have discussed the issue in the Meshtastic Discord, they might be able to provide more information, and if there's any more info I can provide just let me know. Thank you for all your hard work, it's greatly appreciated <3

The text was updated successfully, but these errors were encountered:

andrekir · 2024-03-15T11:07:04Z

is LongFast connected to MQTT?

the debug log shows the payload in bytes, so this is how it came in from the radio. can you get the serial logs from the Heltec as this comes through?

Obinice · 2024-03-16T15:35:19Z

I don't have any public MQTT enabled.

I'm afraid my node is sealed up on a mast above the building currently, I won't have physical access to it again for another week or so, but I'll link other users who have been experiencing this to this issue, hopefully one of them can collect the serial data when one of these messages comes through. Thanks!

Vanguard4893 · 2024-03-16T15:53:11Z

Can confirm this issue is affecting myself also - serial debug log from the radio below. The message field appears as ????

This issue seems to affect messages at random, and 5 messages from the same endpoint will show 1-2 corrupted.

DEBUG | 15:45:30 2679 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=26, time 428 ms DEBUG | 15:45:30 2679 [RadioIf] Lora RX (id=0xb299fbc0 fr=0xf0 to=0xff, WantAck=0, HopLim=1 Ch=0x8 encrypted rxSNR=6.25 rxRSSI=-77) DEBUG | 15:45:30 2679 [RadioIf] AirTime - Packet received : 428ms DEBUG | 15:45:30 2679 [Router] recentPackets size=96 DEBUG | 15:45:30 2679 [Router] recentPackets size=94 (after clearing expired packets) DEBUG | 15:45:30 2679 [Router] Add packet record (id=0xb299fbc0 fr=0xf0 to=0xff, WantAck=0, HopLim=1 Ch=0x8 encrypted rxSNR=6.25 rxRSSI=-77) DEBUG | 15:45:30 2679 [Router] recentPackets size=95 DEBUG | 15:45:30 2679 [Router] recentPackets size=95 (after clearing expired packets) DEBUG | 15:45:30 2679 [Router] Using channel 0 (hash 0x8) DEBUG | 15:45:30 2679 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1 DEBUG | 15:45:30 2679 [Router] Using AES128 key! DEBUG | 15:45:30 2679 [Router] ESP32 crypt fr=da5678f0, num=b299fbc0, numBytes=10! DEBUG | 15:45:30 2679 [Router] decoded message (id=0xb299fbc0 fr=0xf0 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710603930 rxSNR=6.25 rxRSSI=-77) DEBUG | 15:45:30 2679 [Router] handleReceived(REMOTE) (id=0xb299fbc0 fr=0xf0 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710603930 rxSNR=6.25 rxRSSI=-77) DEBUG | 15:45:30 2679 [Router] Module 'text' wantsPacket=1 INFO | 15:45:30 2679 [Router] Received text msg from=0xda5678f0, id=0xb299fbc0, msg=????

clwgh · 2024-03-16T16:03:36Z

I noticed after flashing 2.3.0, with the Heltec V3 still connected, the terminal halted with some words corrupted with what I think were those or similar characters. It did this consistently and in the same way and at the same point when flashing 2.3.0 each time. Flashing 2.2.24 did not do this and the terminal would continue to display updating text after rebooting. Perhaps this is related. I cannot test for the exact symbols seen right now unfortunately.

Edit: the exact text is below. Possibly not related but mentioned in case useful and triggers something.

...
Writing at 0x3a6625... (93%)
Writing at 0x3aa620... (96%)
Writing at 0x3b1a4c... (100%)
Wrote 1048576 bytes (473054 compressed) at 0x300000 in 43.734 seconds.
Leaving...
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x29 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3808,len:0x44c
load:0x403c9700,len:0xbe4
load:0x403cc700,len:0x2a38
entry 0x403c98d4
E (350) esp_core_dump_flash: No core dump partition found!
E (350) esp_core_dum����@INFO  | ??:??:?? 0 

//\ E S H T /\ S T / C

INFO  | ??:??:?? 0 Booted, wake cause 0 (boot count 1), reset_reason=reset
D

Obinice · 2024-03-16T16:35:38Z

I realised I could pull my mobile node out to capture this for you @andrekir , I hope this snapshot is enough, you can see the garbled message received and also retransmitted:

DEBUG | 16:29:51 142 [Power] Battery: usbPower=0, isCharging=0, batMv=4047, batPct=89
DEBUG | 16:29:53 144 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=26, time 428 ms
DEBUG | 16:29:53 144 [RadioIf] Lora RX (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x8 encrypted rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:53 144 [RadioIf] AirTime - Packet received : 428ms
DEBUG | 16:29:53 144 [Router] Add packet record (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x8 encrypted rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:53 144 [Router] Using channel 0 (hash 0x8)
DEBUG | 16:29:53 144 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1
DEBUG | 16:29:53 144 [Router] Using AES128 key!
DEBUG | 16:29:53 144 [Router] ESP32 crypt fr=e66cf840, num=fdf4d7e, numBytes=10!
DEBUG | 16:29:53 144 [Router] decoded message (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:53 144 [Router] handleReceived(REMOTE) (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:54 144 [Router] Module 'text' wantsPacket=1
INFO  | 16:29:54 144 [Router] Received text msg from=0xe66cf840, id=0xfdf4d7e, msg=▒떗
DEBUG | 16:29:54 144 [Router] Enter state: ON
DEBUG | 16:29:54 144 [Router] showing standard frames
DEBUG | 16:29:54 144 [Router] Module wants a UI Frame
DEBUG | 16:29:54 144 [Router] Showing 1 module frames
DEBUG | 16:29:54 144 [Router] Total frame count: 104
DEBUG | 16:29:54 144 [Router] Added modules.  numframes: 1
DEBUG | 16:29:54 144 [Router] Finished building frames. numframes: 9
DEBUG | 16:29:54 144 [Router] Module 'text' considered
DEBUG | 16:29:54 144 [Router] Module 'canned' wantsPacket=1
DEBUG | 16:29:54 144 [Router] Module 'canned' considered
DEBUG | 16:29:54 144 [Router] Module 'StoreForward' wantsPacket=1
DEBUG | 16:29:54 144 [Router] Module 'StoreForward' considered
DEBUG | 16:29:54 144 [Router] Module 'ExternalNotificationModule' wantsPacket=1
INFO  | 16:29:54 145 [Router] External Notification Module Disabled
DEBUG | 16:29:54 145 [Router] Module 'ExternalNotificationModule' considered
DEBUG | 16:29:54 145 [Router] Module 'routing' wantsPacket=1
INFO  | 16:29:54 145 [Router] Received routing from=0xe66cf840, id=0xfdf4d7e, portnum=1, payloadlen=6
DEBUG | 16:29:54 145 [Router] Routing sniffing (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
INFO  | 16:29:54 145 [Router] Rebroadcasting received floodmsg to neighbors
DEBUG | 16:29:54 145 [Router] Original length - 6
DEBUG | 16:29:54 145 [Router] Compressed length - 10
DEBUG | 16:29:54 145 [Router] Original message - ▒떗
DEBUG | 16:29:54 145 [Router] Not using compressing message.
DEBUG | 16:29:54 145 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1
DEBUG | 16:29:54 145 [Router] Using AES128 key!
DEBUG | 16:29:54 145 [Router] ESP32 crypt fr=e66cf840, num=fdf4d7e, numBytes=10!
DEBUG | 16:29:54 145 [Router] enqueuing for send (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=0 Ch=0x8 encrypted rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:54 145 [Router] txGood=9,rxGood=15,rxBad=0
DEBUG | 16:29:54 145 [Router] rx_snr found. hop_limit:0 rx_snr:5.250000
DEBUG | 16:29:54 145 [Router] rx_snr found in packet. Setting tx delay:3024
DEBUG | 16:29:54 145 [Router] Delivering rx packet (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:54 145 [Router] Update DB node 0xe66cf840, rx_time=1710606593
DEBUG | 16:29:54 145 [Router] Forwarding to phone (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:54 145 [Router] Module 'routing' considered
INFO  | 16:29:54 145 Telling client we have new packets 19
INFO  | 16:29:54 145 [ServerAPI] getFromRadio=STATE_SEND_PACKETS
DEBUG | 16:29:54 145 [ServerAPI] phone downloaded packet (id=0x0fdf4d7e fr=0x40 to=0xff, WantAck=0, HopLim=1 Ch=0x0 Portnum=1 rxtime=1710606593 rxSNR=5.25 rxRSSI=-76)
DEBUG | 16:29:54 145 [ServerAPI] encoding toPhone packet to phone variant=2, 52 bytes
DEBUG | 16:29:55 145 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=71, time 763 ms

asjmcguire · 2024-03-16T19:15:19Z

I'll just add my 2p here, I don't think it's Android that is the problem.... it's a firmware issue - and we brought it up in the firmware discord yesterday. I'd also add that perhaps the hopStart data should be added to the debugging output....

b8b8 · 2024-03-16T20:34:56Z

ive had the same results as above. Down grading my firmware to 2.2.24 but keeping the app at 2.3 prevents me from seeing any garbled messages as well as receiving more periodic telemetry than before. Before i would go hours or days between updates of telemetry from remote nodes while on 2.3 firmware.

GUVWAF · 2024-03-16T21:00:24Z

@Obinice Do you know if the message originated from the Android app as well?

So far I couldn't reproduce the issue sending between 2.2.24 and 2.3 devices, as well as from 2.2.24 --> 2.3.0 --> 2.2.24 or 2.2.24 --> 2.3.0 --> 2.3.0. This is when sending from either iOS or CLI (currently don't have an Android device with me). I've tried different kind of messages like DMs, traceroute and position request.

I'd also add that perhaps the hopStart data should be added to the debugging output....

It will be shown, but only if all nodes in the path are on 2.3.0.

prokrypt · 2024-03-16T21:07:26Z

@GUVWAF

It will be shown, but only if all nodes in the path are on 2.3.0.

But what if a node in the path has a different firmware version? What happens in these cases? Could it be that a repeating node is introducing some garbly garbage, chopping off some bits, or some other quirk?
2.2.24 (source) -> 2.3.0 (repeater) -> 2.2.24 (destination)
and
2.3.0 (source) -> 2.2.24 (repeater) -> 2.3.0 (destination)

asjmcguire · 2024-03-16T21:12:39Z

It will be shown, but only if all nodes in the path are on 2.3.0.

Not on line 271 though, which is clearly where the vast majority of debugging output seen above is originating.
Why do all nodes in the path have to be on 2.3.0 - I thought the point of HopStart in the header was that as long as the sender and receiver was on 2.3.0, the nodes in between didn't matter too much what firmware they were running.

GUVWAF · 2024-03-16T21:14:42Z

Over LoRa, three previously unused bits in the header are now used for hopStart. Internally when receiving the firmware will parse these bits and put them in a protobuf field hopStart, which is new, but added at the end of the MeshPacket to ensure backwards compatibility. This is also how the packet is sent to a client.
Then when it rebroadcasts it, it does the reverse operation: it puts the value of the hopStart field in the header.

Nodes on older firmware won't use those bits and do not use/know the hopStart protobuf field.

Obinice · 2024-03-17T08:56:04Z

I'll just add my 2p here, I don't think it's Android that is the problem.... it's a firmware issue - and we brought it up in the firmware discord yesterday. I'd also add that perhaps the hopStart data should be added to the debugging output....

@asjmcguire Agreed, at the time I suspected the app, but now I'm not so sure. If I knew how to move this to the appropriate location I would 😅

Do you know if the message originated from the Android app as well?

@GUVWAF The messages I received via the test I debug logged were sent via Android client, the information I was provided by the sender was "V2.3.0 app, v2.3.1 alpha node, and v2.2.24 node" , however the others that I've seen in the wild I'm afraid I have no clue, sorry.

I'm running firmware 2.2.24 on my base node (which will have been the one relaying the message to my mobile node, which I used for the debug log). My mobile node, which I only used to receive those test messages for debug yesterday, is on 2.2.23.

In terms of being the recipient, I've since tried a few different versions of the Android client, the Play Store beta as well as the Play Store and F-Droid store stable releases, but the problem is present regardless. I've also been doing some private MQTT logging from the base node directly to mosquitto on a local server (only in the past day - can't have been affecting this issue), and the messages are logged there as follows:

{"channel":0,"from":3865901120,"id":266294490,"payload":{"text":"\u0000\u0000\u0000\u0000\u0000\u0000"},"rssi":-99,"sender":"!e2e37908","snr":3.25,"timestamp":1710664304,"to":4294967295,"type":"text"}

(The above is just a random example of one of these garbled messages, received this morning.

On occasion, the garbled message received seems to be very slightly different, it contains a quotation mark at the end. I know MQTT JSON isn't the best format to show this, but here's an example of one I received yesterday, where the actual visible output in the client looked something like ��떗"

{"channel":0,"from":3664101024,"id":2347968366,"payload":{"text":"\u0000\u0000\u0000\u0000\u0000\u0000\""},"rssi":-97,"sender":"!e2e37908","snr":-0.5,"timestamp":1710599894,"to":4294967295,"type":"text"}

This is the only difference in the pattern that I've seen.

ive had the same results as above. Down grading my firmware to 2.2.24 but keeping the app at 2.3 prevents me from seeing any garbled messages

@b8b8 Unfortunately I'm on 2.2.24 myself, and am seeing the issue on app 2.3 as well as the previous stable version. I'm also seeing it on my 2.2.23 mobile node - it seems likely that the corruption is being generated somewhere on the mesh, perhaps at the origin, and is simply being passed along.

Obinice · 2024-03-17T12:19:07Z

One small update, I just happened to have the web UI open when I received a garbled message, and the text displayed is very slightly different. I don't know if that will help troubleshooting, but for completeness here it is:

��떗�

asjmcguire · 2024-03-17T12:26:54Z

There are only 2 real changes in the 2.3.0 firmware that have changed anything to do with message sending right @GUVWAF ? One of the introduction of the hop_start header and the other is dynamic changing of the hoplimit based on the calculation of how many hops away the response needs to be sent. All the other fixes in this firmware have been about precision for GPS, general GPS fixes and power saving fixes. So in my opinion it has to be something introduced with the 2 changes that deal with forming the actual mesh packet.

GUVWAF · 2024-03-17T13:59:33Z

Yes, it almost certainly has to do with the addition of hopStart to the MeshPacket. Changing the hop limit on a response doesn't impact a normal text message.

It's weird that only the payload seems affected, from/to, hopLimit, SNR, etc. all look OK.

I don't have time currently to test more thoroughly, but can look into it this week, then also with Android app. Would be good to know if it is already garbled in the logs when the packet arrives from the phone at the original sender.

garthvh · 2024-03-17T14:22:01Z

Somebody would have reported it for iOS if it is in the transport.

asjmcguire · 2024-03-17T14:37:10Z

The problem is (in the UK at least) many people haven't updated to 2.3.0 because it's labelled alpha and thus not stable, so there aren't enough 2.3.0 nodes "in the wild" to get better debugging. When the guys who were trying to track this down yesterday are back around - we will get them to log sending as well receiving and see if we can find a common factor.

GUVWAF · 2024-03-18T19:29:20Z

I'm still unable to reproduce.
I tried with the Android app version 2.2.23 and also version 2.3.0, both at the transmitter and receiver side, in combination with a mix of firmware 2.2.24 and 2.3.0, also with a hop in between. Tried all kind of different text with various characters/symbols and length, and also traceroute. I've never seen garbled text; not in the Android app, iOS app or serial logs - and also traceroute always works.
This was using a RAK4631, T-Beam and Heltec Wireless Paper. Maybe it's hardware specific?

eldoctorcool · 2024-03-18T20:12:00Z

On the Traceroute matter, I am using Lora32 1.6 boards, and TBeams 1.1.

GiuStel · 2024-03-18T21:13:26Z

I noticed after flashing 2.3.0, with the Heltec V3 still connected, the terminal halted with some words corrupted with what I think were those or similar characters. It did this consistently and in the same way and at the same point when flashing 2.3.0 each time. Flashing 2.2.24 did not do this and the terminal would continue to display updating text after rebooting. Perhaps this is related. I cannot test for the exact symbols seen right now unfortunately.

Edit: the exact text is below. Possibly not related but mentioned in case useful and triggers something.
...
Writing at 0x3a6625... (93%)
Writing at 0x3aa620... (96%)
Writing at 0x3b1a4c... (100%)
Wrote 1048576 bytes (473054 compressed) at 0x300000 in 43.734 seconds.
Leaving...
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x29 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3808,len:0x44c
load:0x403c9700,len:0xbe4
load:0x403cc700,len:0x2a38
entry 0x403c98d4
E (350) esp_core_dump_flash: No core dump partition found!
E (350) esp_core_dum����@INFO  | ??:??:?? 0 

//\ E S H T /\ S T / C

INFO  | ??:??:?? 0 Booted, wake cause 0 (boot count 1), reset_reason=reset
D

Same happened here at a friend's node:

GiuStel · 2024-03-18T21:25:20Z

Here, on a near to 1 hundred nodes mesh, the switch to 2.3.0 is being a hell.
We've not been able yet to realize what breaks what but most of the messages are unreadable, traceroute works but reports one hop only (most of the times), nodes coming from an MQTT branch never show up in nodes list unless you already had them into it. Receiving telemetries seems to be very hard too.
I'm available for any test that might be useful.

Thanks

GUVWAF · 2024-03-18T22:10:45Z

MQTT nodes using 2.3.0 will not be shown for nodes with older firmware and vice versa; this is intended and listed in the release notes.

It would be good to collect data about nodes that send out garbled text, especially when they already show this in the serial logs after receiving it from the phone. Would be good to know what the content of the actual message was and which hardware and client (app version) they are using.

eldoctorcool · 2024-03-18T22:20:43Z

Wow, 100 nodes... and I have about 10 nodes max, ... at first when I saw this I thought radio was out of range... the failed traceroutes, but RSSI was well within good range. What I've rolled back to 2.2.23, as 2.2.24 also seems to have GPS issues as well... what used to be 75 meters in SmartPosition, now I had to make it <15 meters to match what used to be 75 meters, otherwise it will take about 1 km to report moving things on another client...

dantuf · 2024-03-19T18:48:28Z

Maybe this is a clue. I get a lot of these messages in the log on my inside node when traffic is forwarded by my outside node on the roof running 2.3.1 firmware (heltec v3).

ERROR | 18:36:36 262 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 18:36:36 262 [Router] Error decoding protobuf module!
ERROR | 18:36:36 262 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 18:36:36 262 [Router] Error decoding protobuf module!
DEBUG | 18:36:36 262 [Router] Module 'nodeinfo' considered
DEBUG | 18:36:36 262 [Router] Module 'nodeinfo' handled and skipped other processing

ERROR | 18:36:47 272 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15b3d0
ERROR | 18:36:47 272 [Router] Error decoding protobuf module!
ERROR | 18:36:47 272 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15b3d0
ERROR | 18:36:47 272 [Router] Error decoding protobuf module!
DEBUG | 18:36:47 272 [Router] Module 'DeviceTelemetry' considered
DEBUG | 18:36:47 272 [Router] Module 'DeviceTelemetry' handled and skipped other processing

It seems like the packets arrive but most often than not the payload is bad. I guess in the case of text messages the payload is passed on directly showing the garbled contents.

In my case the inside node don't hear distant nodes so more or less all traffic passes by the roof node which makes the problem very apparent. Enabling MQTT on the roof node to a local broker shows the packets arriving just fine. I believe it's just the packet forwarding that somehow corrupt the payload.

GUVWAF · 2024-03-19T19:13:30Z

@dantuf Do you have the logs just before that? I want to know if the payload length makes sense.

Also, which firmware version is your inside node using and what hardware is it? And do you know the firmware and hardware type of the device that originally sent this packet?

dantuf · 2024-03-19T19:27:18Z

Okay, here we go. The inside node showing this log is a heltec v3 running 2.3.1

DEBUG | 19:18:48 2793 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=26, time 428 ms
DEBUG | 19:18:48 2793 [RadioIf] Lora RX (id=0x1b72d8e9 fr=0x60 to=0xff, WantAck=0, HopLim=2 Ch=0x8 encrypted rxSNR=7 rxRSSI=-68 hopStart=3)
DEBUG | 19:18:48 2793 [RadioIf] AirTime - Packet received : 428ms
DEBUG | 19:18:48 2794 [Router] Add packet record (id=0x1b72d8e9 fr=0x60 to=0xff, WantAck=0, HopLim=2 Ch=0x8 encrypted rxSNR=7 rxRSSI=-68 hopStart=3)
DEBUG | 19:18:48 2794 [Router] Using channel 0 (hash 0x8)
DEBUG | 19:18:48 2794 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1
DEBUG | 19:18:48 2794 [Router] Using AES128 key!
DEBUG | 19:18:48 2794 [Router] ESP32 crypt fr=75e19a60, num=1b72d8e9, numBytes=10!
DEBUG | 19:18:48 2794 [Router] decoded message (id=0x1b72d8e9 fr=0x60 to=0xff, WantAck=0, HopLim=2 Ch=0x0 Portnum=4 rxtime=1710875928 rxSNR=7 rxRSSI=-68 hopStart=3)
DEBUG | 19:18:48 2794 [Router] handleReceived(REMOTE) (id=0x1b72d8e9 fr=0x60 to=0xff, WantAck=0, HopLim=2 Ch=0x0 Portnum=4 rxtime=1710875928 rxSNR=7 rxRSSI=-68 hopStart=3)
DEBUG | 19:18:48 2794 [Router] Module 'nodeinfo' wantsPacket=1
INFO  | 19:18:49 2794 [Router] Received nodeinfo from=0x75e19a60, id=0x1b72d8e9, portnum=4, payloadlen=6
ERROR | 19:18:49 2794 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 19:18:49 2794 [Router] Error decoding protobuf module!
ERROR | 19:18:49 2794 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 19:18:49 2794 [Router] Error decoding protobuf module!
DEBUG | 19:18:49 2794 [Router] Module 'nodeinfo' considered
DEBUG | 19:18:49 2794 [Router] Module 'nodeinfo' handled and skipped other processing
DEBUG | 19:18:50 2795 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=26, time 428 ms
DEBUG | 19:18:50 2795 [RadioIf] Lora RX (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x8 encrypted rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [RadioIf] AirTime - Packet received : 428ms
DEBUG | 19:18:50 2795 [Router] Add packet record (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x8 encrypted rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] Using channel 0 (hash 0x8)
DEBUG | 19:18:50 2795 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1
DEBUG | 19:18:50 2795 [Router] Using AES128 key!
DEBUG | 19:18:50 2795 [Router] ESP32 crypt fr=e2e2e1e8, num=5a4c466f, numBytes=10!
DEBUG | 19:18:50 2795 [Router] decoded message (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x0 Portnum=66 rxtime=1710875930 rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] handleReceived(REMOTE) (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x0 Portnum=66 rxtime=1710875930 rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] Module 'routing' wantsPacket=1
INFO  | 19:18:50 2795 [Router] Received routing from=0xe2e2e1e8, id=0x5a4c466f, portnum=66, payloadlen=6
DEBUG | 19:18:50 2795 [Router] Routing sniffing (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x0 Portnum=66 rxtime=1710875930 rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] Delivering rx packet (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x0 Portnum=66 rxtime=1710875930 rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] Update DB node 0xe2e2e1e8, rx_time=1710875930
INFO  | 19:18:50 2795 [Router] Heard a node on channel 0 we don't know, sending NodeInfo and asking for a response.
DEBUG | 19:18:50 2795 [Router] cancelSending id=0x15657d5b, removed=0
DEBUG | 19:18:50 2795 [Router] Sending NodeInfo will be ignored since we just sent it.
DEBUG | 19:18:50 2795 [Router] Forwarding to phone (id=0x5a4c466f fr=0xe8 to=0xff, WantAck=0, HopLim=0 Ch=0x0 Portnum=66 rxtime=1710875930 rxSNR=6.75 rxRSSI=-69)
DEBUG | 19:18:50 2795 [Router] Module 'routing' considered
DEBUG | 19:18:57 2802 [RadioIf] (bw=250, sf=11, cr=4/5) packet symLen=8 ms, payloadSize=31, time 468 ms
DEBUG | 19:18:57 2802 [RadioIf] Lora RX (id=0x4f35ffdd fr=0x36 to=0xdc, WantAck=0, HopLim=2 Ch=0x8 encrypted rxSNR=6.25 rxRSSI=-68)
DEBUG | 19:18:57 2802 [RadioIf] AirTime - Packet received : 468ms
DEBUG | 19:18:57 2803 [Router] Add packet record (id=0x4f35ffdd fr=0x36 to=0xdc, WantAck=0, HopLim=2 Ch=0x8 encrypted rxSNR=6.25 rxRSSI=-68)
DEBUG | 19:18:57 2803 [Router] Using channel 0 (hash 0x8)
DEBUG | 19:18:57 2803 [Router] Expanding short PSK meshtastic/Meshtastic-Android#1
DEBUG | 19:18:57 2803 [Router] Using AES128 key!
DEBUG | 19:18:57 2803 [Router] ESP32 crypt fr=2d786336, num=4f35ffdd, numBytes=15!
DEBUG | 19:18:57 2803 [Router] decoded message (id=0x4f35ffdd fr=0x36 to=0xdc, WantAck=0, HopLim=2 Ch=0x0 Portnum=4 requestId=5bff2716 rxtime=1710875937 rxSNR=6.25 rxRSSI=-68)
DEBUG | 19:18:57 2803 [Router] handleReceived(REMOTE) (id=0x4f35ffdd fr=0x36 to=0xdc, WantAck=0, HopLim=2 Ch=0x0 Portnum=4 requestId=5bff2716 rxtime=1710875937 rxSNR=6.25 rxRSSI=-68)
DEBUG | 19:18:57 2803 [Router] Module 'nodeinfo' wantsPacket=1
INFO  | 19:18:57 2803 [Router] Received nodeinfo from=0x2d786336, id=0x4f35ffdd, portnum=4, payloadlen=6
ERROR | 19:18:57 2803 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 19:18:58 2803 [Router] Error decoding protobuf module!
ERROR | 19:18:58 2803 [Router] Can't decode protobuf reason='wrong wire type', pb_msgdesc 0x3c15aca8
ERROR | 19:18:58 2803 [Router] Error decoding protobuf module!
DEBUG | 19:18:58 2803 [Router] Module 'nodeinfo' considered
DEBUG | 19:18:58 2803 [Router] Module 'nodeinfo' handled and skipped other processing

0x75e19a60 = TLORA_T3_S3
0xe2e2e1e8 = HELTEC_V3
0x2d786336 = RAK4631

Not sure about the firmware on these nodes since they are not mine.

GUVWAF · 2024-03-19T19:37:30Z

Thanks for that. Payload length of 6 bytes is way too short for NodeInfo, not sure why this is incorrect while the rest of the info from the packet looks OK.

asjmcguire · 2024-03-19T19:48:26Z

It's a payload len of 6 further up too - on the logs @Obinice posted.
as well the very confusing part of:

DEBUG | 16:29:54 145 [Router] Original length - 6
DEBUG | 16:29:54 145 [Router] Compressed length - 10
DEBUG | 16:29:54 145 [Router] Original message - ▒떗

Which leaves me scratching my head as to how the compressed length is longer than the uncompressed length....

GUVWAF · 2024-03-19T19:55:22Z

Indeed.

Which leaves me scratching my head as to how the compressed length is longer than the uncompressed length....

This can happen with the library it's using, but in that case it won't use the compressed version, as the next line shows. Actually compression is not working currently but that will be fixed in 3.0 and is unrelated to this issue.

geeksville · 2024-03-21T18:54:50Z

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/2-3-1-traceroute-and-remote-radio-config-broken/11275/4

GiuStel · 2024-03-21T19:28:02Z

As an additional element - I don't know if useful - I can tell that most of the nodes here have channel 1, 2 and 3 configured with a PSK.
Channel 0 is MediumFast with standard PSK.
Rolling back (tens of nodes) to version 2.2.24 is taking us back to the normal mesh behaviour.
We're availble for tests if needed. Thanks.

grahamrobertslx · 2024-03-21T19:33:18Z

So down graded all my nodes to 2.2.24 and still having the same issue. one node on web and the other on android 2.2.24.
Also...
iOS app to node on web app web app receives the ???? but iOS app says the message had an issue "Implicit ACK from another node". Message sent from web to iOS and the message isn't received.
Messages between iOS and Android work fine.

All with direct trace routes

GiuStel · 2024-03-21T19:40:16Z

So down graded all my nodes to 2.2.24 and still having the same issue. one node on web and the other on android 2.2.24. Also... iOS app to node on web app web app receives the ???? but iOS app says the message had an issue "Implicit ACK from another node". Message sent from web to iOS and the message isn't received. Messages between iOS and Android work fine.

All with direct trace routes

can you tell that no other nodes with 2.3.x were between yours?

grahamrobertslx · 2024-03-21T19:45:06Z

So down graded all my nodes to 2.2.24 and still having the same issue. one node on web and the other on android 2.2.24. Also... iOS app to node on web app web app receives the ???? but iOS app says the message had an issue "Implicit ACK from another node". Message sent from web to iOS and the message isn't received. Messages between iOS and Android work fine.
All with direct trace routes

can you tell that no other nodes with 2.3.x were between yours?

They all trace route Direct but can't tell if there was a repeater in-between as i don't believe these are shown in the Trace Route as a hop

GUVWAF · 2024-03-21T19:47:48Z

can you tell that no other nodes with 2.3.x were between yours?

Yes, if there were 2.3.x nodes in between they might not show up in the Traceroute if they have decoding errors.

Just a heads-up: I asked someone that could reproduce it with two nodes (one 2.2.24 and one 2.3.0) to try this branch, which is a modified version of 2.3.0 without the suspicious change and that seems to resolve it. So we know that the addition of hopStart to the MeshPacket causes this, but we don't know why. If we can't resolve it quickly, we'll revert this change to get a new beta out soon.

grahamrobertslx · 2024-03-21T19:51:04Z

Ace, if i can be of any more help just let me know. Also happy to test out the modified version of 2.3.0 if you need more experiences

can you tell that no other nodes with 2.3.x were between yours?

Yes, if there were 2.3.x nodes in between they might not show up in the Traceroute if they have decoding errors.

Just a heads-up: I asked someone that could reproduce it with two nodes (one 2.2.24 and one 2.3.0) to try this branch, which is a modified version of 2.3.0 without the suspicious change and that seems to resolve it. So we know that the addition of hopStart to the MeshPacket causes this, but we don't know why. If we can't resolve it quickly, we'll revert this change to get a new beta out soon.

Ace, if i can be of any more help just let me know. Also happy to test out the modified version of 2.3.0 if you need more experiences

asjmcguire · 2024-03-21T20:52:52Z

Just a heads-up: I asked someone that could reproduce it with two nodes (one 2.2.24 and one 2.3.0) to try this branch, which is a modified version of 2.3.0 without the suspicious change and that seems to resolve it. So we know that the addition of hopStart to the MeshPacket causes this, but we don't know why. If we can't resolve it quickly, we'll revert this change to get a new beta out soon.

I hope we don't have to lose HopStart and HopsAway functionality, but while I was taking a look at the revert - I noticed that via_mqtt has been deleted from NodeDB, is this meant? @GUVWAF

GUVWAF · 2024-03-21T20:54:54Z

@asjmcguire Ah, I see. No, that wasn't meant. If we revert it we'll ensure it's done properly, this was just for testing.

erayd · 2024-03-22T06:20:46Z

I'm not familiar with this codebase, but the following stood out to me:

Over LoRa, three previously unused bits in the header are now used for hopStart. Internally when receiving the firmware will parse these bits and put them in a protobuf field hopStart, which is new, but added at the end of the MeshPacket to ensure backwards compatibility.

Looking at mesh.proto and mesh.options, MeshPacket.hop_start is defined as 8 bits, not 3. Is it possible there's an overrun somewhere due to this difference? Where in the source does the conversion between 3 & 8 bits occur?

GUVWAF · 2024-03-22T07:42:33Z

@erayd Thanks for thinking along.
Here it is put into the header flags (which is 8 bits in total):

firmware/src/mesh/RadioInterface.cpp

Line 572 in 794e99c

    
           h->flags |= (p->hop_start << PACKET_FLAGS_HOP_START_SHIFT) & PACKET_FLAGS_HOP_START_MASK;

Using:

firmware/src/mesh/RadioInterface.h

Lines 16 to 17 in 794e99c

    
           #define PACKET_FLAGS_HOP_START_MASK 0xE0 
        
           #define PACKET_FLAGS_HOP_START_SHIFT 5

And here it is parsed from the header flags and stored in the struct:

firmware/src/mesh/RadioLibInterface.cpp

Line 364 in 794e99c

    
           mp->hop_start = (h->flags & PACKET_FLAGS_HOP_START_MASK) >> PACKET_FLAGS_HOP_START_SHIFT;

The strange thing is that the decoding errors only happen for the payload inside the Data message in the MeshPacket. This payload itself is again a protobuf message (or just a string for a text message) of a specific type given by the port number, but the length of this payload looks somehow incorrect.

erayd · 2024-03-22T08:04:45Z

Thanks @GUVWAF - seems it wasn't what I was wondering; that looks sane enough to me 🙂

Will keep looking...

asjmcguire · 2024-03-22T10:15:22Z

Potentially what is needed is for a dev to add more verbose logging to 2.2.24 that spits out exactly (raw) what is received over the radio before any processing is done to it. So you can verify that what is received is what should be received. Then some logging to show what is being put out on the radio (again raw).

Then send a message from a 2.3.X node to another 2.3.X node, but with the destination node switched off, ensuring that the 2.2.24 node HAS to rebroadcast.

Then you will at least know if what is arriving at the 2.2.24 node is intact, that way we can at least rule out that 2.3.X is malforming something. That allows us to concentrate on whether it is getting malformed by 2.3.X or whether 2.2.24 is responsible for doing something strange with it, and whether that is during the decoding phase, or whether it is when rebuilding the packet for retransmission.

eureekasigns · 2024-03-22T11:44:48Z

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/2-3-1-traceroute-and-remote-radio-config-broken/11275/4

Yes I'll carry on here vs the main forum as this is a very specific and seemingly sporadic issue.

I did update the app and could trace between the two nodes as well as use admin channel. Otherwise the main one would not do a direct trace when right next to both.

From a distance, it showed direct.

Downgraded one node and can trace with app store version of app.

Of note, the remote admin channel radio config also failed with the non alpha app.

On hardware, both are RAK boards, 19007. One is older micro USB, the other is C. I have not seen a garbled message but have not tried many. Can connect both nodes to gather console output if needed from a PC but it sounds like what others have found is minimal since the output may need other debug flags enabled to see more raw data?

Only other thing I have going is multiple channels with longfast as a later one in the list. If adding channels is part of the issue, perhaps that is a factor, I do not know.

If there is something I can do to test further, would be glad to try today. I do not know if I can force it through another hop with 2.2.x or whatever is in the area unless there's some way to force it to use a neighboring node.

eureekasigns · 2024-03-22T15:16:08Z

I don't know if this is relevant, but reading the release notes I noticed that the following was updated.
Could this relate at all to the issue between versions of nodes (and/or potentially app version)?

#3393
0333eb9

Notably the following parts:
platform = platformio/nordicnrf52@^10.4.0

Among changes there, the following is of interest.
https://github.com/platformio/platform-nordicnrf52/releases/tag/v10.2.0
which shows: arduino/ArduinoCore-mbed#710
Which contains:
Add ISGR Root X1 (Let's Encrypt) to CAs
Add instructions to manage certificates.h and cacert.pem

...So, if there is a difference in certificates and/or management of them, could this possibly explain the "garble" that is seen in some cases?

It is also of interest that in some cases, it is not known whether or not a message is going to the same "channel" (if unknown), correct? In the case of encrypted message, it may be passing through a "last known" node on what it believes should be "primary" ... but landing on a node's 3rd channel, for example. I am still trying to understand that behavior from a post on the discourse board -

https://meshtastic.discourse.group/t/question-about-private-messages-with-multiple-channels/11092

"It will use the channel where it most recently heard a Node Info packet from the destination on. That’s most likely the destination’s primary, but can be a secondary."

This may all be unrelated but seemed like a possibility after looking through the above and thought it worth mention.

thebentern · 2024-03-22T16:13:37Z

Certificates don't have anything to do with LoRA, and that change happened in 2.3.1. The issue manifests in 2.3.0 as well.

asjmcguire · 2024-03-22T17:04:56Z

The issue manifests in 2.3.0 as well.

And apparently goes away when hopStart is removed from the mesh packet, which is depressing, because Hops Away is the most exciting feature in a long while.

thebentern · 2024-03-22T17:15:11Z

And apparently goes away when hopStart is removed from the mesh packet, which is depressing, because Hops Away is the most exciting feature in a long while.

I agree. This is why we're digging in to see if it's something we can pinpoint with the decoding / encoding and not just yanking the change yet.

GUVWAF · 2024-03-22T19:54:52Z

Potentially what is needed is for a dev to add more verbose logging to 2.2.24 that spits out exactly (raw) what is received over the radio before any processing is done to it.

That's indeed a good idea. Since I still cannot reproduce it (tried other devices as well) I created branches for both 2.2.24 and 2.3.0 with debug messages containing the raw bytes. Hopefully the artifacts with compiled binaries will be available when the CI finishes here for 2.2.24 and here for 2.3.0, otherwise you'll also need to compile it yourself if you want to test.

GUVWAF · 2024-03-23T09:31:08Z

Good news: I could reproduce and found that the root cause is the NeighborInfo Module. This is because with hopStart we can also determine who are neighbors for any other packet. I got full logs from another user and found that the NeighborInfo module was altering the packet even if it wasn’t a NeighborInfo packet. Even when that resulted into decoding errors, it would overwrite the content with garbage. It can only be reproduced if a node with NeighborInfo module enabled on 2.3.0 or 2.3.1 is in between or when testing with two devices on 2.3.0/2.3.1 if both have them enabled.

It is fixed by #3474. For now if you use 2.3.0 or 2.3.1, please don’t enable the NeighborInfo module.

Many thanks to everyone for providing logs or helping in other ways!
I’m sorry for the inconvenience but happy that we don’t have to revert the hopStart addition to the MeshPacket.

dantuf · 2024-03-23T10:44:58Z

Very good news indeed! Nice work! I can confirm your fix appears to solve the problem.

asjmcguire · 2024-03-23T11:36:21Z

That'll explain why I never saw the problem, I turned NeighbourInfo module off as soon as I updated to the first 2.3.0 alpha, on the basis that any packet with a HopsAway of 0, is clearly a neighbour, thus the NeighbourInfo module is essentially redundant right?

GUVWAF · 2024-03-23T11:46:40Z

on the basis that any packet with a HopsAway of 0, is clearly a neighbour, thus the NeighbourInfo module is essentially redundant right?

Yes and no. Only if all nodes are using >=2.3.0, this can be guaranteed. Furthermore, if you enable the NeighborInfo module also on nodes that you don't connect to with your phone, you receive their NeighborInfo packets, so you know which nodes are their neighbors. When collecting these packets e.g. via MQTT, you can draw a graph of how all nodes are connected.

* Use `alterReceivedProtobuf()` for NeighborInfo and Traceroute `alterReceived()` should never return NULL Traceroute should be promiscuous * Remove extensive logging from NeighborInfo module

GUVWAF mentioned this issue Mar 18, 2024

[Bug]: Traceroutes fail, most of the time, to register on the app with fw 2.3.0... 2.2.24 traceroutes show up on the android app. #3429

Closed

andrekir transferred this issue from meshtastic/Meshtastic-Android Mar 21, 2024

andrekir mentioned this issue Mar 21, 2024

[Bug]: Issue Receiving certain messages #3447

Closed

GUVWAF mentioned this issue Mar 23, 2024

Fix #3452: only alter received packet if port number matches #3474

Merged

thebentern closed this as completed in #3474 Mar 23, 2024

ConnorDFWT mentioned this issue Sep 5, 2024

[Bug]: Messages coming through incomplete, filled with '�' and other symbols #4636

Closed

Messages often come through garbled as "�떗" since recent update #3452

Messages often come through garbled as "�떗" since recent update #3452

Comments

Obinice commented Mar 15, 2024

andrekir commented Mar 15, 2024

Obinice commented Mar 16, 2024

Vanguard4893 commented Mar 16, 2024

clwgh commented Mar 16, 2024 • edited Loading

Obinice commented Mar 16, 2024

asjmcguire commented Mar 16, 2024 • edited Loading

b8b8 commented Mar 16, 2024

GUVWAF commented Mar 16, 2024

prokrypt commented Mar 16, 2024 • edited Loading

asjmcguire commented Mar 16, 2024 • edited Loading

GUVWAF commented Mar 16, 2024

Obinice commented Mar 17, 2024

Obinice commented Mar 17, 2024

asjmcguire commented Mar 17, 2024

GUVWAF commented Mar 17, 2024

garthvh commented Mar 17, 2024

asjmcguire commented Mar 17, 2024

GUVWAF commented Mar 18, 2024

eldoctorcool commented Mar 18, 2024

GiuStel commented Mar 18, 2024

GiuStel commented Mar 18, 2024

GUVWAF commented Mar 18, 2024 • edited Loading

eldoctorcool commented Mar 18, 2024

dantuf commented Mar 19, 2024 • edited Loading

GUVWAF commented Mar 19, 2024

dantuf commented Mar 19, 2024 • edited Loading

GUVWAF commented Mar 19, 2024

asjmcguire commented Mar 19, 2024

GUVWAF commented Mar 19, 2024

geeksville commented Mar 21, 2024

GiuStel commented Mar 21, 2024

grahamrobertslx commented Mar 21, 2024 • edited Loading

GiuStel commented Mar 21, 2024

grahamrobertslx commented Mar 21, 2024

GUVWAF commented Mar 21, 2024

grahamrobertslx commented Mar 21, 2024

asjmcguire commented Mar 21, 2024

GUVWAF commented Mar 21, 2024

erayd commented Mar 22, 2024

GUVWAF commented Mar 22, 2024 • edited Loading

erayd commented Mar 22, 2024

asjmcguire commented Mar 22, 2024

eureekasigns commented Mar 22, 2024

eureekasigns commented Mar 22, 2024 • edited Loading

thebentern commented Mar 22, 2024

asjmcguire commented Mar 22, 2024 • edited Loading

thebentern commented Mar 22, 2024

GUVWAF commented Mar 22, 2024

GUVWAF commented Mar 23, 2024

dantuf commented Mar 23, 2024 • edited Loading

asjmcguire commented Mar 23, 2024

GUVWAF commented Mar 23, 2024

clwgh commented Mar 16, 2024 •

edited

Loading

asjmcguire commented Mar 16, 2024 •

edited

Loading

prokrypt commented Mar 16, 2024 •

edited

Loading

asjmcguire commented Mar 16, 2024 •

edited

Loading

GUVWAF commented Mar 18, 2024 •

edited

Loading

dantuf commented Mar 19, 2024 •

edited

Loading

dantuf commented Mar 19, 2024 •

edited

Loading

grahamrobertslx commented Mar 21, 2024 •

edited

Loading

GUVWAF commented Mar 22, 2024 •

edited

Loading

eureekasigns commented Mar 22, 2024 •

edited

Loading

asjmcguire commented Mar 22, 2024 •

edited

Loading

dantuf commented Mar 23, 2024 •

edited

Loading