Mask out random bits when doing queue ordering #4561

Merged: 2 commits into master from mask-random on Aug 26, 2024

Conversation

jp-bennett (Collaborator)

An unintended consequence of randomizing the top bits of packet IDs is that we also use those values to determine packet order in the queue. Simple fix: mask out the random bits when doing that comparison.

@jp-bennett (Collaborator, Author)

Fixes #4560

@gitbisector (Contributor) left a comment

Using parentheses, as in #define ID_COUNTER_MASK (UINT32_MAX >> 22), is safer.

@jp-bennett (Collaborator, Author)

> Using parentheses, as in #define ID_COUNTER_MASK (UINT32_MAX >> 22), is safer.

That's fair.
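
For context, a minimal sketch of what the agreed-upon masking looks like. Only the ID_COUNTER_MASK definition comes from this thread; the comparator name and surrounding code are hypothetical, not the actual MeshPacketQueue implementation.

```cpp
#include <cstdint>

// The low bits of a packet ID hold the monotonically incrementing counter;
// the high bits are now randomized, so they must be excluded when ordering.
#define ID_COUNTER_MASK (UINT32_MAX >> 22)

// Hypothetical tie-breaker: compare only the counter portion of two IDs.
static bool olderPacketId(uint32_t id1, uint32_t id2)
{
    return (id1 & ID_COUNTER_MASK) < (id2 & ID_COUNTER_MASK);
}
```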

@gitbisector (Contributor)

The other thing I wonder about is packet IDs coming from other nodes. Are they mixed with locally generated ones? That's not specific to this MSB randomization, but it may still cause trouble. Is there no access to the RX timestamp here? I saw code where locally generated packets get a timestamp put on them.

@thebentern merged commit b9a8683 into master on Aug 26, 2024
103 checks passed
@thebentern deleted the mask-random branch on August 26, 2024 at 20:48
@gitbisector (Contributor)

@thebentern sorry to ping, but while this is still fresh: the 'rx_time' field has your name on it. As pkt->id is generated on each transmitting node, using it to sort when priorities are equal is not fair. I propose using rx_time instead. Locally created packets get rx_time assigned too, in router.cpp:allocForSending():

```cpp
return (p1p != p2p) ? (p1p < p2p)                   // prefer bigger priorities
                    : (p1->rx_time >= p2->rx_time); // prefer earlier packets
```

@GUVWAF (Member) commented Aug 27, 2024

I'm not sure if that's a good idea. Packets you receive for rebroadcasting might have been generated much earlier than when you received them. If you wait too long before rebroadcasting, another node that is not in an advantageous position might jump in.

Currently it's not perfect either, because it will be random, but I'm not sure whether there is a better solution. However, having multiple packets with the same priority in the Tx queue is already not a common situation, unless your mesh is very busy.

@gitbisector (Contributor) commented Aug 27, 2024

If only it were truly random. The trouble with the current approach is that in a busy mesh, unlucky packets with a 'bad' ID can sit in a node almost forever, as the tx_queue barely ever goes empty. When they finally come out, some packets might even get seen as 'new' rather than duplicate, because they have been forgotten. While not perfect (we don't have complete information), using rx_time does improve on this.

@GUVWAF (Member) commented Aug 27, 2024

@gitbisector If we want to optimize this even further, I still think packets not generated locally should have priority for the following reasons:

  1. Packets received on the mesh were generated at least the airtime of the packet before the rx_time, and possibly longer if they went through multiple hops already.
  2. The delay a node has before rebroadcasting actually determines who rebroadcasts, because others will cancel it afterwards. If you give priority to your own packets, another node might jump in, even though it has a high SNR and is thus not likely to contribute to routing.
  3. Client applications are notified with the current queue status. If it's full, it's a good indication to slow down your packet generation by doing queuing in the application. Packets to be relayed, on the other hand, will simply be dropped without notification to the original transmitter.
  4. It decreases the chance of judging a received packet as new instead of duplicate.

@gitbisector (Contributor)

Agreed, I'm not at all advocating that local packets should get priority. As it stands, sorting within a priority based on IDs generated by different sources causes a bias towards low packet IDs, putting a huge tail latency on packets with large IDs. In busy meshes (at least ours) that makes for extra stale packets, where we even laugh about how stale they are. This applies to router/repeater nodes too, as even they spend more time listening than transmitting, keeping the queue from emptying.

@GUVWAF (Member) commented Aug 27, 2024

> Agreed, I'm not at all advocating that local packets should get priority.

Okay, but then there should be a different strategy than looking at the rx_time only, as a client application or the device itself can fill the Tx queue much faster than the packets are received and transmitted by the LoRa radio.

Problem is also that the rx_time is set by getValidTime(RTCQualityFromNet). If we don't have a proper notion of time, it won't be set.

@gitbisector (Contributor) commented Aug 27, 2024

> Problem is also that the rx_time is set by getValidTime(RTCQualityFromNet). If we don't have a proper notion of time, it won't be set.

Didn't realize time can stand still. For same-priority sorting we do need a node-local incrementing count. It can be a static inside MeshPacketQueue that increments in enqueue(), but we'd need a spot in meshtastic_MeshPacket to put it. (Sorting on all-equal time within a priority might still work, but I'd need to experiment to verify.)
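
A minimal sketch of that idea, under the assumption that the counter can live next to the packet (all names here are hypothetical, not existing Meshtastic code): the queue stamps a node-local sequence number in enqueue() and uses it only to break ties between equal priorities.

```cpp
#include <cstdint>

// Hypothetical wrapper; meshtastic_MeshPacket has no such field today, which
// is exactly the open question above.
struct QueuedPacket {
    uint8_t priority;
    uint32_t seq; // node-local enqueue order
};

class MeshPacketQueue {
    uint32_t nextSeq = 0; // increments once per enqueue()

  public:
    void enqueue(QueuedPacket &p)
    {
        p.seq = nextSeq++;
        // ... insert into the underlying container as before
    }

    // Within equal priorities the earlier enqueue wins, instead of sorting on
    // the (partly random) packet ID.
    static bool sendsLater(const QueuedPacket &a, const QueuedPacket &b)
    {
        return (a.priority != b.priority) ? (a.priority < b.priority)
                                          : (a.seq > b.seq);
    }
};
```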

@GUVWAF (Member) commented Aug 27, 2024

> How we sort items does not change that.

True, but even if it doesn't occupy the queue fully, it does influence the order in which packets are sent out, with the consequences I laid out earlier. The input from the device or client itself is asynchronous to the LoRa radio, and if you transmit new packets every time, you even delay others from rebroadcasting.

> For mesh scaling purposes the app/device should get a small part.

I'm not sure all client apps currently handle the case where the Tx queue is full properly; I would not be confident giving them a smaller part.

@gitbisector (Contributor)

> Okay, but then there should be a different strategy than looking at the rx_time only, as a client application or the device itself can fill the Tx queue much faster than the packets are received and transmitted by the LoRa radio.

True, the client/device can easily occupy all (16) queue slots. How we sort items does not change that. Limiting sources to only a fraction of the maximum queue space could help with that. For mesh scaling purposes the app/device should get a small part.
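
A rough sketch of that suggestion (the names and the specific cap are illustrative assumptions, not existing code): reserve part of the Tx queue for relayed traffic by capping how many slots locally originated packets may take.

```cpp
#include <cstddef>

constexpr std::size_t MAX_TX_QUEUE = 16;   // total slots, per the discussion above
constexpr std::size_t MAX_LOCAL_SLOTS = 4; // hypothetical share for local/app packets

// Returns whether a locally generated packet may still be enqueued; relayed
// packets would only be checked against the overall limit.
static bool canEnqueueLocal(std::size_t localInQueue, std::size_t totalInQueue)
{
    return totalInQueue < MAX_TX_QUEUE && localInQueue < MAX_LOCAL_SLOTS;
}
```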

@gitbisector (Contributor) commented Aug 27, 2024

> I'm not sure all client apps currently handle the case where the Tx queue is full properly; I would not be confident giving them a smaller part.

Alternative is to make the queue maxlen bigger and keep the size for client apps the same :)

@GUVWAF (Member) commented Aug 27, 2024

> Alternative is to make it bigger :)

Yeah, but we're already talking about an edge case here. If your mesh is this busy, you'll likely have congestion issues already, which may be the cause of seeing "stale" packets due to high-priority packets coming in between.

Increasing the Tx queue size and adding a node-local counter doesn't come for free.

@gitbisector (Contributor)

> Alternative is to make it bigger :)
>
> Yeah, but we're already talking about an edge case here. If your mesh is this busy, you'll likely have congestion issues already, which may be the cause of seeing "stale" packets due to high-priority packets coming in between.
>
> Increasing the Tx queue size and adding a node-local counter doesn't come for free.

We can keep meshtastic_MeshPacket the same size by keeping the count local to the queue. The TX queue size is very modest.

Meshtastic is growing (and we want it to), so is it really an edge case if we are already seeing it on a daily basis?

@GUVWAF (Member) commented Aug 27, 2024

> We can keep meshtastic_MeshPacket the same size by keeping the count local to the queue. The TX queue size is very modest.

I won't discourage you from implementing a cost-effective solution of course ;)

> Meshtastic is growing (and we want it to), so is it really an edge case if we are already seeing it on a daily basis?

Meshtastic itself is, but there is just a limit to how much traffic a mesh can cope with. Switching to a faster modem preset is likely a better solution overall.
However, are you sure that what you're seeing on a daily basis is caused by this specific issue? Isn't the priority causing the stale packets? Before this PR, the IDs were truly random. Busy meshes also have extra-long delays due to high channel utilization, experience collisions more regularly (causing retransmissions), etc.

@gitbisector (Contributor)

I can further instrument the priority queue to provide histograms. From our back-and-forth here, I am walking away with the sense that using rx_time fixes a real bug where large packet IDs get penalized. Besides that bug, there is also the issue that the device/apps can easily fill up the Tx queue, and figuring out a clean mechanism to better share the bandwidth might help with scaling too.

geeksville pushed a commit to geeksville/Meshtastic-esp32 that referenced this pull request Aug 28, 2024
* Mask out random bits when doing queue ordering

* Parenthesis
@GUVWAF (Member) commented Aug 30, 2024

@gitbisector This might help as well if you were mainly seeing stale text messages: #4592
