Mask out random bits when doing queue ordering #4561
Conversation
Fixes #4560
Using parentheses, as in #define ID_COUNTER_MASK (UINT32_MAX >> 22), is safer.
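For illustration, a minimal sketch of the precedence hazard that motivates the parentheses; the `_UNSAFE` variant is hypothetical, shown only for contrast:

```cpp
#include <cstdint>

#define ID_COUNTER_MASK_UNSAFE UINT32_MAX >> 22 // hypothetical, for contrast
#define ID_COUNTER_MASK (UINT32_MAX >> 22)

// Without parentheses, a neighbouring operator can rebind the expression:
// ID_COUNTER_MASK_UNSAFE + 1 expands to UINT32_MAX >> 22 + 1, which is
// UINT32_MAX >> 23, because + binds tighter than >>.
static_assert(ID_COUNTER_MASK_UNSAFE + 1 == UINT32_MAX >> 23,
              "unparenthesized form silently shifts by 23");
static_assert(ID_COUNTER_MASK + 1 == (UINT32_MAX >> 22) + 1,
              "parenthesized form behaves as expected");
```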
That's fair.
The other thing I wonder about is packet_ids coming from other nodes. Are they mixed with locally generated ones? That's not specific to this MSB randomization, but it may still cause trouble. Is there no access to the RX timestamp here? I saw code where locally generated packets get a timestamp put on them.
@thebentern sorry to ping, but while this is still fresh: the 'rx_time' field has your name on it. As pkt->id is generated on each transmitting node, using it to sort when priorities are equal is not fair. I propose using rx_time instead. Locally created packets get rx_time assigned too, in router.cpp:allocForSending().
I'm not sure if that's a good idea. Packets you receive for rebroadcasting might have been generated much earlier than you received them. If you wait too long before rebroadcasting, another node that is not in an advantageous position might jump in. Currently it's not perfect either, because it will be random, but I'm not sure whether there is a better solution. However, having multiple packets with the same priority in the Tx queue is already not a common situation, unless your mesh is very busy.
If only it was truly random. The trouble with the current approach is that in a busy mesh, unlucky packets with a 'bad' id can sit in a node almost forever, as the tx_queue barely ever goes empty. When they finally come out, some packets might even get seen as 'new' rather than duplicates, because they have been forgotten. While not perfect (we don't have complete information), using rx_time does improve on this.
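For illustration, a minimal sketch (not the firmware's actual comparator) of the ordering being proposed here: priority first, then rx_time as the tiebreaker instead of the node-random packet id. The struct is a stand-in for meshtastic_MeshPacket:

```cpp
#include <cstdint>

// Stand-in for meshtastic_MeshPacket; field names follow the discussion.
struct Packet {
    uint32_t id;       // contains per-node randomized high bits
    uint32_t rx_time;  // set on receive, and in allocForSending() for local packets
    int priority;
};

// Returns true if a should be transmitted before b.
static bool txBefore(const Packet &a, const Packet &b)
{
    if (a.priority != b.priority)
        return a.priority > b.priority; // higher priority first
    return a.rx_time < b.rx_time;       // older packets first within a priority
}
```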
@gitbisector If we want to optimize this even further, I still think packets not generated locally should have priority for the following reasons:
Agree, I am not at all advocating that local packets should get priority. As it stands, sorting within a priority based on ids generated by different sources causes a bias towards low packet ids, putting a huge tail latency on packets with large ids. In busy meshes (at least ours) that is making for extra stale packets, where we even laugh about how stale they are. This applies to router/repeater nodes too, as even they spend more time listening than transmitting, keeping the queue from emptying.
Okay, but then there should be a different strategy than looking at the id. Problem is also that the rx_time only has one-second resolution, so packets can end up with the same value.
Didn't realize time can stand still. For same-priority sorting we do need a node-local incrementing count. It can be a static inside of MeshPacketQueue that increments in enqueue(), but we'd need a spot in meshtastic_MeshPacket to put it. (Sorting on all-equal time within a priority might still work, but I would need to experiment to verify.)
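A sketch of that node-local counter idea, assuming a wrapper kept inside the queue rather than a new field in meshtastic_MeshPacket; the names here are hypothetical:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Each entry gets a monotonically increasing sequence number stamped in
// enqueue(), so equal-priority packets come out FIFO regardless of their
// (partly random) packet ids.
struct QueuedPacket {
    int priority;
    uint32_t seq; // assigned at enqueue time, never taken from the wire
};

struct Compare {
    bool operator()(const QueuedPacket &a, const QueuedPacket &b) const
    {
        if (a.priority != b.priority)
            return a.priority < b.priority; // max-heap on priority
        return a.seq > b.seq;               // FIFO within equal priority
    }
};

class MeshPacketQueueSketch {
    std::priority_queue<QueuedPacket, std::vector<QueuedPacket>, Compare> q;
    uint32_t nextSeq = 0; // node-local; wraps only after 2^32 enqueues

  public:
    void enqueue(int priority) { q.push({priority, nextSeq++}); }
};
```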
True, but even if it doesn't occupy it fully, it does influence the order in which they are sent out, with the consequences I laid out earlier. The input from the device or client itself is asynchronous to the LoRa radio, and if you transmit new packets every time, you even delay others from rebroadcasting.
I'm not sure all client apps currently properly handle the case where the Tx queue is full, so I would not be confident giving them a smaller part.
True, the client/device can easily occupy all (16) queue slots, and how we sort items does not change that. Limiting sources to only a fraction of the maximum queue space could help with that; for mesh-scaling purposes the app/device should get a small part.
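A sketch of that slot-sharing idea; the 4-slot local cap and the helper name are assumptions for illustration, not existing firmware behavior:

```cpp
#include <cstddef>

static const size_t MAX_QUEUE_SLOTS = 16; // current queue size, per the thread
static const size_t MAX_LOCAL_SLOTS = 4;  // assumed cap for device/app-sourced packets

// Admit a locally generated packet only while local traffic stays under its
// reserved share, so relayed packets can't be starved out of the queue.
static bool mayEnqueueLocal(size_t localPacketsQueued, size_t totalQueued)
{
    return totalQueued < MAX_QUEUE_SLOTS && localPacketsQueued < MAX_LOCAL_SLOTS;
}
```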
Alternative is to make the queue maxlen bigger and keep the size for client apps the same :)
Yeah, but we're already talking about an edge case here. If your mesh is this busy, you'll likely have congestion issues already, which may be the cause of seeing "stale" packets due to high-priority packets coming in between. Increasing the Tx queue size and adding a node-local counter doesn't come for free. |
We can keep meshtastic_MeshPacket the same size by keeping the count local to the queue, and the TX queue size is very modest. Meshtastic is growing (and we want it to), so is it really an edge case if we are already seeing it on a daily basis?
I won't discourage you from implementing a cost-effective solution of course ;)
Meshtastic itself is, but there is just a limit to how much traffic a mesh can cope with. Switching to a faster modem preset is likely a better solution overall. |
I can further instrument the priority queue to provide histograms. From our back-and-forth here, I am walking away convinced that using rx_time fixes a real bug where large packet ids get penalized. Besides that bug, there is also the issue that the device/apps can easily fill up the TX queue, and figuring out a clean mechanism to better share the bandwidth might help with scaling too.
* Mask out random bits when doing queue ordering
* Parenthesis
@gitbisector This might help as well if you were mainly seeing stale text messages: #4592 |
An unintended consequence of randomizing the top bits of packet IDs was that we use those values to determine packet order in the queue. The simple fix is to mask out the random bits when doing that comparison.
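A minimal sketch of what that masked comparison looks like, assuming the queue comparator falls back to the packet id when priorities are equal; the helper name is illustrative:

```cpp
#include <cstdint>

// Mask off the per-node random high bits before comparing, so only the
// incrementing low bits (10 of them, given this mask) decide the order.
#define ID_COUNTER_MASK (UINT32_MAX >> 22)

static bool idComesFirst(uint32_t idA, uint32_t idB)
{
    return (idA & ID_COUNTER_MASK) < (idB & ID_COUNTER_MASK);
}
```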