Probabilistic Forwarding - Using Heuristic Analysis of Network Conditions to Reduce Load #5629

Closed
medentem wants to merge 15 commits

Conversation

medentem
Contributor

Nodes continuously log information about the state of the mesh around them, which can be used to create a probabilistic forwarding scheme that mitigates unneeded and unwanted packet traffic without impacting reliability.

There are three core data points under study:

  1. The number of direct neighbors observed in the last X minutes
  2. The number of nodes that have sent the same packet
  3. The number of packets received in a given look-back period

These three data points can be used to estimate the likelihood that repeating the packet is unnecessary. Of course, it can never be a certainty, which is why a probabilistic forwarding scheme is used, and the degree to which each factor influences the forwarding probability can be tuned for typical Meshtastic levels of traffic.

In any case, even if the influence of these factors is reduced significantly so that traffic is only conservatively reduced, it would be a traffic reduction nonetheless.

To model the effect of this probabilistic forwarder, you can use this JSFiddle, which mirrors the calculations in FloodingRouter.cpp: https://jsfiddle.net/1ufmhry6/4/

Key lines of code:

Lines 61-63 of FloodingRouter.cpp call the probability calculation function and test the value against a random number to determine whether the packet will be forwarded.

Line 110 of FloodingRouter.cpp calculates the probability based on the data points noted above.
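
As an illustration of that flow, here is a minimal self-contained sketch of the approach. The function names, the NEIGHBOR_INFLUENCE and TRAFFIC_INFLUENCE constants, and the way the metrics are passed in are hypothetical stand-ins for this explanation, not the PR's exact code:

    #include <cstdlib>

    // Illustrative influence constants; these would need tuning for
    // typical Meshtastic traffic levels.
    static const float NEIGHBOR_INFLUENCE = 0.10f;
    static const float REDUNDANCY_INFLUENCE = 0.05f;
    static const float TRAFFIC_INFLUENCE = 0.02f;

    // Combine the three metrics into a forwarding probability in (0, 1].
    // More neighbors, more duplicate sources, or more recent traffic each
    // pull the probability down from 1.0.
    float getForwardingProbability(int neighbors, int distinctSources, float packetRate)
    {
        float density = 1.0f / (1.0f + neighbors * NEIGHBOR_INFLUENCE);
        float redundancy = 1.0f / (1.0f + distinctSources * REDUNDANCY_INFLUENCE);
        float congestion = 1.0f / (1.0f + packetRate * TRAFFIC_INFLUENCE);
        return density * redundancy * congestion;
    }

    // Forward only if a uniform random draw falls below the probability.
    bool shouldForward(int neighbors, int distinctSources, float packetRate)
    {
        float probability = getForwardingProbability(neighbors, distinctSources, packetRate);
        float roll = (float)std::rand() / (float)RAND_MAX; // uniform in [0, 1]
        return roll < probability;
    }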

@garthvh garthvh requested review from GUVWAF and thebentern December 21, 2024 02:01
@GUVWAF
Member

GUVWAF commented Dec 21, 2024

Thanks for this, it’s interesting. While I like that it’s using existing metrics obtained from the mesh without utilizing more airtime or RAM, I’m not sure about the metrics and the idea in general.

First of all, non-routers/repeaters (except non-rebroadcasters like CLIENT_MUTE) will always try to rebroadcast after a small delay based on SNR. If within that window, they hear a packet starting and it appears to be the packet they’re trying to rebroadcast, they will cancel this rebroadcast:

// cancel rebroadcast of this message *if* there was already one, unless we're a router/repeater!
if (Router::cancelSending(p->from, p->id))

So this already drastically limits the number of rebroadcasts. Based on your distinctSources metric, it looks like you didn’t take this into account, because this logic actually is similar but then for more than 1 distinct source, the probability becomes 0.
In your case, you also base the probability on historical metrics, which might cause it to not try rebroadcasting at all even if you’re the only one that can serve a certain set of receivers.
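
For readers following along, a minimal sketch of that cancel-on-overhear behavior (class and method names here are illustrative; the real logic lives in the firmware's flood router, and only Router::cancelSending is quoted from it above):

    #include <cstdint>
    #include <set>
    #include <utility>

    // Minimal sketch of the stock cancel-on-overhear behavior. Because the
    // "from" field is the original transmitter and never changes on relay,
    // an overheard rebroadcast matches the (from, id) pair we queued.
    struct PendingRebroadcasts {
        // (from, id) pairs queued for rebroadcast after the SNR-based delay
        std::set<std::pair<uint32_t, uint32_t>> queued;

        // First reception: schedule our own rebroadcast of the packet.
        void schedule(uint32_t from, uint32_t id) { queued.insert({from, id}); }

        // Heard the same packet from someone else before our delay expired:
        // drop our pending copy. Returns true if one was cancelled,
        // mirroring Router::cancelSending(p->from, p->id).
        bool cancelSending(uint32_t from, uint32_t id) { return queued.erase({from, id}) > 0; }
    };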

Now onto the metrics.
First, why would having more neighbors decrease the chance of rebroadcasting? Depending on where the packet came from, if you have more nodes to serve than another node, shouldn’t the chance of rebroadcasting be higher for you? For example here, for a packet originating from either 0 or 5, it’s better that node 1 is rebroadcasting, and not node 5 or 0.
[Image: example topology in which a packet originating from node 0 or node 5 is best rebroadcast by node 1, which sits between them]

Next, the distinctSources metric won’t work for several reasons. First, you determine based on the first time you receive a packet whether you will put it in the transmit queue, at which point this will always be 1. Subsequent packets will not enter perhapsRebroadcast() because of the wasSeenRecently() check. Furthermore, the sender in recentPackets is the from of a packet, which does not change when a different node rebroadcasts. It’s the original transmitter of a packet -- we don’t know who is rebroadcasting.

Lastly, while the recentUniquePacketRate will decrease channel utilization in case the mesh has more traffic to transfer, in my opinion this is not something that should influence the probability of rebroadcasting.
There is already throttling in place at high channel utilization for periodic broadcasts like DeviceTelemetry (and the interval scales with the number of nodes in the mesh). Furthermore, the delay before originating a packet scales with channel utilization in order to limit the chance of collisions. However, in case an event happens and everybody starts transmitting text messages, this is not a reason to lower the chance of rebroadcasting. If you were a good rebroadcaster, it doesn’t make you a worse rebroadcaster in case there are more packets to be delivered.
Also, this metric does not take the actual airtime into account. Using modem preset SHORT_TURBO you can tolerate many more packets than on LONG_SLOW.

@medentem
Contributor Author

Thank you for such thorough feedback!

First of all, non-routers/repeaters (except non-rebroadcasters like CLIENT_MUTE) will always try to rebroadcast after a small delay based on SNR. If within that window, they hear a packet starting and it appears to be the packet they’re trying to rebroadcast, they will cancel this rebroadcast:

// cancel rebroadcast of this message *if* there was already one, unless we're a router/repeater!
if (Router::cancelSending(p->from, p->id))

So this already drastically limits the number of rebroadcasts. Based on your distinctSources metric, it looks like you didn’t take this into account, because this logic actually is similar but then for more than 1 distinct source, the probability becomes 0. In your case, you also base the probability on historical metrics, which might cause it to not try rebroadcasting at all even if you’re the only one that can serve a certain set of receivers.

Good point. I see that now. But I don't follow this part -> "for more than 1 distinct source, the probability becomes 0"
Pseudocode here:

    float REDUNDANCY_INFLUENCE_FACTOR = 0.05f;
    int distinctSources = getDistinctSourcesCount(p->id);
    float redundancyFactor = 1.0f / (1.0f + distinctSources * REDUNDANCY_INFLUENCE_FACTOR);

    // Worked example with one distinct source:
    // redundancyFactor = 1 / (1 + 1 * 0.05) ≈ 0.952
    // probability = 1 * 0.952 ≈ 0.952

To your point though, it seems like you're deduplicating using a somewhat similar mechanism (i.e. the random broadcast delay in the packet pool). Probably not a good metric to utilize.

Next, the distinctSources metric won’t work for several reasons. First, you determine based on the first time you receive a packet whether you will put it in the transmit queue, at which point this will always be 1. Subsequent packets will not enter perhapsRebroadcast() because of the wasSeenRecently() check. Furthermore, the sender in recentPackets is the from of a packet, which does not change when a different node rebroadcasts. It’s the original transmitter of a packet -- we don’t know who is rebroadcasting.

Yes... you're right. We'd have to add something to the packet to indicate the original from vs. the last repeating node. Wouldn't this be useful though? If we had that information, any node could more effectively understand how packets were traversing the mesh.

Now onto the metrics. First, why would having more neighbors decrease the chance of rebroadcasting? Depending on where the packet came from, if you have more nodes to serve than another node, shouldn’t the chance of rebroadcasting be higher for you? For example here, for a packet originating from either 0 or 5, it’s better that node 1 is rebroadcasting, and not node 5 or 0.

I think it depends actually. The number of neighbors may mean this node should broadcast more in the case that this node is the only node that is capable of serving the other nearby nodes. But it could also mean that this node is one of many in a dense mesh where everyone can essentially serve everyone and therefore you'd want to lower its propensity to rebroadcast.

What about something like a Bloom filter, where the top N strongest immediate neighbors are added to the filter and passed along in the packet? When the next node receives the packet, it compares its own top N strongest immediate neighbors against the filter, and the number of likely-unique nodes it serves becomes an input to the probabilistic forwarding computation.

Let's assume we'd use 2 hashes per node (k = 2), a 64-bit total field size (m = 64), and a cap of about 20 entries (n = 21); you'd end up with a false positive rate of ~30%. With 21 entries on a 3-hop configuration, you'd be able to record each node's top 7 neighbors, and that assumes each node is holding a fully unique neighbor list.
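
As a sketch of what that could look like with those parameters (k = 2, m = 64; the multiplicative hash mixers below are arbitrary placeholders for illustration, not a concrete proposal):

    #include <cstdint>

    // Minimal 64-bit Bloom filter for the neighbor-set idea above.
    // k = 2 hash functions, m = 64 bits.
    struct NeighborBloom {
        uint64_t bits = 0;

        // Two cheap multiplicative hashes, each mapping a node number to a
        // bit index in 0..63.
        static uint8_t h1(uint32_t node) { return (node * 2654435761u) >> 26; }
        static uint8_t h2(uint32_t node) { return (node * 0x9E3779B9u) >> 26; }

        void add(uint32_t node) { bits |= (1ULL << h1(node)) | (1ULL << h2(node)); }

        // May return false positives (roughly the ~30% estimated above near
        // 21 entries), but never false negatives.
        bool mightContain(uint32_t node) const {
            return (bits & (1ULL << h1(node))) && (bits & (1ULL << h2(node)));
        }
    };

    // A receiver could count how many of its own strongest neighbors are
    // *not* already covered by the incoming filter and feed that count
    // into the forwarding probability.
    int likelyUniqueNeighbors(const NeighborBloom &incoming, const uint32_t *mine, int count)
    {
        int unique = 0;
        for (int i = 0; i < count; ++i)
            if (!incoming.mightContain(mine[i]))
                ++unique;
        return unique;
    }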

@GUVWAF
Member

GUVWAF commented Dec 22, 2024

But I don't follow this part -> "for more than 1 distinct source, the probability becomes 0"

With "this logic" I was referring to how it currently works in Meshtastic.

(ie. random broadcast delay in the packet pool). Probably not a good metric to utilize.

The random delay is taken from a contention window, which scales with SNR (after giving routers/repeaters priority). This means that nodes farther away will generally rebroadcast first, in order to minimize the number of hops used to spread a packet.
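
A rough sketch of that windowing for illustration only; the slot length, window count, and SNR range below are made up, and the firmware's actual implementation differs:

    #include <cstdint>
    #include <cstdlib>

    // Illustrative SNR-weighted contention window: routers/repeaters draw
    // from the earliest slots, and for everyone else a worse SNR (likely a
    // more distant node) maps to an earlier window, so distant nodes tend
    // to rebroadcast first.
    uint32_t snrWeightedDelayMs(float snr, bool isRouter)
    {
        const uint32_t slotMs = 50; // made-up slot length
        if (isRouter)
            return std::rand() % (2 * slotMs); // earliest window

        // Clamp SNR to an assumed -20 dB..+10 dB range, then map better
        // SNR (likely a closer node) to a later window index 0..8.
        float clamped = snr < -20.f ? -20.f : (snr > 10.f ? 10.f : snr);
        uint32_t window = (uint32_t)((clamped + 20.f) / 30.f * 8.f);
        return (2 + window) * slotMs + std::rand() % (2 * slotMs);
    }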

We'd have to add something to the packet to indicate the original from vs. the last repeating node. Wouldn't this be useful though?

I propose to add this for the Next-Hop Router (#2856), although only the last byte of the relayer's node number, because we only have 2 bytes left in the header.
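
For illustration, carrying the relayer in a single spare header byte could look roughly like this (the struct and field names are hypothetical; see #2856 for the actual proposal):

    #include <cstdint>

    // Sketch of a header that records who last relayed a packet. Only the
    // low byte of the relayer's node number fits in the remaining space.
    struct HeaderSketch {
        uint32_t from;      // original transmitter; unchanged by relays
        uint32_t id;        // packet id assigned by the original transmitter
        uint8_t relay_node; // low byte of the most recent relayer's node number
    };

    // Each relayer stamps its own low byte before retransmitting, letting
    // receivers distinguish relayers (modulo the ambiguity of truncating
    // a 4-byte node number to 1 byte).
    void stampRelayer(HeaderSketch &h, uint32_t myNodeNum)
    {
        h.relay_node = (uint8_t)(myNodeNum & 0xFF);
    }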

where the top N strongest immediate neighbors are added to the filter and passed along in the packet.

When considering adding any overhead, in my opinion we should first simulate (https://github.com/meshtastic/Meshtasticator) whether it gives significant improvements over the current method in all kinds of scenarios.

@medentem
Contributor Author

Very interesting. I'll review that PR. And I'll also utilize the simulator. Thank you.

@medentem medentem closed this Dec 22, 2024