Add new ROUTER_LATE role #5528

erayd · 2024-12-08T03:51:33Z

This PR adds a new ROUTER_LATE role.

This role will always rebroadcast packets, but will do so after all other modes. Intended for router nodes that are there to provide additional coverage in areas not already covered by other routers, or to bridge around problematic terrain, but should not be given priority over other routers in order to avoid unnecessarily consuming hops (basically the "we need this thing to always rebroadcast, but it isn't a 'real' high site" scenario).

By default, this role will rebroadcast during the normal client window. However, if another node is overheard rebroadcasting the packet, then it will be moved to a second window after the normal client one, with the same timing behaviour.
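As a rough illustration of the window choice described above, here is a minimal sketch. It is not the firmware code: `Window`, `Schedule`, `scheduleRouterLate`, `clientWindowMs` and `slotDelayMs` are hypothetical names, and the sketch assumes the late window starts immediately after the client window ends and reuses the same in-window delay.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the ROUTER_LATE window choice; names are illustrative only.
enum class Window { Client, Late };

struct Schedule {
    Window window;    // which rebroadcast window the packet ends up in
    uint32_t delayMs; // delay before rebroadcasting, measured from the window origin
};

// dupeHeard:      another node was overheard rebroadcasting this packet
// clientWindowMs: assumed length of the normal client window
// slotDelayMs:    delay the node would pick inside a window
Schedule scheduleRouterLate(bool dupeHeard, uint32_t clientWindowMs, uint32_t slotDelayMs)
{
    assert(slotDelayMs <= clientWindowMs); // the chosen slot must fit inside one window
    if (!dupeHeard)
        return {Window::Client, slotDelayMs};
    // Same timing behaviour, shifted past the end of the client window.
    return {Window::Late, clientWindowMs + slotDelayMs};
}
```

With a 1000 ms client window and a 250 ms slot, the packet goes out at 250 ms if nobody else rebroadcasts, and at 1250 ms otherwise.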

Relevant discussion with @GUVWAF on Discord is here. This feature originally started life as just a configurable delay, and following that discussion has morphed into this new role 🙂.

Please note that this PR requires that these protobuf changes are applied first. I have omitted the protobuf regen commit from this PR, as I'm unsure whether you want it bundled or would prefer to apply those changes separately beforehand.

To assist with testing, a directly buildable version of this based on v2.5.15 with the protobuf regen commit included can be found here.

erayd · 2024-12-08T19:06:26Z

@thebentern Did you intend to merge 33494c3 into this PR, or was that a mistake? It doesn't seem related at all...

fifieldt · 2024-12-08T21:01:08Z

Hi @erayd, don't panic! This is just us keeping your development branch in sync with upstream master so it can merge more easily. If you have a look at the Files tab, you'll see nothing about your patch has changed.

erayd · 2024-12-08T21:05:33Z

@fifieldt Thanks for the clarification, and that makes sense 🙂. It was just unexpected, so I was wondering if they had accidentally merged into the wrong branch.

GUVWAF · 2024-12-09T11:23:27Z

Looking very good already!

However, I had it slightly differently in my mind. To make sure you are not still sending faster than busy routers/clients, I think we really need to restart the timer with the original delay. For example, when receiving a packet with a long airtime in between (or even when you had already multiple packets to be transmitted in the queue), you might already pass the remaining transmit delay when the packet is in front of the queue, and then you start transmitting directly after the delay from startTransmitTimer(). This is what another router or client then also does, and it depends on the random number drawn who will be faster.

I think it makes sense to change startTransmitTimer() here:

firmware/src/mesh/RadioLibInterface.cpp

Line 262 in f81d3b0

startTransmitTimer();

and here:

firmware/src/mesh/RadioLibInterface.cpp

Line 267 in f81d3b0

startTransmitTimer();

to setTransmitDelay(), for all roles. This is actually more correct, since it will determine the delay based on SNR (if available), instead of channel utilization. It won’t affect current firmware much, because the delay range is still the same.

Then, with this, you can just check whether the packet in front of the queue has the Delayed tag set, and then apply the additional delay, without needing the tx_after field.

The downside is that sometimes indeed you might start it over and that creates additional latency, but I think that's justified, because this usually only happens when the mesh is busy.

src/mesh/RadioLibInterface.cpp

erayd · 2024-12-09T11:40:43Z

@GUVWAF

> Then, with this, you can just check whether the packet in front of the queue has the Delayed tag set, and then apply the additional delay, without needing the tx_after field.

> The downside is that sometimes indeed you might start it over and that creates additional latency, but I think that's justified, because this usually only happens when the mesh is busy.

The problem with doing this is when multiple of these are involved that can overhear each other, the latency added can be quite considerable. The problem then becomes DoS via excessive delay.

Would resuming with max(0, remaining tx_after) + regular SNR-based delay be an acceptable compromise? That way your desired keep-adding-slight-delay behaviour is there, but the extra portion of the delay is only added once.
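A minimal sketch of that compromise, for illustration only. `remainingExtraMs` and `resumeDelayMs` are hypothetical helpers, the SNR-based delay is passed in as a plain number rather than computed, and `millis()` wrap-around is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Remaining part of the one-off extra delay, clamped at zero once tx_after has passed.
uint32_t remainingExtraMs(uint32_t txAfterMs, uint32_t nowMs)
{
    return txAfterMs > nowMs ? txAfterMs - nowMs : 0;
}

// Delay to use when the timer is restarted: max(0, remaining tx_after) + regular delay.
// snrDelayMs stands in for the regular SNR-based delay, which is re-drawn on every restart.
uint32_t resumeDelayMs(uint32_t txAfterMs, uint32_t nowMs, uint32_t snrDelayMs)
{
    return remainingExtraMs(txAfterMs, nowMs) + snrDelayMs;
}
```

The extra portion shrinks towards zero as time passes, so repeated restarts only ever re-add the regular delay.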

GUVWAF · 2024-12-09T12:12:17Z

> Would resuming with max(0, remaining tx_after) + regular SNR-based delay be an acceptable compromise?

Sounds good, considering this is an edge case for not so busy meshes anyway. Although a DoS is a big word, because it's already an additional service compared to a normal client.

So if you're using the new tx_after field, I don't think the Delayed tag is necessary anymore?

erayd · 2024-12-09T12:21:17Z

> Sounds good, considering this is an edge case for not so busy meshes anyway.

Sweet, will implement accordingly.

> Although a DoS is a big word, because it's already an additional service compared to a normal client.

DoS seems applicable here, because these things fill an essential role in routing around terrain. They need to not get delayed so long as to be rendered useless (e.g. to cause remote admin packets to time out etc).

The reason they are in the late window isn't because they are unimportant, but because they have rather poor coverage characteristics, and should give better placed sites a shot at rebroadcasting first.

> So if you're using the new tx_after field, I don't think the Delayed tag is necessary anymore?

I'm using both, for different purposes. tx_after to keep track of the delay amount, and delayed to ensure that the extra delay only gets added once (instead of once per dupe heard).

GUVWAF · 2024-12-09T13:04:50Z

> They need to not get delayed so long as to be rendered useless (e.g. to cause remote admin packets to time out etc).

True, but this risk is still there, because it only gets the minimum priority after adding the delay, so if someone is sending multiple packets after each other, the newer packet gets inserted before one that already is being delayed. (So perhaps a re-ordering is also needed after setting the priority to minimum?)

> delayed to ensure that the extra delay only gets added once (instead of once per dupe heard).

Since the tx_after is only really needed if you add this additional delay, it could be applied only there and a non-zero value for tx_after would indicate it's already being delayed.
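To illustrate the idea, a small sketch of using a non-zero tx_after as the "already delayed" marker. `QueuedPacket` and `markLateOnce` are hypothetical names, not firmware identifiers, and the sketch ignores the corner case where the computed deadline happens to be exactly zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a queued packet; only the field discussed here is modelled.
struct QueuedPacket {
    uint32_t txAfterMs = 0; // 0 means "no extra delay applied yet"
};

// Called every time a duplicate of the packet is overheard.
// Returns true only the first time, when the extra delay is actually applied.
bool markLateOnce(QueuedPacket &p, uint32_t nowMs, uint32_t extraDelayMs)
{
    if (p.txAfterMs != 0)
        return false; // already delayed: later duplicates must not push it back again
    p.txAfterMs = nowMs + extraDelayMs;
    return true;
}
```

A second duplicate heard later leaves the stored deadline untouched, so no separate flag is needed.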

erayd · 2024-12-09T13:10:18Z

> True, but this risk is still there, because it only gets the minimum priority after adding the delay, so if someone is sending multiple packets after each other, the newer packet gets inserted before one that already is being delayed. (So perhaps a re-ordering is also needed after setting the priority to minimum?)

Good point. Yes.

> Since the tx_after is only really needed if you add this additional delay, it could be applied only there and a non-zero value for tx_after would indicate it's already being delayed.

That works 🙂

erayd · 2024-12-10T03:41:38Z

@GUVWAF Have made a bunch of changes to the timing & the queuing following our discussion above. Is this now more in line with what you were expecting?

The queuing changes are to ensure that late packets go to the back of the queue, but can still have their priority respected within that domain.
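One way to picture that ordering is the sketch below. It is an assumption-laden model rather than the actual queue code: `QueuedPacket`, `sendsBefore` and `reorder` are hypothetical, and a larger `priority` value is assumed to mean "send sooner".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical queue entry; only the fields needed for ordering are modelled.
struct QueuedPacket {
    uint32_t id;
    uint8_t priority; // assumed: larger value = more urgent
    bool late;        // true once the packet has been moved to the late window
};

// Non-late packets always go first; within each group, higher priority goes first.
bool sendsBefore(const QueuedPacket &a, const QueuedPacket &b)
{
    if (a.late != b.late)
        return !a.late;
    return a.priority > b.priority;
}

// stable_sort keeps arrival order among packets that compare equal.
void reorder(std::vector<QueuedPacket> &queue)
{
    std::stable_sort(queue.begin(), queue.end(), sendsBefore);
}
```

A newly arrived normal packet therefore never lands behind a late one, while two late packets still sort by their own priority.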

GUVWAF · 2024-12-10T18:23:17Z

The current approach will not actually use the additional delay after transmitting or receiving another packet. It only applies this when the channel is active, but it’s likely that this will result in an Rx interrupt, which will overwrite the delay, and call setTransmitDelay() afterwards.

What if you just always use setTransmitDelay()? When receiving a duplicate packet, if tx_after is not yet set, set it only to millis() + getTxDelayMsecWeightedWorst(p->rx_snr). Then in setTransmitDelay(), set the notification for max(tx_after + getTxDelayMsecWeighted(p->rx_snr), millis() + getTxDelayMsecWeighted(p->rx_snr)) - millis(). Checking for tx_after > millis() within TX_DELAY_COMPLETED is then also not necessary.
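The notification delay proposed here can be sketched as a small pure function. This is only an illustration: `notifyDelayMs` is a hypothetical name, the result of getTxDelayMsecWeighted() is passed in as `weightedDelayMs` instead of being computed, and `millis()` wrap-around is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// max(tx_after + d, now + d) - now, where d is the regular weighted delay.
// txAfterMs == 0 means the packet was never moved to the late window.
uint32_t notifyDelayMs(uint32_t txAfterMs, uint32_t nowMs, uint32_t weightedDelayMs)
{
    const uint32_t earliestTx = std::max(txAfterMs + weightedDelayMs, nowMs + weightedDelayMs);
    return earliestTx - nowMs;
}
```

For a packet that was never delayed, or whose tx_after has already passed, this reduces to the regular weighted delay; otherwise the time still remaining until tx_after is added on top.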

The reason I am so picky about this is that if we want this to be a general available role, we should really make sure it does not actually transmit faster than routers/clients, and it shouldn’t increase the chance of collisions too much (so also ensure it’s really outside of the window for regular routers/clients).

src/mesh/RadioLibInterface.cpp

erayd · 2024-12-11T01:39:52Z

> The current approach will not actually use the additional delay after transmitting or receiving another packet. It only applies this when the channel is active, but it’s likely that this will result in an Rx interrupt, which will overwrite the delay, and call setTransmitDelay() afterwards.

Thanks - good catch 🙂

> What if you just always use setTransmitDelay()? When receiving a duplicate packet, if tx_after is not yet set, set it only to millis() + getTxDelayMsecWeightedWorst(p->rx_snr). Then in setTransmitDelay(), set the notification for max(tx_after + getTxDelayMsecWeighted(p->rx_snr), millis() + getTxDelayMsecWeighted(p->rx_snr)) - millis().

I like this approach. Yes.

> Checking for tx_after > millis() within TX_DELAY_COMPLETED is then also not necessary.

This still seems to be necessary. During testing, TX_DELAY_COMPLETED often seems to fire when the requested delay has not yet expired (maybe due to interrupts?), so I think the check is important unless we can find and prevent whatever is causing it to return early.
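A sketch of what such a guard could look like, under the assumption that an early wake-up should simply re-arm the timer for the time still remaining. `Action`, `Decision` and `onDelayCompleted` are hypothetical names and do not mirror the real event handler.

```cpp
#include <cassert>
#include <cstdint>

enum class Action { Transmit, Rearm };

struct Decision {
    Action action;
    uint32_t waitMs; // only meaningful when action == Action::Rearm
};

// Decide what to do when the delay-completed event fires.
// If it fired before tx_after, wait out the remainder instead of transmitting.
Decision onDelayCompleted(uint32_t txAfterMs, uint32_t nowMs)
{
    if (txAfterMs > nowMs)
        return {Action::Rearm, txAfterMs - nowMs};
    return {Action::Transmit, 0};
}
```

With this guard an early wake-up costs one extra timer cycle but never lets the packet out ahead of its deadline.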

> The reason I am so picky about this is that if we want this to be a general available role, we should really make sure it does not actually transmit faster than routers/clients, and it shouldn’t increase the chance of collisions too much (so also ensure it’s really outside of the window for regular routers/clients).

Picky is good (and I agree with your reasoning). It's important that this be robust 🙂

erayd · 2024-12-11T10:04:01Z

If you're wondering where the commit that implements the above changes is... it's coming, but there is an odd bug I need to fix first. Will push it once I have nailed this down.

The Bug:
For some reason, tx_after is arriving at setTransmitDelay with an initial value of precisely 52717827, for every packet, even ones that were not initially delayed, and regardless of whether I explicitly initialise that field to zero earlier (and even though the entire packet object is allocated zero-filled anyway). I currently have no idea where this value is originating.

GUVWAF · 2024-12-11T19:47:53Z

Hmm, I'm assuming you applied all the protobuf changes on top of the latest release, right? If the tx_after would be at the same place where e.g. now pki_encrypted is and the app sets this, this might cause incompatibility.

I don't see how this could happen otherwise.

erayd · 2024-12-11T20:07:15Z

> Hmm, I'm assuming you applied all the protobuf changes on top of the latest release, right? If the tx_after would be at the same place where e.g. now pki_encrypted is and the app sets this, this might cause incompatibility.

Pretty sure I have. It compiles properly with code that refers to those fields, which IMO is an indication that the changes are there... I'd expect to see build errors otherwise. With that said, it seems like a good place for me to look. I might try a full clean & rebuild, and see what that does - maybe there's a bit of the old version lurking around somewhere that shouldn't be.

> I don't see how this could happen otherwise.

Yeah, it's confusing as heck. The data is clearly originating somewhere, and given it's not getting there via assignment to tx_after, that does point to something else walking all over its memory space. I'll pursue your suggestion above about maybe something missing the protobuf update - as you say, that does seem like the most likely mechanism.

GUVWAF · 2024-12-15T16:10:10Z

Have you been able to figure it out? At least with the current state of your router-infill branch, I cannot reproduce it.
I did see a crash, however, because setTransmitDelay() should now also check whether getFront() doesn't return NULL.
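For illustration, a self-contained sketch of the kind of guard meant here. `Packet`, `TxQueue` and `computeFrontDelay` are hypothetical stand-ins; only the idea that getFront() may return a null pointer for an empty queue comes from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct Packet {
    uint32_t txAfterMs; // 0 = no extra delay requested
};

// Minimal stand-in for the transmit queue; not the firmware class.
class TxQueue
{
  public:
    void enqueue(const Packet &p) { items.push_back(p); }
    // Returns nullptr when the queue is empty.
    Packet *getFront() { return items.empty() ? nullptr : &items.front(); }

  private:
    std::deque<Packet> items;
};

// Computes the delay for the packet at the front of the queue.
// Returns false, leaving delayMs untouched, when there is nothing queued.
bool computeFrontDelay(TxQueue &queue, uint32_t nowMs, uint32_t weightedDelayMs, uint32_t &delayMs)
{
    Packet *front = queue.getFront();
    if (front == nullptr)
        return false; // guard: dereferencing here is what would crash
    const uint32_t remaining = front->txAfterMs > nowMs ? front->txAfterMs - nowMs : 0;
    delayMs = remaining + weightedDelayMs;
    return true;
}
```

Callers can then skip arming the timer whenever the function reports an empty queue.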

erayd · 2024-12-15T17:20:05Z

> Have you been able to figure it out? At least with the current state of your router-infill branch, I cannot reproduce it.

I've been extremely short on time the last few days, so haven't yet had the opportunity to dig into it. Hoping to do so either this evening or tomorrow.

> I did see a crash, however, because setTransmitDelay() should now also check whether getFront() doesn't return NULL.

Thanks - will add a check for this 🙂

Kealper · 2024-12-17T04:58:18Z

Definitely interested in this new role, it would help out in the region I'm in that's a mix of mountains and hills. Hills are the easy part but so far there's been some issues trying to get packets to reliably get around mountains that aren't able to have regular Router nodes on them.

erayd · 2024-12-21T04:58:23Z

@GUVWAF What do you reckon now?

The memory stomp appears resolved following full protobuf rebuild & adding that null check after getFront().

GUVWAF · 2024-12-21T08:50:44Z

Good to hear! It looks good to me now, thanks for your cooperation.

Let's wait for more approvals on the protobuf definitions.

erayd · 2024-12-21T08:56:07Z

Sounds good. Thanks very much for your input and assistance with this feature 😁

erayd · 2024-12-21T10:58:39Z

Have squashed the commits to make it easier to wrangle.

fifieldt · 2024-12-24T00:44:15Z

Protobufs appear merged

erayd · 2024-12-24T00:55:19Z

@fifieldt

> Protobufs appear merged

They're merged in the protobuf repo master branch. Not yet in the firmware repo though.

Do you want the protobuf update in this repo to happen separately, or should I add a commit for that here (bumping the submodule commit & regen update) as part of this PR?

fifieldt · 2024-12-24T00:56:21Z

Ah! let me trigger that job, moment.

fifieldt · 2024-12-24T00:58:15Z

( #5658 )

Will always rebroadcast packets, but will do so after all other modes. Intended for router nodes that are there to provide additional coverage in areas not already covered by other routers, or to bridge around problematic terrain, but should not be given priority over other routers in order to avoid unnecessarily consuming hops. By default, this role will rebroadcast during the normal client window. However, if another node is overheard rebroadcasting the packet, then it will be moved to a second window *after* the normal client one, with the same timing behaviour.

erayd mentioned this pull request Dec 8, 2024

Add the ROUTER_LATE role & supporting fields meshtastic/protobufs#632

Merged

erayd mentioned this pull request Dec 9, 2024

If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

Draft

GUVWAF reviewed Dec 9, 2024

erayd changed the title ~~Add new ROUTER_INFILL role~~ Add new ROUTER_LATE role Dec 9, 2024

GUVWAF reviewed Dec 10, 2024

GUVWAF reviewed Dec 10, 2024

GUVWAF mentioned this pull request Dec 18, 2024

[Feature Request]: Role between CLIENT and CLIENT_MUTE #5603

Closed

erayd marked this pull request as ready for review December 21, 2024 09:00

erayd marked this pull request as draft December 21, 2024 09:02

erayd marked this pull request as ready for review December 21, 2024 09:03

erayd force-pushed the router-infill-master branch from 8de1024 to bbc2a0e Compare December 21, 2024 10:55

erayd requested a review from GUVWAF December 21, 2024 21:43

fifieldt force-pushed the router-infill-master branch from bbc2a0e to 7dc330d Compare December 24, 2024 00:45

fifieldt force-pushed the router-infill-master branch from 7dc330d to f92c5e4 Compare December 26, 2024 02:43

GUVWAF approved these changes Dec 26, 2024

GUVWAF requested review from caveman99, fifieldt and thebentern December 26, 2024 08:48

fifieldt approved these changes Dec 26, 2024

fifieldt force-pushed the router-infill-master branch from f92c5e4 to 1a7adf9 Compare December 26, 2024 09:09
