[Feature Request]: Please Enable More Frequent Device Metrics/Telemetry for Router Role Devices ONLY in Crowded Mesh #4466
Comments
My recommendation a few months ago on Discord was to have ROUTERS send telemetry (including an environmental packet) every 6 hours even when channel utilization is high. |
Even at high channel utilization, you can still request the telemetry with the Python CLI command Currently |
@Deuteranomalous1 could you post a config dump of one of these routers (without sensitive information)? |
Probably it's then just the high channel utilization that's causing it to not send it out:
|
Good point. We might consider adding ROUTER role to that exclusion since it is more important to know if routers are healthy than client nodes on the mesh. |
Yeah, I think that's fair. Then it will send it still up to 40% instead of 25% channel utilization. Just don't document it as a feature for the |
Sorry, it's a 5-6 hour round-trip hike this time of year, but I can get it for you next trip if you still need it. It seems like it's well in hand, though :) Thanks guys! |
Routers are now considered "impolite", meaning they will send their DeviceTelemetry up until 40% channel utilization instead of 25%:
You can also request DeviceTelemetry using the Python CLI |
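As a rough illustration of the two limits quoted in this comment, here is a minimal Python sketch. It is not the firmware code: the function and constant names are hypothetical, only the ROUTER role is modelled as "impolite", and treating the limit as a strict "less than" comparison is an assumption.

```python
# Illustrative sketch only; names are hypothetical, not firmware identifiers.
POLITE_MAX_CHANNEL_UTIL = 25.0    # percent, limit quoted in the thread for other roles
IMPOLITE_MAX_CHANNEL_UTIL = 40.0  # percent, limit quoted in the thread for routers


def may_broadcast_device_telemetry(role: str, channel_util_percent: float) -> bool:
    """Return True if a periodic DeviceTelemetry broadcast would be allowed."""
    # Assumption: only ROUTER is treated as impolite in this sketch.
    limit = IMPOLITE_MAX_CHANNEL_UTIL if role == "ROUTER" else POLITE_MAX_CHANNEL_UTIL
    # Assumption: the comparison is strict; the thread only says "up until 40%".
    return channel_util_percent < limit


# A router at 30% channel utilization still broadcasts; a client does not.
print(may_broadcast_device_telemetry("ROUTER", 30.0))
print(may_broadcast_device_telemetry("CLIENT", 30.0))
```

At the roughly 50% channel utilization reported in the original request, neither role would broadcast under this model.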
Yes, I am aware that there is a manual workaround using the CLI, as you covered that in this convo back in August. Thank you for mentioning it back then, as it was useful to me in a pinch. 👍 However, the vast majority of users will interact with their nodes and do quick health-checks via mobile Apps, and that alternative method does not help them.
Two points from this I guess:
|
|
This is one of the first things we tried and had no joy with, it seemingly made no difference. Is that setting adhered to the same when a device is using the Router role? If the remedy was as easy as changing that setting, then why was that not proposed to the original author on this issue? My prior testing suggested it was having no effect when changed, which is what led me to going down this path of finding other hidden attributes discussed in here (channel util %) which can get in the way and block this. If the device metric broadcast for Routers was permitted once every 12 hours irrespective of channel util % then this would all be a moot point and people would be able to easily get the data they need. I'm struggling to see the rationale behind that limitation. |
I believe so, I can't find in the code where it wouldn't respect this for routers.
The original issue was specifically about the high channel utilization. Back then it was still 25% and the interval is scaled with the number of online nodes. Now, for routers it's 40% and not scaled with the number of online nodes.
So why would relaxing this constraint help for you if your channel utilization is very low? |
Because the broadcast for the Router role has been demoted to be very infrequent (12hrs) and I think sometimes our channel utilization may range around 20-35%. And as I've been trying to state from testing, I think thresholds lower than 40% may still be imposing a block. I have LoS to a Router 1 mile away which I can communicate with flawlessly, remote admin etc. but I can go many days with no Device Metrics being retrieved. I can on-demand, manually use the CLI on my PC to retrieve them reliably, so the link is good. My understanding of channel utilisation from an official Meshtastic video led me to believe that non-Meshtastic traffic on the band (other LoRaWAN / Helium traffic) can also contribute to the channel utilisation figure, so I believe that some of what we're seeing locally is not Meshtastic traffic but other stuff on the air, and due to Meshtastic's code logic choices it's blocking informational broadcasts in a very quiet Meshtastic zone. |
Are you always connected with an app to a Meshtastic device? A node can only store around 30 packets of all types, and text messages get priority when the buffer is full.
This is highly unlikely if not impossible. They have to be using the exact same frequency and LoRa settings, as well as our custom sync word. |
Yes I have tried leaving it connected for days.
Does that mean there is a mistake in the first 35 seconds of this official Meshtastic video: https://www.youtube.com/watch?v=aDud_5Bxtvc or perhaps I have misinterpreted the opening statements? |
Just to reiterate - I can at any point change this node from Router to Client and start getting Device Metrics reliably every hour. As soon as it's switched back to Router it stops: no updates every 12hrs, even if you're patient and wait 48-72hrs. |
Yes, while it's theoretically possible, I don't think there is currently any other system that uses the exact same LoRa settings and sync word as we do.
Strange, I don't see how this can occur. Can you try setting it to 30 min.? |
I have a router running firmware from the last month (can't remember which one exactly) and have it set to telemetry every 30min. In my busy mesh I normally end up getting telemetry every 1hr approx. Just set it to 30min (1800) and it will work as expected. The defaults have changed, but the minimum for us (you) that know what we're doing is still very good. |
Agreed, this is what I ended up doing as well. FWIW requesting telemetry over CLI does not provide the whole picture, just the basic telemetry built in to the node. At least not for me:
PS C:\Users\user> py -m meshtastic --request-telemetry --dest '!6e2718a'
This node is equipped with an INA219 for voltage/current and a RAK 1901 for Temp/Humidity. The telemetry for those items sends just fine with the regular interval updates but apparently cannot be requested over CLI. Not complaining, just seems worth noting in case someone looks this up later. |
I believe you can if you update the CLI and use |
Hot diggidy thank you sir! Worked like a charm. |
OK, thanks for confirming that video had some false-statements.
Certainly, I'd be happy to try new suggestions. I have tried this in the past but happy to try again, just for clarity as I know Android / iOS lay things out differently, on my iOS App I'm going to: After putting that setting in place via Remote Admin I confirmed it was successful by querying via the CLI on another device and got back:
I have also cleared the log on my mobile app for the node so I can monitor what comes in over time. My belief is that the isImpoliteRole logic block, which only exists in DeviceTelemetry.cpp, is getting triggered too easily (possibly below 40%) and this is causing issues. If someone more capable than myself who can interpret / understand the code is able to double-check the logic in that code block, it would be greatly appreciated. IIRC the video said the Channel Util was measured every minute, is that correct or false as well? If it is a % being measured over the period specified by the update interval, then would a longer update interval (i.e. 12hrs) gather a higher % over the time? I realise it's probably not this last bit. |
I double-checked the logic and I don't see anything wrong.
This is correct. |
Thank you for confirming those two things @GUVWAF 👍 After switching the Device Metrics to thirty minutes earlier this morning and verifying it via the CLI, so far today I've received just one metric broadcast at 12:00pm. This is to that LoS Router 1 mile away with no hops. Stats from that one Device Metric were: Manually running
Are you able to confirm that the logic which changes Router's device metric broadcasts to be every 12 hours does not override and take precedence over the manual configuration of the setting shown in my screenshot above? If you have any further suggestions on things to try then I'm happy to help in testing what I can. |
In DeviceTelemetry it calls firmware/src/modules/Telemetry/DeviceTelemetry.cpp Lines 25 to 26 in 02a5a91
And this eventually uses the configured interval if you have set it: Lines 7 to 8 in 02a5a91
So, yes, that seems good to me. Also b8b8 confirmed it's working as expected for his router.
What's the firmware version of your router actually? If it's <2.4.3 the limit would still be at 25% channel utilization, which you seem to exceed sometimes. |
I've just noticed the line beneath that one you highlighted above is: Line 71 in 2c65445
Doesn't that variable sound more relevant to the whole polite / impolite logic block we have been discussing? |
2.5.15 |
If it's a router Lines 123 to 125 in 5196ee3
|
OK, so I'm around 28hrs into my Router being set to a Device Metric Interval of every thirty minutes now and this is what I have so far: As mentioned if I switch it from router to client they will start coming through more regularly and more successfully. Given the results I'm seeing you can appreciate how I began to think: I'm not the only one seeing this either, I have heard the same from another router operator around 10 miles from me. Any thoughts on what else I can try or test @GUVWAF ? Thanks for your time and help. |
Router and repeater send metrics every 12 hours. |
Yes, I learnt that some time ago, however @GUVWAF gave me the impression recently that the Device Metric Update Interval setting would still over-ride that and take precedence even when the role was Router. Are you saying that's not the case? |
You have to make a manual request, the broadcast interval is unchanged |
OK, so there we have it then: the Device Metric Update Interval setting (aka telemetry.device_update_interval) has no effect when the role is ROUTER. We still have the mystery of why some router operators in a very quiet Meshtastic area are not receiving metrics every 12 hours. I'll continue to investigate... |
It does, you can still set it as long as it's at least 30 min. @b8b8 confirmed this is working as expected. I think the only way you can debug this is checking the serial logs of your router.
|
I already showed you the code and now I proved that it's working as expected for me, and b8b8 confirmed this as well. Indeed the default is 12 hours, but you can still set it differently so "the broadcast interval is unchanged" is then not true. |
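The behaviour described in the comments above (a 12-hour default for routers, a user-configured value taking precedence, and a 30-minute minimum) can be sketched roughly as follows. This is a hedged Python model rather than the firmware logic: the names are hypothetical, treating 0 as "not configured" is an assumption, and the thread does not say whether values below 30 minutes are clamped or rejected, so clamping is assumed here.

```python
# Illustrative sketch only; names are hypothetical, not firmware identifiers.
ROUTER_DEFAULT_INTERVAL_S = 12 * 60 * 60  # 12 hours, the router default quoted in the thread
MIN_CONFIGURABLE_INTERVAL_S = 30 * 60     # 30 minutes, the minimum quoted in the thread


def effective_interval_s(configured_s: int,
                         role_default_s: int = ROUTER_DEFAULT_INTERVAL_S) -> int:
    """Return the broadcast interval in seconds that this model would use."""
    # Assumption: a value of 0 (or less) means "not configured, use the default".
    if configured_s <= 0:
        return role_default_s
    # Assumption: values under the minimum are clamped rather than rejected.
    return max(configured_s, MIN_CONFIGURABLE_INTERVAL_S)


# With telemetry.device_update_interval set to 1800, the configured value wins.
print(effective_interval_s(1800))
# With nothing configured, the router default applies.
print(effective_interval_s(0))
```

The interval only determines when a broadcast is attempted; the utilization limits discussed elsewhere in the thread can still suppress the attempt.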
You're assuming you're talking to someone who can interpret the code, which I can't reliably. Also even what I can understand I don't know how it fits into the bigger picture exactly with the rest of the code, so I don't know the nuances like you folks.
I understand the default for router is 12 hours, I don't know how many times I have to say that. So what do you make of my router + the other router operator in my county who have both set their telemetry.device_update_interval to 1800 but aren't receiving any more than one or two a day? The above would suggest it is being limited to once every 12 hours as @garthvh said and possibly encountering chan util issues. Do you think the Router needs a reboot after changing the setting? Traditionally I've been used to it rebooting itself if it requires it. I'm just trying to think of whatever else could possibly be causing a disparity between what you're seeing versus what we're seeing. We are in the UK on 868MHz if that makes any difference to logic used in the firmware. |
You were asking for me to confirm the logic. I can of course say "yes, checks out", but then I'd rather also show it. I also tried to reproduce your issue, but I couldn't.
I'm sorry, but at this point I don't know why you're having these issues and I think the only way would be to check the serial logs of your router.
No, that shouldn't be necessary, and since you got the correct setting back when querying from the CLI, you should be good. |
@ashleycawley Ah, finally found something. Your own transmit air utilization is quite high, when you requested it, it was 6.20%. Since you are using a region with 10% duty cycle, it won't transmit DeviceTelemetry above 5% transmit utilization to avoid reaching the point where you cannot send anything at all anymore. The reason this happens only in |
Interesting find! I see, so that's the "Transmit air utilization", and there's some logic that says not to transmit device metrics if it's above 5%? Nice find. Would that be stated in the same airtime.h file we were talking about previously? |
Yes, it's 50% of the duty cycle, so 50% of 10% is 5%: Line 72 in 4c3a3ca
|
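The arithmetic in this answer can be written out as a small Python sketch. The function names are hypothetical and the strict comparison is an assumption; the only figures taken from the thread are the 10% duty cycle, the 50% factor, and the 6.20% transmit utilization reported for the router.

```python
# Illustrative sketch only; names are hypothetical, not firmware identifiers.
def max_tx_util_for_telemetry(duty_cycle_percent: float, fraction: float = 0.5) -> float:
    """Transmit-utilization ceiling: a fraction (50% per the thread) of the duty cycle."""
    return duty_cycle_percent * fraction


def may_send_device_telemetry(tx_util_percent: float, duty_cycle_percent: float) -> bool:
    """Return True if the node's own transmit utilization is under the ceiling."""
    # Assumption: the comparison is strict.
    return tx_util_percent < max_tx_util_for_telemetry(duty_cycle_percent)


# Region with a 10% duty cycle: the ceiling is 5%.
print(max_tx_util_for_telemetry(10.0))
# The router reported 6.20% transmit utilization, so the broadcast is skipped.
print(may_send_device_telemetry(6.20, 10.0))
```

Under this model the check depends on the node's own transmissions rather than on overall channel utilization, which is consistent with a busy router being blocked even on a quiet channel.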
I think you've found it @GUVWAF - Over the past 2.5 hours I've been requesting and logging data via the CLI from my Router and look at the Airtime Utilization:
All over 5%, meaning the logic in that code block would prevent Device Metrics from being published. I've then looked at the other Router owned by someone else 10 miles away. Although its device metrics are still vastly lacking in frequency, it transmitted more metrics than my router, and all of its data shows an airtime of 5% or less, not a single one over that mark. |
Platform
NRF52
Description
The improvements to Metrics/Telemetry broadcasting for busy meshes are great at reducing congestion.
However, it's also created a bit of an issue for busy mountain top Routers. Routers are by definition meant to be located high and clear of obstructions. This means they are very likely to see more than 40 nodes and also see the highest channel utilization in any given mesh.
This has created somewhat of an issue after upgrading my main router to 2.4.x. I am now seeing metrics updated maybe once every few days due to the exceptional viewshed it possesses, which overlooks the entirety of the south Salish Sea from Vancouver down to Puget Sound. The channel util at the site was perpetually hovering around 50% based on my last on-site visit. This has grown substantially since the unit was installed in March, when it was reliably around 20%. Since then I have had a steady decline in frequency of updates corresponding with the increase in channel utilization, but the metrics were still received often enough to be useful data. Upon upgrading to 2.4.0 a few weeks ago the updates have almost ceased entirely.
Router mode devices are also remote, and the metrics/telemetry are instrumental in preemptively spotting issues such as undercharging and moisture ingress. Having multiple days pass between received telemetry information significantly hampers this ability.
My ask is that the frequency scaling for Metrics/Telemetry on Router mode ONLY nodes be altered in a future release. Perhaps an interval of every X hours regardless of number of online nodes and channel utilization? Perhaps 3 hours? In a low traffic mesh the current scaling could still be observed. Or simply allow the user specified interval to be observed if it is set to a value greater than 60 minutes?
Once again, this request is for Router mode nodes ONLY.
I've attached screenshots from my device and another user to illustrate the infrequency of useful telemetry reception.
Thank you!