A screenshot of the summary section of the Grafana dashboard with 2 devices reporting
A flowchart of systems used in a NetCheck setup
NetCheck is a persistent network uptime and speed-logging Grafana exporter intended for use on multiple Raspberry Pis. The intended audience is IT professionals with some experience setting up and debugging CLI-based software.
API forked from MiguelNdeCarvalho/speedtest-exporter
This software logs network information from several devices in different physical locations to help troubleshoot AP/switch-level outages. It's a one-time setup that does not integrate with any existing IT systems.
In simple terms: load this software onto several devices (1-3 Raspberry Pis are optimal) and leave them around the building. For example, put one in the office, one in reception and one in a room where someone has complained about slow internet. Each of them will then run speed, latency and packet loss tests to monitor the health of your network as seen by endpoint users connected to an AP. This helps to diagnose troublesome areas, faulty equipment or possible network congestion caused by AP bandwidth limiting.
Key Features:
- Interactive network monitoring with Grafana
- Metric collection with isolation from existing infrastructure
- Hassle-Free Deployment
- Find issues in your network the same way your end users inevitably do
- Free cloud-based metric storage and retrieval
This software was originally commissioned by an organisation with hundreds of access points to maintain, but no specialised ping plotting systems, no access to networking configuration and no on-site technicians to diagnose or confirm troublesome areas. Occupants of this organisation's premises would occasionally report intermittent WiFi drops and latency issues, but a technician was not always available to confirm these issues manually. When one was, the issue could take a long time to recur. The causes were occasionally things like faulty APs or bandwidth caps. However, some issues would resolve themselves over time or were problems with the user's device and not the IT infrastructure.
Therefore, an optimal solution to help the IT staff was to leave a device (Raspberry Pi) in the suspected area and monitor the network to find issues. This method works well in this situation because:
- The software starts automatically on every boot.
- Meaning that after setup, you can just unplug the device, move it to where you want to monitor and let it run without having to even log in to the device
- Isolation from existing infrastructure - emulation of the endpoint device.
- A dedicated monitoring machine that can be physically moved around and is connected by WiFi is in a position to detect issues the same way a user would. It can experience bandwidth limiting from an AP, low signal strength and various other radio and AP-related issues, unlike monitoring software that resides in the server rack or is configured on the AP itself.
- However, this does include drawbacks. As it is so isolated, you're unable to see metrics such as number of devices connected, networking rules or anything higher level. If this is an issue for your diagnosis, you could implement both systems in parallel anyway.
- Cloud-based reporting
- All metrics are sent to Grafana Cloud. So if a device goes down, or your entire network is faulty, you can still load the Grafana dashboard from any other stable internet connection and see what happened.
- Control variables
- If you have a device running this software in a spot that is known to be good, you can compare it with other devices in more troublesome areas on the Grafana dashboard.
In essence, if you're looking to monitor endpoint network information in areas of a building with complete isolation from existing infrastructure, this may be helpful.
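The control-device comparison described above can be sketched in a few lines. This is only an illustration of the idea, not part of NetCheck: the function name, the 20% `tolerance` threshold and the sample readings are all made up, and in practice you would read the values off the Grafana dashboard.

```python
from statistics import median


def compare_to_control(control_mbps, candidate_mbps, tolerance=0.2):
    """Return (ratio, is_degraded) comparing median download speeds.

    `tolerance` is a hypothetical threshold: the candidate is flagged when
    its median falls more than 20% below the control's median.
    """
    control = median(control_mbps)
    candidate = median(candidate_mbps)
    ratio = candidate / control
    return ratio, ratio < 1 - tolerance


# Made-up download speeds in Mbit/s, for illustration only.
office = [92.0, 95.5, 90.1, 94.3]        # known-good location (the control)
meeting_room = [41.2, 38.7, 60.4, 35.9]  # location with complaints

ratio, degraded = compare_to_control(office, meeting_room)
print(f"meeting room runs at {ratio:.0%} of the control, degraded={degraded}")
```

Using the median rather than the mean keeps a single unusually good or bad test from dominating the comparison.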
Note
Note that by using this software you agree to Ookla's TOS:
==============================================================================
You may only use this Speedtest software and information generated
from it for personal, non-commercial use, through a command line
interface on a personal computer. Your use of this software is subject
to the End User License Agreement, Terms of Use and Privacy Policy at
these URLs:
https://www.speedtest.net/about/eula
https://www.speedtest.net/about/terms
https://www.speedtest.net/about/privacy
==============================================================================
OS | Compatibility |
---|---|
Linux Mint 21 | 🟩 |
MacOS | 🟩 |
64 bit raspiOS lite | 🟩 |
Anything that runs Docker | Probably |
Developed and tested on a Raspberry Pi 4B. It may work on other Linux devices, but this is not guaranteed. It was developed on an Arm-based Mac, where it also works fine. If you can run Docker, you're probably fine.
- Software
- Install `docker` / `docker compose` if you haven't got it already. If using a Raspberry Pi, follow these official instructions and skip the Docker Desktop instruction just below.
- If you are using Docker Desktop, make sure `Docker-Desktop` is set to open automatically as a startup app if you plan to have this run automatically. This means the entire app, not just the engine and container services.
Tip
An optimal setup for Raspberry Pi would be to set up an automatic WiFi connection, which can be done over the CLI with `sudo nmtui`. This way you can plug and play. Alternatively, you can technically run this in wired mode if you don't care about monitoring APs.
- Create a GrafanaCloud dashboard
- Connect with Prometheus. Pick options so that you use Grafana Cloud, send data from your own device and avoid Alloy (leave other options as default)
- Take note of Prometheus URL, Username and Token
- Import dashboard `Dashboard/Speedtest-Exporter.json`
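As a rough sketch of how the noted URL, username and token fit together, the snippet below builds an instant query against the standard Prometheus HTTP API (`/api/v1/query`) using basic auth. The base URL, username and token shown are placeholders, and the exact query base URL for your Grafana Cloud stack is an assumption: the URL you noted during setup may be the push (remote-write) endpoint rather than the query endpoint. Only the metric name `speedtest_up` comes from this project.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(base_url, username, token, promql):
    """Build (but do not send) an instant-query request with basic auth."""
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {credentials}"}
    )


# Placeholder values: substitute the details you noted down.
request = build_query_request(
    "https://prometheus.example.invalid/api/prom",  # assumed query base URL
    "123456",                                       # placeholder username
    "example-token",                                # placeholder token
    "speedtest_up",
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

A successful response containing at least one series would suggest the credentials are valid and that a device is reporting.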
Check you have the device prerequisites listed above.
- Clone the repository
git clone https://github.com/Sandwich1699975/NetCheck.git
cd NetCheck
git submodule init
git submodule update
- Run the one-time setup script to generate the `.env` file
bash setup.sh
- Add your details into `.env`
nano .env
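Before starting the containers, it can help to confirm that no value in `.env` was left blank. The sketch below is one way to do that. The key names in `REQUIRED_KEYS` are hypothetical, so replace them with whatever keys `setup.sh` actually writes into your `.env`.

```python
from pathlib import Path

# Hypothetical key names; use the keys that setup.sh writes into your .env.
REQUIRED_KEYS = ("PROMETHEUS_URL", "PROMETHEUS_USERNAME", "PROMETHEUS_TOKEN")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run setup.sh first")
    else:
        print("Missing or empty:", missing_keys(env_path.read_text()) or "none")
```

Run it from the project root, next to the `.env` file.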
- Then run Docker compose from the project root in detached mode.
sudo docker compose up -d --build
Note
Every time you add or turn on/off a device, you need to refresh the Grafana page for it to appear in the variable field at the top
As Ookla is a free service, it rate limits users who abuse the service. Official recommendations are to only allow one speedtest per IP per hour.
To abide by this limitation, NetCheck automatically schedules speedtests by using the Grafana Prometheus endpoint as a dedicated distributor. Each client can observe how many clients are currently online simply by querying the `speedtest_up` history (the actual metric value is mostly irrelevant).
Note
This design relies on the fact that Ookla allows tests 'on demand' rather than following an exact and strict regimen of 'one per hour per IP'. It also assumes the user is setting up NetCheck on a single LAN with one WAN IP shared between all clients. Otherwise each client would be limited more than it needs to be.
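To make the idea of coordinating through a shared view of online clients concrete, here is a minimal round-robin sketch. It is not NetCheck's actual scheduling algorithm, which is not described here. It assumes every client can obtain the same list of online client names (for example from the `speedtest_up` series), and the client names used are hypothetical.

```python
def whose_turn(online_clients, epoch_seconds, interval_seconds=3600):
    """Return the client allowed to run a speedtest in the current interval.

    Every client sorts the same list of online names, so they all reach the
    same answer without talking to each other directly.
    """
    if not online_clients:
        raise ValueError("no clients online")
    ordered = sorted(set(online_clients))
    slot = int(epoch_seconds // interval_seconds)
    return ordered[slot % len(ordered)]


def should_run(me, online_clients, epoch_seconds):
    """True when the current interval belongs to this client."""
    return whose_turn(online_clients, epoch_seconds) == me


clients = ["pi-office", "pi-reception", "pi-room-12"]  # hypothetical names
for hour in range(4):
    print(hour, whose_turn(clients, hour * 3600))
```

With three clients behind one WAN IP, each one tests every third hour, so the shared IP still sees one test per hour. A real implementation would also need to make sure a client runs only once within its slot.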
Where to access local servers when setup (can differ)
- Make sure you have a separate dashboard per WAN address.
- To avoid Ookla rate limiting of one test per IP per hour
env file /home/username/NetCheck/.env not found: stat /home/username/NetCheck/.env: no such file or directory
Run `bash setup.sh` to generate the `.env` file.
[+] Building 0.3s (4/4) FINISHED docker:default
=> [netcheck-exporter internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 2B 0.0s
=> [prometheus internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 780B 0.0s
=> CANCELED [prometheus internal] load metadata for docker.io/prom/prometheus:latest 0.1s
=> CANCELED [prometheus internal] load metadata for docker.io/library/python:3.9-slim 0.1s
failed to solve: failed to read dockerfile: open Dockerfile: no such file or directory
You probably forgot to initialise the git submodules. Run:
git submodule init
git submodule update
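If you want to check for this condition before building, the following sketch reads the submodule paths from `.gitmodules` and reports any that are missing or empty on disk. It assumes the usual `path = ...` entries in `.gitmodules`. The helper names are illustrative and not part of NetCheck.

```python
import re
from pathlib import Path


def submodule_paths(gitmodules_text):
    """Extract every `path = ...` entry from .gitmodules content."""
    return re.findall(
        r"^\s*path\s*=\s*(.+?)\s*$", gitmodules_text, flags=re.MULTILINE
    )


def uninitialised(repo_root, paths):
    """Return the submodule paths that are missing or empty on disk."""
    root = Path(repo_root)
    return [
        p for p in paths
        if not (root / p).is_dir() or not any((root / p).iterdir())
    ]


if __name__ == "__main__":
    gitmodules = Path(".gitmodules")
    if gitmodules.exists():
        empty = uninitialised(".", submodule_paths(gitmodules.read_text()))
        print("Uninitialised submodules:", empty or "none")
    else:
        print("No .gitmodules here; run this from the repository root")
```

An empty submodule directory is what you get after a plain `git clone`, which is why the build cannot find its Dockerfile.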