Merge pull request #136 from geeksville/reliable
WIP for reliable unicast and BLE software update
geeksville authored May 19, 2020
2 parents 3089de7 + 71041e8 commit e05e324
Showing 50 changed files with 1,168 additions and 533 deletions.
3 changes: 3 additions & 0 deletions bin/build-all.sh
@@ -39,6 +39,9 @@ function do_build {
cp $SRCELF $OUTDIR/elfs/firmware-$ENV_NAME-$COUNTRY-$VERSION.elf
}

# Make sure our submodules are current
git submodule update

# Important to pull latest version of libs into all device flavors, otherwise some devices might be stale
platformio lib update

46 changes: 46 additions & 0 deletions boards/nrf52840_dk_modified.json
@@ -0,0 +1,46 @@
{
"build": {
"arduino": {
"ldscript": "nrf52840_s140_v6.ld"
},
"core": "nRF5",
"cpu": "cortex-m4",
"extra_flags": "-DARDUINO_NRF52840_PCA10056 -DNRF52840_XXAA",
"f_cpu": "64000000L",
"hwids": [["0x239A", "0x4404"]],
"usb_product": "SimPPR",
"mcu": "nrf52840",
"variant": "pca10056-rc-clock",
"variants_dir": "variants",
"bsp": {
"name": "adafruit"
},
"softdevice": {
"sd_flags": "-DS140",
"sd_name": "s140",
"sd_version": "6.1.1",
"sd_fwid": "0x00B6"
},
"bootloader": {
"settings_addr": "0xFF000"
}
},
"connectivity": ["bluetooth"],
"debug": {
"jlink_device": "nRF52840_xxAA",
"onboard_tools": ["jlink"],
"svd_path": "nrf52840.svd"
},
"frameworks": ["arduino"],
"name": "A modified NRF52840-DK devboard (Adafruit BSP)",
"upload": {
"maximum_ram_size": 248832,
"maximum_size": 815104,
"require_upload_port": true,
"speed": 115200,
"protocol": "jlink",
"protocols": ["jlink", "nrfjprog", "stlink"]
},
"url": "https://meshtastic.org/",
"vendor": "Nordic Semi"
}
2 changes: 1 addition & 1 deletion boards/ppr.json
@@ -10,7 +10,7 @@
"hwids": [["0x239A", "0x4403"]],
"usb_product": "PPR",
"mcu": "nrf52840",
"variant": "pca10056-rc-clock",
"variant": "ppr",
"variants_dir": "variants",
"bsp": {
"name": "adafruit"
2 changes: 1 addition & 1 deletion docs/README.md
@@ -1,7 +1,7 @@
# What is Meshtastic?

Meshtastic is a project that lets you use
inexpensive (\$30 ish) GPS radios as an extensible, super long battery life mesh GPS communicator. These radios are great for hiking, skiing, paragliding - essentially any hobby where you don't have reliable internet access. Each member of your private mesh can always see the location and distance of all other members and any text messages sent to your group chat.
inexpensive (\$30 ish) GPS radios as an extensible, long battery life, secure, mesh GPS communicator. These radios are great for hiking, skiing, paragliding - essentially any hobby where you don't have reliable internet access. Each member of your private mesh can always see the location and distance of all other members and any text messages sent to your group chat.

The radios automatically create a mesh to forward packets as needed, so everyone in the group can receive messages from even the furthest member. The radios will optionally work with your phone, but no phone is required.

21 changes: 10 additions & 11 deletions docs/software/TODO.md
@@ -5,21 +5,16 @@ Items to complete soon (next couple of alpha releases).
- lower wait_bluetooth_secs to 30 seconds once we have the GPS power on (but GPS in sleep mode) across light sleep. For the time
being I have it set at 2 minutes to ensure enough time for a GPS lock from scratch.

- remeasure wake time power draws now that we run CPU down at 80MHz

# AXP192 tasks

- figure out why this fixme is needed: "FIXME, disable wake due to PMU because it seems to fire all the time?"
- "AXP192 interrupt is not firing, remove this temporary polling of battery state"
- make debug info screen show real data (including battery level & charging) - close corresponding github issue

# Medium priority

Items to complete before the first beta release.

- Don't store position packets in the to phone fifo if we are disconnected. The phone will get that info for 'free' when it
fetches the fresh nodedb.
- Use the RFM95 sequencer to stay in idle mode most of the time, then automatically go to receive mode and automatically go from transmit to receive mode. See 4.2.8.2 of manual.
- Use 32 bits for message IDs
- Use fixed32 for node IDs
- Remove the "want node" node number arbitration process
- Don't store position packets in the to phone fifo if we are disconnected. The phone will get that info for 'free' when it
fetches the fresh nodedb.
- Use the RFM95 sequencer to stay in idle mode most of the time, then automatically go to receive mode and automatically go from transmit to receive mode. See 4.2.8.2 of manual.
- possibly switch to https://github.com/SlashDevin/NeoGPS for gps comms
- good source of battery/signal/gps icons https://materialdesignicons.com/
- research and implement better mesh algorithm - investigate changing routing to https://github.com/sudomesh/LoRaLayer2 ?
@@ -204,3 +199,7 @@ Items after the first final candidate release.
- enable fast lock and low power inside the gps chip
- Make a FAQ
- add a SF12 transmit option for _super_ long range
- figure out why this fixme is needed: "FIXME, disable wake due to PMU because it seems to fire all the time?"
- "AXP192 interrupt is not firing, remove this temporary polling of battery state"
- make debug info screen show real data (including battery level & charging) - close corresponding github issue
- remeasure wake time power draws now that we run CPU down at 80MHz
8 changes: 5 additions & 3 deletions docs/software/cypto.md → docs/software/crypto.md
@@ -4,8 +4,8 @@ Cryptography is tricky, so we've tried to 'simply' apply standard crypto solutio
the project developers are not cryptography experts. Therefore we ask two things:

- If you are a cryptography expert, please review these notes and our questions below. Can you help us by reviewing our
notes below and offering advice? We will happily give as much or as little credit as you wish as our thanks ;-).
- Consider our existing solution 'alpha' and probably fairly secure against an not very aggressive adversary. But until
notes below and offering advice? We will happily give as much or as little credit as you wish ;-).
- Consider our existing solution 'alpha' and probably fairly secure against a not particularly aggressive adversary. But until
it is reviewed by someone smarter than us, assume it might have flaws.

## Notes on implementation
@@ -16,7 +16,7 @@ the project developers are not cryptography experts. Therefore we ask two things

Parameters for our CTR implementation:

- Our AES key is 256 bits, shared as part of the 'Channel' specification.
- Our AES key is 128 or 256 bits, shared as part of the 'Channel' specification.
- Each SubPacket will be sent as a series of 16 byte BLOCKS.
- The node number concatenated with the packet number is used as the NONCE. This counter will be stored in flash in the device and should essentially never repeat. If the user makes a new 'Channel' (i.e. picking a new random 256 bit key), the packet number will start at zero. The packet number is sent
in cleartext with each packet. The node number can be derived from the "from" field of each packet.
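
A minimal sketch of the nonce construction described above, written in Python for brevity. The notes only say that the node number and packet number are concatenated; the field order, the little-endian encoding, the 32-bit widths, the trailing per-block counter and the name `make_ctr_block` are all assumptions made for illustration, not the firmware's actual layout.

```python
import struct

AES_BLOCK_SIZE = 16  # bytes; an AES block is 128 bits regardless of key size


def make_ctr_block(node_num: int, packet_num: int, block_index: int = 0) -> bytes:
    """Build one 16-byte CTR input block (hypothetical layout).

    Assumed layout, little-endian throughout:
      bytes 0-3   node number   (uint32)
      bytes 4-7   packet number (uint32)
      bytes 8-15  index of the 16-byte block within this packet (uint64)
    """
    block = struct.pack("<IIQ", node_num, packet_num, block_index)
    assert len(block) == AES_BLOCK_SIZE
    return block
```

The point of the sketch is only that two packets never share a counter block as long as the (node number, packet number) pair never repeats under the same key, which is why the notes want the packet number persisted in flash.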
@@ -35,4 +35,6 @@ Note that for both stategies, sizes are measured in blocks and that an AES block
## Remaining todo

- Make the packet numbers 32 bit
- Confirm the packet #s are stored in flash across deep sleep (and otherwise in RAM)
- Have the app change the crypto key when the user generates a new channel
- Implement for NRF52 [NRF52](https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.sdk5.v15.0.0/lib_crypto_aes.html#sub_aes_ctr)
74 changes: 67 additions & 7 deletions docs/software/mesh-alg.md
@@ -1,19 +1,80 @@
# Mesh broadcast algorithm

FIXME - instead look for standard solutions. this approach seems really suboptimal, because too many nodes will try to rebroadcast. If
all else fails could always use the stock Radiohead solution - though super inefficient.

great source of papers and class notes: http://www.cs.jhu.edu/~cs647/

reliable messaging tasks (stage one for DSR):

- DONE generalize naive flooding
- DONE add a max hops parameter, use it for broadcast as well (0 means adjacent only, 1 is one forward etc...). Store as three bits in the header.
- DONE add a 'snoopReceived' hook for all messages that pass through our node.
- DONE use the same 'recentmessages' array used for broadcast msgs to detect duplicate retransmitted messages.
- DONE in the router receive path?, send an ack packet if want_ack was set and we are the final destination. FIXME, for now don't handle multihop or merging of data replies with these acks.
- DONE keep a list of packets waiting for acks
- DONE for each message keep a count of # retries (max of three). Local to the node, only for the most immediate hop, ignorant of multihop routing.
- DONE delay some random time for each retry (large enough to allow for acks to come in)
- DONE once an ack comes in, remove the packet from the retry list and deliver the ack to the original sender
- DONE after three retries, deliver a no-ack packet to the original sender (i.e. the phone app or mesh router service)
- DONE test one hop ack/nak with the python framework
- Do stress test with acks
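
The single-hop retry scheme in the list above could look roughly like the following Python sketch. It is not the firmware's `ReliableRouter`; the class name `ReliableSender`, the callbacks, the millisecond clock passed in by the caller and the 2-5 second backoff range are all assumptions chosen for illustration. Only the "max of three retries, random delay, ack removes the entry, no-ack after the last retry" behaviour comes from the notes.

```python
import random
from dataclasses import dataclass

MAX_RETRIES = 3  # from the notes: give up after three retries


@dataclass
class Pending:
    packet_id: int
    payload: bytes
    retries_left: int
    next_send_ms: int


class ReliableSender:
    """Hypothetical retry table for the most immediate hop only."""

    def __init__(self, send, deliver, rng=None):
        self.send = send        # callable(packet_id, payload): transmit once
        self.deliver = deliver  # callable(packet_id, acked): report to the original sender
        self.rng = rng or random.Random()
        self.pending = {}       # packet_id -> Pending

    def _backoff_ms(self):
        # Assumed range; it only needs to be long enough for an ack to come in
        return self.rng.randint(2000, 5000)

    def start(self, packet_id, payload, now_ms):
        self.send(packet_id, payload)
        self.pending[packet_id] = Pending(
            packet_id, payload, MAX_RETRIES, now_ms + self._backoff_ms())

    def on_ack(self, packet_id):
        # Remove from the retry list and pass the ack to the original sender
        if self.pending.pop(packet_id, None) is not None:
            self.deliver(packet_id, True)

    def tick(self, now_ms):
        for p in list(self.pending.values()):
            if now_ms < p.next_send_ms:
                continue
            if p.retries_left == 0:
                # All retries used and still no ack: report a nak
                del self.pending[p.packet_id]
                self.deliver(p.packet_id, False)
            else:
                p.retries_left -= 1
                self.send(p.packet_id, p.payload)
                p.next_send_ms = now_ms + self._backoff_ms()
```

Passing the clock and the random source in from outside keeps the sketch testable without real timers; a plain subtraction-based comparison like this would also need care around timer rollover on a device.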

dsr tasks

- do "hop by hop" routing
- when sending, if destnodeinfo.next_hop is zero (and no message is already waiting for an arp for that node), startRouteDiscovery() for that node. Queue the message in the 'waiting for arp queue' so we can send it later when the arp completes.
- otherwise, use next_hop and start sending a message (with ack request) towards that node.
- Don't use broadcasts for the network pings (close open github issue)
- add ignoreSenders to radioconfig to allow testing different mesh topologies by refusing to see certain senders
- test multihop delivery with the python framework

optimizations / low priority:

- low priority: think more carefully about reliable retransmit intervals
- make ReliableRouter.pending threadsafe
- bump up PacketPool size for all the new ack/nak/routing packets
- handle 51 day rollover in doRetransmissions
- use a priority queue for the messages waiting to send. Send acks first, then routing messages, then data messages, then broadcasts?
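
One possible shape for the send queue in the last item, sketched in Python with `heapq`. The class name `TxQueue` and the numeric priority values are made up for illustration; only the ordering (acks, then routing, then data, then broadcasts) comes from the note, and that ordering is itself still a question there.

```python
import heapq
import itertools

# Lower number is sent sooner. The order follows the note above;
# the numeric values themselves are arbitrary.
PRIORITY = {"ack": 0, "routing": 1, "data": 2, "broadcast": 3}


class TxQueue:
    """Hypothetical transmit queue ordered by packet class."""

    def __init__(self):
        self._heap = []
        # Monotonic sequence number: keeps FIFO order within one class
        # and stops heapq from ever comparing the packets themselves.
        self._seq = itertools.count()

    def push(self, kind, packet):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), packet))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```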

when we receive any packet

- sniff and update tables (especially useful to find adjacent nodes). Update user, network and position info.
- if we need to route() that packet, resend it to the next_hop based on our nodedb.
- if it is broadcast or destined for our node, deliver locally
- handle routereply/routeerror/routediscovery messages as described below
- then free it
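
The receive path above, together with the hop limit and duplicate detection from the reliable messaging list, might be sketched like this in Python. Everything named here (`Packet`, `Node`, `BROADCAST`, the `outbox` list, the cache size) is hypothetical, and route reply/error/discovery handling is left out.

```python
from collections import deque
from dataclasses import dataclass, replace

BROADCAST = 0xFF  # assumed broadcast address, for illustration only


@dataclass(frozen=True)
class Packet:
    sender: int
    dest: int
    packet_id: int
    hop_limit: int  # 0 means adjacent only; 0..7 fits in three header bits
    payload: bytes = b""


class Node:
    def __init__(self, node_num, recent_size=32):
        self.node_num = node_num
        self.recent = deque(maxlen=recent_size)  # recently seen (sender, id) pairs
        self.adjacent = set()   # neighbours we have heard directly
        self.next_hop = {}      # dest -> neighbour to forward through
        self.delivered = []     # packets handed to the local application
        self.outbox = []        # (next hop, packet) pairs waiting to be sent

    def handle(self, pkt, heard_from):
        # Sniff: whoever we heard this from is an adjacent node
        self.adjacent.add(heard_from)

        # Drop duplicates of anything we already processed
        key = (pkt.sender, pkt.packet_id)
        if key in self.recent:
            return
        self.recent.append(key)

        # Deliver locally if it is broadcast or addressed to us
        if pkt.dest in (self.node_num, BROADCAST):
            self.delivered.append(pkt)
        if pkt.dest == self.node_num or pkt.hop_limit == 0:
            return

        # Otherwise forward with one hop fewer remaining
        fwd = replace(pkt, hop_limit=pkt.hop_limit - 1)
        if pkt.dest == BROADCAST:
            self.outbox.append((BROADCAST, fwd))
        elif pkt.dest in self.next_hop:
            self.outbox.append((self.next_hop[pkt.dest], fwd))
```

A unicast packet with no known `next_hop` is silently dropped here; in the design above that is the point where route discovery would start instead.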

routeDiscovery

- if the request has already passed through us (or is from us), then ignore it
- use the nodes already mentioned in the request to update our routing table
- if they were looking for us, send back a routereply
- if max_hops is zero and they weren't looking for us, drop (FIXME, send back error - I think not though?)
- if we receive a discovery packet, we use it to populate next_hop (if needed) towards the requester (after decrementing max_hops)
- if we receive a discovery packet, and we have a next_hop in our nodedb for that destination, we send a (reliable) route reply towards the requester
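
A rough Python sketch of the routeDiscovery rules above. It assumes the request carries the list of nodes it has traversed with the origin first, which the notes imply but do not spell out; `RouteDiscovery`, `Router`, `learn` and the returned action strings are hypothetical names, and actually sending the reply is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class RouteDiscovery:
    origin: int    # node asking for a route
    target: int    # node being looked for
    max_hops: int
    path: list = field(default_factory=list)  # nodes traversed so far, origin first


class Router:
    def __init__(self, node_num):
        self.node_num = node_num
        self.next_hop = {}  # dest -> (neighbour, hop count)

    def learn(self, dest, via, hops):
        # Prefer shorter paths, as the routereply notes suggest
        cur = self.next_hop.get(dest)
        if cur is None or hops < cur[1]:
            self.next_hop[dest] = (via, hops)

    def on_discovery(self, req, heard_from):
        """Return (action, request_to_forward_or_None)."""
        # Already passed through us, or originally from us: ignore
        if req.origin == self.node_num or self.node_num in req.path:
            return ("ignore", None)

        # Every node named in the request is reachable back through the
        # neighbour we heard it from; the last path entry is one hop away.
        for i, node in enumerate(reversed(req.path)):
            self.learn(node, heard_from, i + 1)

        # They were looking for us, or we already know a route: reply
        if req.target == self.node_num or req.target in self.next_hop:
            return ("reply", None)

        if req.max_hops == 0:
            return ("drop", None)

        fwd = RouteDiscovery(req.origin, req.target,
                             req.max_hops - 1, req.path + [self.node_num])
        return ("forward", fwd)
```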

when sending any reliable packet

- if we get back a nak, send a routeError message back towards the original requester. all nodes eavesdrop on that packet and update their route caches

when we receive a routereply packet

- update next_hop on the node, if the new reply needs fewer hops than the existing one (we prefer shorter paths). fixme, someday use a better heuristic

when we receive a routeError packet

- delete the route for that failed recipient, restartRouteDiscovery()
- if we receive routeerror in response to a discovery,
- fixme, eventually keep caches of possible other routes.

TODO:

- DONE reread the radiohead mesh implementation - hop to hop acknoledgement seems VERY expensive but otherwise it seems like DSR
- optimize our generalized flooding with heuristics, possibly have particular nodes self mark as 'router' nodes.

- DONE reread the radiohead mesh implementation - hop to hop acknowledgement seems VERY expensive but otherwise it seems like DSR
- DONE read about mesh routing solutions (DSR and AODV)
- DONE read about general mesh flooding solutions (naive, MPR, geo assisted)
- DONE reread the disaster radio protocol docs - seems based on Babel (which is AODVish)
- possibly dash7? https://www.slideshare.net/MaartenWeyn1/dash7-alliance-protocol-technical-presentation https://github.com/MOSAIC-LoPoW/dash7-ap-open-source-stack - does the opensource stack implement multihop routing? flooding? their discussion mailing list looks dead-dead
- REJECTED - seems dying - possibly dash7? https://www.slideshare.net/MaartenWeyn1/dash7-alliance-protocol-technical-presentation https://github.com/MOSAIC-LoPoW/dash7-ap-open-source-stack - does the opensource stack implement multihop routing? flooding? their discussion mailing list looks dead-dead
- update duty cycle spreadsheet for our typical usecase
- generalize naive flooding on top of radiohead or disaster.radio? (and fix radiohead to use my new driver)

a description of DSR: https://tools.ietf.org/html/rfc4728 good slides here: https://www.slideshare.net/ashrafmath/dynamic-source-routing
good description of batman protocol: https://www.open-mesh.org/projects/open-mesh/wiki/BATMANConcept
@@ -77,7 +138,6 @@ look into the literature for this idea specifically.

FIXME, merge into the above:


good description of batman protocol: https://www.open-mesh.org/projects/open-mesh/wiki/BATMANConcept

interesting paper on lora mesh: https://portal.research.lu.se/portal/files/45735775/paper.pdf
33 changes: 23 additions & 10 deletions docs/software/nrf52-TODO.md
@@ -1,23 +1,18 @@
# NRF52 TODO

## Misc work items

## Initial work items

Minimum items needed to make sure hardware is good.

- test my hackedup bootloader on the real hardware
- add a hard fault handler
- use "variants" to get all gpio bindings
- plug in correct variants for the real board
- Use the PMU driver on real hardware
- Use new radio driver on real hardware
- Use UC1701 LCD driver on real hardware. Still need to create at startup and probe on SPI
- test the LEDs
- test the buttons
- make a new boarddef with a variant.h file. Fix pins in that file. In particular (at least):
#define PIN_SPI_MISO (46)
#define PIN_SPI_MOSI (45)
#define PIN_SPI_SCK (47)
#define PIN_WIRE_SDA (26)
#define PIN_WIRE_SCL (27)

## Secondary work items

@@ -45,7 +40,6 @@ Needed to be fully functional at least at the same level of the ESP32 boards. At

- use SX126x::startReceiveDutyCycleAuto to save power by sleeping and briefly waking to check for preamble bits. Change xmit rules to have more preamble bits.
- turn back on in-radio destaddr checking for RF95
- remove the MeshRadio wrapper - we don't need it anymore, just do everythin in RadioInterface subclasses.
- figure out what the correct current limit should be for the sx1262, currently we just use the default 100
- put sx1262 in sleepmode when processor gets shutdown (or rebooted), ideally even for critical faults (to keep power draw low). repurpose deepsleep state for this.
- good power management tips: https://devzone.nordicsemi.com/nordic/nordic-blog/b/blog/posts/optimizing-power-on-nrf52-designs
@@ -62,6 +56,11 @@ Needed to be fully functional at least at the same level of the ESP32 boards. At

Nice ideas worth considering someday...

- Use felgo to make an iOS/linux app? https://felgo.com/doc/qt/qtbluetooth-index/ or
- Use flutter to make an iOS/linux app? https://github.com/Polidea/FlutterBleLib
- make a Mfg Controller and device under test classes as examples of custom app code for third party devs. Make a post about this. Use a custom payload type code. Have device under test send a broadcast with max hopcount of 0 for the 'mfgcontroller' payload type. mfg controller will read SNR and reply. DOT will declare failure/success and switch to the regular app screen.
- Hook Segger RTT to the nordic logging framework. https://devzone.nordicsemi.com/nordic/nordic-blog/b/blog/posts/debugging-with-real-time-terminal
- Use nordic logging for DEBUG_MSG
- use the Jumper simulator to run meshes of simulated hardware: https://docs.jumper.io/docs/install.html
- make/find a multithread safe debug logging class (include remote logging and timestamps and levels). make each log event atomic.
- turn on freertos stack size checking
@@ -72,11 +71,14 @@ Nice ideas worth considering someday...
- in addition to the main CPU watchdog, use the PMU watchdog as a really big emergency hammer
- turn on 'shipping mode' in the PMU when device is 'off' - to cut battery draw to essentially zero
- make Lorro_BQ25703A read/write operations atomic, current version could let other threads sneak in (once we start using threads)
- turn on DFU assistance in the appload using the nordic DFU helper lib call
- make the segger logbuffer larger, move it to RAM that is preserved across reboots and support reading it out at runtime (to allow full log messages to be included in crash reports). Share this code with ESP32 (use gcc noinit attribute)
- convert hardfaults/panics/asserts/wd exceptions into fault codes sent to phone
- stop enumerating all i2c devices at boot, it wastes power & time
- consider using "SYSTEMOFF" deep sleep mode, without RAM retention. Only useful for 'truly off - wake only by button press' only saves 1.5uA vs SYSTEMON. (SYSTEMON only costs 1.5uA). Possibly put PMU into shipping mode?
- change the BLE protocol to be more symmetric. Have the phone _also_ host a GATT service which receives writes to
'fromradio'. This would allow removing the 'fromnum' mailbox/notify scheme of the current approach and decrease the number of packet handoffs when a packet is received.
- Using the preceding, make a generalized 'nrf52/esp32 ble to internet' bridge service. To let nrf52 apps do MQTT/UDP/HTTP POST/HTTP GET operations to web services.
- lower advertise interval to save power, lower ble transmit power to save power

## Old unorganized notes

@@ -102,6 +104,17 @@ Nice ideas worth considering someday...
- DONE remove unused sx1262 lib from github
- at boot we are starting our message IDs at 1, rather we should start them at a random number. also, seed random based on timer. this could be the cause of our first message not seen bug.
- add a NMEA based GPS driver to test GPS
- DONE use "variants" to get all gpio bindings
- DONE plug in correct variants for the real board
- turn on DFU assistance in the appload using the nordic DFU helper lib call
- make a new boarddef with a variant.h file. Fix pins in that file. In particular (at least):
#define PIN_SPI_MISO (46)
#define PIN_SPI_MOSI (45)
#define PIN_SPI_SCK (47)
#define PIN_WIRE_SDA (26)
#define PIN_WIRE_SCL (27)
- customize the bootloader to use proper button bindings
- remove the MeshRadio wrapper - we don't need it anymore, just do everything in RadioInterface subclasses.

```
31 changes: 21 additions & 10 deletions platformio.ini
@@ -51,6 +51,7 @@ build_flags = -Wno-missing-field-initializers -Isrc -Isrc/mesh -Isrc/gps -Ilib/n
; the default is esptool
; upload_protocol = esp-prog

; monitor_speed = 115200
monitor_speed = 921600

# debug_tool = esp-prog
@@ -83,7 +84,7 @@ src_filter =
upload_speed = 921600
debug_init_break = tbreak setup
build_flags =
${env.build_flags} -Wall -Wextra
${env.build_flags} -Wall -Wextra -Isrc/esp32
lib_ignore = segger_rtt

; The 1.0 release of the TBEAM board
@@ -92,7 +93,7 @@ extends = esp32_base
board = ttgo-t-beam
lib_deps =
${env.lib_deps}
AXP202X_Library
https://github.com/meshtastic/AXP202X_Library.git
build_flags =
${esp32_base.build_flags} -D TBEAM_V10

@@ -122,11 +123,9 @@ board = ttgo-lora32-v1
build_flags =
${esp32_base.build_flags} -D TTGO_LORA_V2


; The NRF52840-dk development board
[env:nrf52dk]
; Common settings for NRF52 based targets
[nrf52_base]
platform = nordicnrf52
board = ppr
framework = arduino
debug_tool = jlink
build_type = debug ; I'm debugging with ICE a lot now
@@ -136,10 +135,6 @@ src_filter =
${env.src_filter} -<esp32/>
lib_ignore =
BluetoothOTA
lib_deps =
${env.lib_deps}
UC1701
https://github.com/meshtastic/BQ25703A.git
monitor_port = /dev/ttyACM1

debug_extra_cmds =
@@ -150,3 +145,19 @@ debug_init_break =
;debug_init_break = tbreak loop
;debug_init_break = tbreak Reset_Handler

; The NRF52840-dk development board
[env:nrf52dk]
extends = nrf52_base
board = nrf52840_dk_modified

; The PPR board
[env:ppr]
extends = nrf52_base
board = ppr
lib_deps =
${env.lib_deps}
UC1701
https://github.com/meshtastic/BQ25703A.git


