Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLT/JBD & JKBMS BLE problems #86

Open
DenkBrettl opened this issue Oct 8, 2024 · 18 comments
Open

LLT/JBD & JKBMS BLE problems #86

DenkBrettl opened this issue Oct 8, 2024 · 18 comments
Labels
bug Something isn't working help wanted Extra attention is needed wontfix This will not be worked on

Comments

@DenkBrettl
Copy link

Describe the bug

When running v1.4.20240928, I regularly get segfaults in the python binary using a JKBMS via BLE.

I've downgraded again to v1.3.20240705 and this one has run fine since a day now. I know this is hard to debug, but I'm happy to help. If you know how to enable core dumps on this, it would be helpful, otherwise I can research it.

How to reproduce

Install latest release, wait for a certain time (few hours), segfaults regularly occur.

Expected behavior

No segfaults

Driver version of the currently installed driver

v1.4.20240928

Driver version of the last known working driver

v1.3.20240705

Venus OS device type

Raspberry Pi 4

Venus OS version

v3.42

BMS type

JKBMS (Heltec BMS)

Cell count

16

Battery count

2

Connection type

Bluetooth

Config file

[DEFAULT]

; If you want to add custom values/settings, then check the values/settings you want to change in "config.default.ini"
; and insert them below to persist future driver updates.

; Example (remove the semicolon ";" to uncomment and activate the value/setting):

LOGGING = INFO

; May need to run bluetoothctl; then in the cli "scan on" and "trust MAC" for new devices
; C8:47:8C:E2:D1:A9 MoarBatt01
; C8:47:8C:E2:DB:1F MoarBatt02
BLUETOOTH_BMS = Jkbms_Ble C8:47:8C:E2:D1:A9, Jkbms_Ble C8:47:8C:E2:DB:1F

MAX_BATTERY_CHARGE_CURRENT = 280.0
MAX_BATTERY_DISCHARGE_CURRENT = 280.0


; --------- Cell Voltages ---------
; Description:
;     Cell min/max voltages which are used for:
;         - Calculating the min/max battery voltage
;         - Trigger the SoC reset when SoC calculation is enabled
; Example:
;     16 cells * 3.45 V/cell = 55.2 V max charge voltage.
;     16 cells * 2.90 V/cell = 46.4 V min discharge voltage
MIN_CELL_VOLTAGE   = 2.850
; Max voltage (can seen as absorption voltage)
MAX_CELL_VOLTAGE   = 3.450
; Float voltage (can be seen as resting voltage)
FLOAT_CELL_VOLTAGE = 3.375


; --------- SOC reset voltage (needs to match BMS settings) ---------
; +++ This has nothing to do with "SOC calculation" in a section below +++
; This is one of the possibilities to reset the SoC to 100%, because of SoC drift.
; Description:
;     May be needed to reset the SoC to 100% once in a while for some BMS, because of SoC drift.
;     Some BMS may needed to reset the SoC to 100% once in a while, because of SoC drift. Some
;     devices, like JKBMS, will reset their internal SOC value, if they reach the upper voltage level.
;     Using this method, the charging voltage can be raised regularly to achieve that.
;     (other BMS like Daly need an active overwriting of the SOC parameter. This happens each time when
;     the charging mode changes from Bulk/Absorption to float (and the cells are equalised). They do
;     not need this feature here.)
;     Specify the cell voltage where the SoC should be reset to 100% by the BMS.
;       - JKBMS: SoC is reset to 100% if one cell reaches OVP (over voltage protection) voltage
;     As you have to adopt this value to your system, I recommend to start with
;     OVP voltage - 0.030 (see Example).
;       - Try to increase (add) by 0.005 in steps, if the system does not switch to float mode, even if
;         the target voltage SOC_RESET_VOLTAGE * CELL_COUNT is reached.
;       - Try to decrease (lower) by 0.005 in steps, if the system hits the OVP too fast, before all
;         cells could be balanced and the system goes into protection mode multiple times.
; Example:
;     If OVP is 3.650, then start with 3.620 and increase/decrease by 0.005
; Note:
;     The value has to be higher as the MAX_CELL_VOLTAGE
;     You also have to set CELL_VOLTAGES_WHILE_CHARGING accordingly, if you set CCCM_CV_ENABLE to true
;         else the charging current will be reduced to 0 before the target voltage is reached and the
;         battery will never switch to float
SOC_RESET_VOLTAGE = 3.570
; Specify after how many days the SOC reset voltage should be reached again
; The timer is reset when the SOC reset voltage is reached
; Leave empty if you don't want to use the SOC reset feature
; Example:
;     Value is set to 15
;     day 1: SOC reset reached once
;     day 16: SOC reset reached twice
;     day 31: SOC reset not reached since it's very cloudy
;     day 34: SOC reset reached since the sun came out
;     day 49: SOC reset reached again, since last time it took 3 days to reach SOC reset voltage
SOC_RESET_AFTER_DAYS = 2

; --------- Cell Voltage Current limitation (affecting CCL/DCL) ---------
; Description: Maximal charge / discharge current will be in-/decreased depending on min and max cell voltages
; Example:
;     18 cells * 3.55 V/cell = 63.9 V max charge voltage 
;     18 cells * 2.70 V/cell = 48.6 V min discharge voltage
;     But in reality not all cells reach the same voltage at the same time. The (dis)charge current
;     will be (in-/)decreased, if even ONE SINGLE BATTERY CELL reaches the limits
                          
; Charge current control management referring to cell-voltage enable (True/False).
CCCM_CV_ENABLE = True     
; Discharge current control management referring to cell-voltage enable (True/False).
DCCM_CV_ENABLE = True
                                                                     
; Set steps to reduce battery current                                     
; The current will be changed linear between those steps if LINEAR_LIMITATION_ENABLE is set to True
CELL_VOLTAGES_WHILE_CHARGING      = 3.60, 3.50, 3.45, 3.30
MAX_CHARGE_CURRENT_CV_FRACTION    =    0, 0.05,  0.5,    1                                      
                                                                                              
CELL_VOLTAGES_WHILE_DISCHARGING   = 2.70, 2.80, 2.90, 3.10                                            
MAX_DISCHARGE_CURRENT_CV_FRACTION =    0,  0.1,  0.5,    1

Relevant log output

'current' log of one of the batteries:

2024-10-07 18:48:55.139862500 
2024-10-07 18:48:57.045012500 INFO:SerialBattery:
2024-10-07 18:48:57.045307500 INFO:SerialBattery:Starting dbus-serialbattery
2024-10-07 18:48:57.046355500 INFO:SerialBattery:Venus OS v3.42
2024-10-07 18:48:57.046633500 INFO:SerialBattery:dbus-serialbattery v1.4.20240928
2024-10-07 18:49:13.314624500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 18:49:13.314917500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 18:49:17.324509500 INFO:SerialBattery:BAT: JKBMS 10.XG 16S (20231003)
2024-10-07 18:49:17.325302500 INFO:SerialBattery:Connection established to Jkbms_Ble
2024-10-07 18:49:17.812952500 INFO:SerialBattery:Found existing battery with DeviceInstance = 1
2024-10-07 18:49:17.955688500 INFO:SerialBattery:DeviceInstance = 1
2024-10-07 18:49:17.956666500 INFO:SerialBattery:PID file created successfully: /var/tmp/dbus-serialbattery_1.pid
2024-10-07 18:49:17.956972500 INFO:SerialBattery:Used DeviceInstances = ['1', '3']
2024-10-07 18:49:17.957280500 INFO:SerialBattery:com.victronenergy.battery.ble_c8478ce2db1f__0
2024-10-07 18:49:17.980172500 INFO:SerialBattery:publish config values = False
2024-10-07 18:49:17.992789500 INFO:SerialBattery:Battery JKBMS BLE connected to dbus from ble_c8478ce2db1f
2024-10-07 18:49:17.992794500 INFO:SerialBattery:========== Settings ==========
2024-10-07 18:49:17.992797500 INFO:SerialBattery:> Connection voltage: 53.0 V | Current: -13.1 A | SoC: 91%
2024-10-07 18:49:17.992801500 INFO:SerialBattery:> Cell count: 16 | Cells populated: 16
2024-10-07 18:49:17.992804500 INFO:SerialBattery:> LINEAR LIMITATION ENABLE: True
2024-10-07 18:49:17.992807500 INFO:SerialBattery:> MIN CELL VOLTAGE: 2.850 V | MAX CELL VOLTAGE: 3.450 V| FLOAT CELL VOLTAGE: 3.375 V
2024-10-07 18:49:17.992878500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 280.0 A | MAX BATTERY DISCHARGE CURRENT: 280.0 A
2024-10-07 18:49:17.992882500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 150.0 A | MAX BATTERY DISCHARGE CURRENT: 150.0 A (read from BMS)
2024-10-07 18:49:17.992886500 INFO:SerialBattery:> CVCM:     True
2024-10-07 18:49:17.992889500 INFO:SerialBattery:> CCCM CV:  True  | DCCM CV:  True
2024-10-07 18:49:17.992918500 INFO:SerialBattery:> CCCM T:   True  | DCCM T:   True
2024-10-07 18:49:17.992921500 INFO:SerialBattery:> CCCM SOC: False | DCCM SOC: False
2024-10-07 18:49:17.992924500 INFO:SerialBattery:> CHARGE FET: True | DISCHARGE FET: True | BALANCE FET: True
2024-10-07 18:49:17.992928500 INFO:SerialBattery:Serial Number/Unique Identifier: 2032809157
2024-10-07 18:55:40.719322500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -81
2024-10-07 18:55:40.719328500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -84
2024-10-07 18:59:50.874364500 ./run: line 12:  4199 Segmentation fault      python /opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py Jkbms_Ble C8:47:8C:E2:DB:1F
2024-10-07 18:59:51.030523500 
2024-10-07 18:59:51.030625500 
2024-10-07 18:59:51.030791500 INFO:Preparing Bluetooth for connection to BMS
2024-10-07 18:59:51.030860500 INFO:Bluetooth details
2024-10-07 18:59:58.995073500 Attempting to disconnect from C8:47:8C:E2:DB:1F
2024-10-07 18:59:58.995077500 [CHG] Device C8:47:8C:E2:D1:A9 RSSI is nil
2024-10-07 18:59:58.995081500 [CHG] Controller E4:5F:01:06:4B:BD Discovering: no
2024-10-07 18:59:58.995084500 [CHG] Device C8:47:8C:E2:DB:1F ServicesResolved: no
2024-10-07 18:59:58.995088500 Successful disconnected
2024-10-07 19:00:04.130834500 Device C8:47:8C:E2:DB:1F (public)
2024-10-07 19:00:04.130839500 	Alias: MoarBatt02
2024-10-07 19:00:04.130842500 	Paired: no
2024-10-07 19:00:04.130844500 	Trusted: yes
2024-10-07 19:00:04.130846500 	Blocked: no
2024-10-07 19:00:04.130848500 	Connected: no
2024-10-07 19:00:04.130851500 	LegacyPairing: no
2024-10-07 19:00:04.130853500 	UUID: Device Information        (0000180a-0000-1000-8000-00805f9b34fb)
2024-10-07 19:00:04.130856500 	RSSI: -80
2024-10-07 19:00:04.133142500 
2024-10-07 19:00:06.012735500 INFO:SerialBattery:
2024-10-07 19:00:06.013011500 INFO:SerialBattery:Starting dbus-serialbattery
2024-10-07 19:00:06.014625500 INFO:SerialBattery:Venus OS v3.42
2024-10-07 19:00:06.014894500 INFO:SerialBattery:dbus-serialbattery v1.4.20240928
2024-10-07 19:00:22.260999500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 19:00:22.261284500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 19:00:26.268169500 INFO:SerialBattery:BAT: JKBMS 10.XG 16S (20231003)
2024-10-07 19:00:26.268858500 INFO:SerialBattery:Connection established to Jkbms_Ble
2024-10-07 19:00:26.723916500 INFO:SerialBattery:Found existing battery with DeviceInstance = 1
2024-10-07 19:00:26.853614500 INFO:SerialBattery:DeviceInstance = 1
2024-10-07 19:00:26.854515500 INFO:SerialBattery:PID file created successfully: /var/tmp/dbus-serialbattery_1.pid
2024-10-07 19:00:26.854800500 INFO:SerialBattery:Used DeviceInstances = ['1', '3']
2024-10-07 19:00:26.855096500 INFO:SerialBattery:com.victronenergy.battery.ble_c8478ce2db1f__0
2024-10-07 19:00:26.874699500 INFO:SerialBattery:publish config values = False
2024-10-07 19:00:26.887890500 INFO:SerialBattery:Battery JKBMS BLE connected to dbus from ble_c8478ce2db1f
2024-10-07 19:00:26.887896500 INFO:SerialBattery:========== Settings ==========
2024-10-07 19:00:26.887899500 INFO:SerialBattery:> Connection voltage: 52.83 V | Current: -30.1 A | SoC: 90%
2024-10-07 19:00:26.887903500 INFO:SerialBattery:> Cell count: 16 | Cells populated: 16
2024-10-07 19:00:26.887906500 INFO:SerialBattery:> LINEAR LIMITATION ENABLE: True
2024-10-07 19:00:26.887909500 INFO:SerialBattery:> MIN CELL VOLTAGE: 2.850 V | MAX CELL VOLTAGE: 3.450 V| FLOAT CELL VOLTAGE: 3.375 V
2024-10-07 19:00:26.887982500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 280.0 A | MAX BATTERY DISCHARGE CURRENT: 280.0 A
2024-10-07 19:00:26.887987500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 150.0 A | MAX BATTERY DISCHARGE CURRENT: 150.0 A (read from BMS)
2024-10-07 19:00:26.887991500 INFO:SerialBattery:> CVCM:     True
2024-10-07 19:00:26.887994500 INFO:SerialBattery:> CCCM CV:  True  | DCCM CV:  True
2024-10-07 19:00:26.888023500 INFO:SerialBattery:> CCCM T:   True  | DCCM T:   True
2024-10-07 19:00:26.888026500 INFO:SerialBattery:> CCCM SOC: False | DCCM SOC: False
2024-10-07 19:00:26.888029500 INFO:SerialBattery:> CHARGE FET: True | DISCHARGE FET: True | BALANCE FET: True
2024-10-07 19:00:26.888189500 INFO:SerialBattery:Serial Number/Unique Identifier: 2032809157
2024-10-07 19:02:51.979184500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -80
2024-10-07 19:02:51.979190500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -75


dmesg -T capture of one occurrence:
[Mon Oct  7 18:59:49 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:59:49 2024] CPU: 1 PID: 4199 Comm: python Tainted: G         C        5.10.110-rpi-venus-4 #1
[Mon Oct  7 18:59:49 2024] Hardware name: BCM2711
[Mon Oct  7 18:59:49 2024] PC is at 0xb6afd898
[Mon Oct  7 18:59:49 2024] LR is at 0xb6ae5698
[Mon Oct  7 18:59:49 2024] pc : [<b6afd898>]    lr : [<b6ae5698>]    psr: 60000010
[Mon Oct  7 18:59:49 2024] sp : beea33b0  ip : b6b1e2ac  fp : 00000000
[Mon Oct  7 18:59:49 2024] r10: 00000000  r9 : b69d8a0c  r8 : beea3474
[Mon Oct  7 18:59:49 2024] r7 : 01945848  r6 : 0192b3e0  r5 : b4102dd0  r4 : b412c248
[Mon Oct  7 18:59:49 2024] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[Mon Oct  7 18:59:49 2024] Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
[Mon Oct  7 18:59:49 2024] Control: 30c5383d  Table: 08c88f00  DAC: 55555555
[Mon Oct  7 18:59:49 2024] CPU: 1 PID: 4199 Comm: python Tainted: G         C        5.10.110-rpi-venus-4 #1
[Mon Oct  7 18:59:49 2024] Hardware name: BCM2711
[Mon Oct  7 18:59:49 2024] Backtrace: 
[Mon Oct  7 18:59:49 2024] [<c0c181c0>] (dump_backtrace) from [<c0c18534>] (show_stack+0x20/0x24)
[Mon Oct  7 18:59:49 2024]  r7:ffffffff r6:00000000 r5:60000113 r4:c14e9918
[Mon Oct  7 18:59:49 2024] [<c0c18514>] (show_stack) from [<c0c1ca0c>] (dump_stack+0xc8/0xf4)
[Mon Oct  7 18:59:49 2024] [<c0c1c944>] (dump_stack) from [<c0208b84>] (show_regs+0x1c/0x20)
[Mon Oct  7 18:59:49 2024]  r10:fffffe30 r9:c481ff58 r8:00000f00 r7:0000000a r6:c457e3c0 r5:c481e000
[Mon Oct  7 18:59:49 2024]  r4:c481ffb0 r3:10b4f0f3
[Mon Oct  7 18:59:49 2024] [<c0208b68>] (show_regs) from [<c023bd64>] (get_signal+0x9fc/0xa00)
[Mon Oct  7 18:59:49 2024] [<c023b368>] (get_signal) from [<c020b6d4>] (do_work_pending+0x15c/0x568)
[Mon Oct  7 18:59:49 2024]  r10:fffffe30 r9:5ac3c35a r8:c481ffb0 r7:00000001 r6:c481ff44 r5:c481ff28
[Mon Oct  7 18:59:49 2024]  r4:ffffe000
[Mon Oct  7 18:59:49 2024] [<c020b578>] (do_work_pending) from [<c0200074>] (slow_work_pending+0xc/0x20)
[Mon Oct  7 18:59:49 2024] Exception stack(0xc481ffb0 to 0xc481fff8)
[Mon Oct  7 18:59:49 2024] ffa0:                                     00000000 00000000 00000000 00000000
[Mon Oct  7 18:59:49 2024] ffc0: b412c248 b4102dd0 0192b3e0 01945848 beea3474 b69d8a0c 00000000 00000000
[Mon Oct  7 18:59:49 2024] ffe0: b6b1e2ac beea33b0 b6ae5698 b6afd898 60000010 ffffffff
[Mon Oct  7 18:59:49 2024]  r10:00000000 r9:c481e000 r8:00000000 r7:30c5383d r6:ffffffff r5:60000010
[Mon Oct  7 18:59:49 2024]  r4:b6afd898

Amount of times this is happening:
[Mon Oct  7 16:33:52 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 16:44:40 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 17:21:26 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 17:44:54 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:43:24 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:48:46 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:59:49 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 19:03:50 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 19:34:44 2024] potentially unexpected fatal signal 11.

Any other information that may be helpful

No response

@DenkBrettl DenkBrettl added the bug Something isn't working label Oct 8, 2024
@DenkBrettl
Copy link
Author

Ok, while the BLE setup has been running for a while and was mostly stable, it has always had some moments where it would just hang. I've been thinking about a watchdog process that would monitor that there are new values from the BMS within a certain time frame. Let me know if you think this is a good idea and I might find some time in the next weeks to implement that.

Due to the instability, I'll switch back to serial connection.

@Honusnap
Copy link

I had the same thing, gone back to 1.32

@mr-manuel
Copy link
Owner

Unfortunately I troubleshooted this over 100 hours and found no real solution to all this Bluetooth problems. Therefore I decided to not put any other effort into the Bluetooth part. Another reason is, that the users apparently do not appreciate the work I do and they cannot immagine how much time consuming this all is.

For that 1-2 donations, if at all, a month on over 9.000 dbus-serialbattery installations it is not worth it. If everyone would donate 1 €/year then the motivation would be another, but that is not the case.

Feel free to open a PR :-)

@mr-manuel mr-manuel added help wanted Extra attention is needed wontfix This will not be worked on labels Oct 11, 2024
@Honusnap

This comment was marked as off-topic.

@mr-manuel

This comment was marked as off-topic.

@Honusnap

This comment was marked as off-topic.

@DenkBrettl

This comment was marked as off-topic.

@mr-manuel

This comment was marked as off-topic.

@Honusnap
Copy link

Just when i see in log :
" Cell count: 16 | Cells populated: 32"

Then in battery.py
cell_counter = len(self.cells)
.
.
.
logger.info(
f"> Cell count: {self.cell_count} | Cells populated: {cell_counter}"
)
Mmmmm ok ...do not see where this 32 come from, i got 16 cells.
I do find pretty strange that the self.cells array is 32 in size ... but maybe there is some other things in it ...

@mr-manuel
Copy link
Owner

mr-manuel commented Nov 5, 2024

See also Louisvdw/dbus-serialbattery#1092 (comment), just for informations.

@mr-manuel mr-manuel changed the title Segmentation faults in python with v1.4.20240928 on JKBMS BLE LLT/JBD & JKBMS BLE problems Nov 26, 2024
@Felixrising
Copy link

Felixrising commented Dec 30, 2024

I'm also trying v1.3.20240705 and seeing less disconnects (none so far).

As commented elsewhere, the driver has some flakey logic that kills the driver on disconnect in an attempt to recover, but it seems to be making it worse, reverting some of that bandaid stuff seems prudent.

@mr-manuel
Copy link
Owner

Are you all sure, that the data is still refreshing? This version has no check for that and could be one reason, why it does not report any issue.

@Felixrising
Copy link

Felixrising commented Dec 30, 2024

You could be right, but I've been watching the cell voltages pretty consistently and seen constant changes....

BTW I tried the dev branch and also saw exits after the bluetooth daemon was attempted to be reset when run manually (after the driver crashed)...

:~# python /data/apps/dbus-serialbattery/dbus-serialbattery.py Jkbms_Ble C8:47:8C:ED:1B:EE
INFO:SerialBattery:
INFO:SerialBattery:Starting dbus-serialbattery
INFO:SerialBattery:Venus OS v3.52 running on Raspberry Pi 4 Model B Rev 1.1
INFO:SerialBattery:dbus-serialbattery **v2.0.20241223dev**
INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:ED:1B:EE
INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:ED:1B:EE
INFO:SerialBattery:--> asy_connect_and_scrape(): error while connecting to bt: BleakDBusError('org.bluez.Error.Failed', 'Software caused connection abort') of type <class 'bleak.exc.BleakDBusError'> in /data/apps/dbus-serialbattery/bms/jkbms_brn.py line #443
INFO:SerialBattery:--> asy_connect_and_scrape(): Exit
INFO:SerialBattery:Reset of system Bluetooth daemon triggered
Starting bluetooth: bluetoothd.
INFO:SerialBattery:System Bluetooth daemon should have been restarted
INFO:SerialBattery:--> asy_connect_and_scrape(): error while connecting to bt: BleakDBusError('org.bluez.Error.Failed', 'Software caused connection abort') of type <class 'bleak.exc.BleakDBusError'> in /data/apps/dbus-serialbattery/bms/jkbms_brn.py line #443
INFO:SerialBattery:--> asy_connect_and_scrape(): Exit
INFO:SerialBattery:Reset of system Bluetooth daemon triggered
Starting bluetooth: bluetoothd.
INFO:SerialBattery:System Bluetooth daemon should have been restarted
ERROR:SerialBattery:No BMS found at C8:47:8C:ED:1B:EE
ERROR:SerialBattery:ERROR >>> No battery connection at Jkbms_Ble
INFO:SerialBattery:Exit signal received, exiting gracefully...
INFO:SerialBattery:Stopped dbus-serialbattery with exit code 1
^CINFO:SerialBattery:Exit signal received, exiting gracefully...
INFO:SerialBattery:Stopped dbus-serialbattery with exit code 0
Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1388, in _shutdown
    lock.acquire()
  File "/data/apps/dbus-serialbattery/dbus-serialbattery.py", line 132, in exit_driver
    sys.exit(code)
SystemExit: 0

I have a ch340g ttl coming, and will move to wired... but I think despite buggy bluetooth stack in RPI, it should be capable of continuing to work.

@Felixrising
Copy link

Felixrising commented Dec 31, 2024

I think this is the main difference in logic between the older 1.3 and the 1.5 version. result from a failure is captured in the current cycle in 1.5, but in the older 1.3 the failure is looked at in the subsequent cycle where the

 # get first data
                result = result and self.get_settings()
                result = result and self.refresh_data()

...

return result

whereas the older version allows the above logic to run through (tries every 500ms for 20 cycles)) before flagging result as bad.

                if result:
                    self.get_settings()
                    self.refresh_data()

This is important because result instantly returns a single failure, whereas in 1.3 the failure would be retried a bunch of times first... I'm not a coder and really don't know python, but this seems like an important difference. Happy to be wrong. Bluetooth can be glitchy, so retries is important, I also still think killing the bluetoothd is not the right approach.

I also think the use of bluetootctl needs some reworking. bluetoothctl --timeout S should be used in scripts (bluez/bluez#826 , bluez/bluez#826 (comment) ) to limit the time it runs for, and scan on, etc can take some time to fully complete. So more cooling off period between disconnects and reconnects, with status checks for these completing etc is probably required, so the tasks bluez is handling can properly complete and avoid race conditions or whatever.

@serrzonzub
Copy link

not that it's terribly helpful, but i can add that the 20240705 version (mostly) reliably gets data from the BLE on the varter heated battery i have installed in my application. i don't see any significant issues when running this version, that i can identify with my limited experience with this technology.

@IamSierraCharlie
Copy link

IamSierraCharlie commented Jan 1, 2025

Okay, so I managed to track down more info on this segmentation fault....
I suspected it was coming from dbushelper.py but haven't been able to get any further on it until now.
I installed and enabled faulthandler on jkbms_brn.py
enabled it using:

faulthandler.enable(file=sys.stderr, all_threads=True)

and now have the following data from the failure:

2025-01-01 07:44:15.807704500 Current thread 0xb48ff460 (most recent call first):
2025-01-01 07:44:15.807803500   File "/usr/lib/python3.8/site-packages/dbus_fast/_private/unmarshaller.py", line 332 in _read_sock_without_fds
2025-01-01 07:44:15.808738500   File "/usr/lib/python3.8/site-packages/dbus_fast/_private/unmarshaller.py", line 376 in _read_to_pos
2025-01-01 07:44:15.809688500   File "/usr/lib/python3.8/site-packages/dbus_fast/_private/unmarshaller.py", line 636 in _read_header
2025-01-01 07:44:15.810755500   File "/usr/lib/python3.8/site-packages/dbus_fast/_private/unmarshaller.py", line 775 in _unmarshall
2025-01-01 07:44:15.811758500   File "/usr/lib/python3.8/site-packages/dbus_fast/aio/message_reader.py", line 19 in _message_reader
2025-01-01 07:44:15.812754500   File "/usr/lib/python3.8/asyncio/events.py", line 81 in _run
2025-01-01 07:44:15.813341500   File "/usr/lib/python3.8/asyncio/base_events.py", line 1859 in _run_once
2025-01-01 07:44:15.814031500   File "/usr/lib/python3.8/asyncio/base_events.py", line 570 in run_forever
2025-01-01 07:44:15.814778500   File "/usr/lib/python3.8/asyncio/base_events.py", line 603 in run_until_complete
2025-01-01 07:44:15.815630500   File "/usr/lib/python3.8/asyncio/runners.py", line 44 in run
2025-01-01 07:44:15.815936500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 432 in connect_and_scrape
2025-01-01 07:44:15.816837500   File "/usr/lib/python3.8/threading.py", line 870 in run
2025-01-01 07:44:15.817349500   File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2025-01-01 07:44:15.817956500   File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap
2025-01-01 07:44:15.818519500 
2025-01-01 07:44:15.818576500 Thread 0xb52df460 (most recent call first):
2025-01-01 07:44:15.818579500   File "/usr/lib/python3.8/threading.py", line 1027 in _wait_for_tstate_lock
2025-01-01 07:44:15.819319500   File "/usr/lib/python3.8/threading.py", line 1011 in join
2025-01-01 07:44:15.819718500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 504 in monitor_scraping
2025-01-01 07:44:15.820685500   File "/usr/lib/python3.8/threading.py", line 870 in run
2025-01-01 07:44:15.822110500   File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2025-01-01 07:44:15.822770500   File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap
2025-01-01 07:44:15.823283500 
2025-01-01 07:44:15.823334500 Thread 0xb6f99010 (most recent call first):
2025-01-01 07:44:15.823338500   File "/usr/lib/python3.8/site-packages/gi/overrides/GLib.py", line 497 in run
2025-01-01 07:44:15.824033500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 394 in main
2025-01-01 07:44:15.830750500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 400 in <module>
2025-01-01 07:44:16.047990500 ./run: line 12:  6253 Segmentation fault      python /opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py Jkbms_Ble 20:23:04:33:BE:BB

I though this pointed to the function -> merge_dicts, but have found it varies from failure to failure.....

def merge_dicts(self, dict1, dict2) -> None:
        for key in dict2:
            if key in dict1 and isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
                self.merge_dicts(dict1[key], dict2[key])
            else:
                dict1[key] = dict2[key]

I've added some logging to it to see what else might pop up.

The dictionary info looks like this:

2025-01-01 08:06:50.680081500 INFO:SerialBattery:keys for merge dicts is:
2025-01-01 08:06:50.680084500 (dict1[key]):
2025-01-01 08:06:50.680087500 {'BatteryLife': {'DischargedTime': '0', 'Flags': '0', 'MinimumSocLimit': '10.0', 'Schedule': {'Charge': {'0': {'AllowDischarge': '0', 'Day': '-7', 'Duration': '0', 'Soc': '100', 'Start': '0'}, '1': {'AllowDischarge': '0', 'Day': '-7', 'Duration': '0', 'Soc': '100', 'Start': '0'}, '2': {'AllowDischarge': '0', 'Day': '-7', 'Duration': '0', 'Soc': '100', 'Start': '0'}, '3': {'AllowDischarge': '0', 'Day': '-7', 'Duration': '0', 'Soc': '100', 'Start': '0'}, '4': {'AllowDischarge': '0', 'Day': '-7', 'Duration': '0', 'Soc': '100', 'Start': '0'}}}, 'SocLimit': '10.0'}}
2025-01-01 08:06:50.680147500 (dict2[key])
2025-01-01 08:06:50.680149500 {'BatteryLife': {'State': '1'}}

Dict2 keys vary from call to call.
I thought that there was something that in the second dictionary that the function doesn't like, but now I've seen this fault....

2025-01-01 08:14:50.715618500 Thread 0xb48ff460 (most recent call first):
2025-01-01 08:14:50.715621500   File "/usr/lib/python3.8/os.py", line 675 in __getitem__
2025-01-01 08:14:50.715624500   File "/usr/lib/python3.8/_collections_abc.py", line 668 in __contains__
2025-01-01 08:14:50.715627500   File "/opt/victronenergy/dbus-serialbattery/dbushelper.py", line 32 in get_bus
2025-01-01 08:14:50.715657500   File "/opt/victronenergy/dbus-serialbattery/dbushelper.py", line 1000 in publish_dbus
2025-01-01 08:14:50.715663500   File "/opt/victronenergy/dbus-serialbattery/dbushelper.py", line 803 in publish_battery
2025-01-01 08:14:50.715670500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 96 in poll_battery
2025-01-01 08:14:50.715677500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 363 in <lambda>
2025-01-01 08:14:50.715712500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 345 in assemble_frame
2025-01-01 08:14:50.715792500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 350 in ncallback
2025-01-01 08:14:50.715796500   File "/usr/lib/python3.8/site-packages/bleak/backends/bluezdbus/client.py", line 182 in on_value_changed
2025-01-01 08:14:50.715802500   File "/usr/lib/python3.8/site-packages/bleak/backends/bluezdbus/manager.py", line 1026 in _parse_msg
2025-01-01 08:14:50.715808500   File "/usr/lib/python3.8/site-packages/dbus_fast/message_bus.py", line 805 in _process_message
2025-01-01 08:14:50.715847500   File "/usr/lib/python3.8/site-packages/dbus_fast/aio/message_reader.py", line 23 in _message_reader
2025-01-01 08:14:50.715852500   File "/usr/lib/python3.8/asyncio/events.py", line 81 in _run
2025-01-01 08:14:50.715856500   File "/usr/lib/python3.8/asyncio/base_events.py", line 1859 in _run_once
2025-01-01 08:14:50.715859500   File "/usr/lib/python3.8/asyncio/base_events.py", line 570 in run_forever
2025-01-01 08:14:50.715888500   File "/usr/lib/python3.8/asyncio/base_events.py", line 603 in run_until_complete
2025-01-01 08:14:50.715892500   File "/usr/lib/python3.8/asyncio/runners.py", line 44 in run
2025-01-01 08:14:50.715895500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 432 in connect_and_scrape
2025-01-01 08:14:50.716481500   File "/usr/lib/python3.8/threading.py", line 870 in run
2025-01-01 08:14:50.717032500   File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2025-01-01 08:14:50.717784500   File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap
2025-01-01 08:14:50.718265500 
2025-01-01 08:14:50.718323500 Thread 0xb526b460 (most recent call first):
2025-01-01 08:14:50.718326500   File "/usr/lib/python3.8/threading.py", line 1027 in _wait_for_tstate_lock
2025-01-01 08:14:50.719134500   File "/usr/lib/python3.8/threading.py", line 1011 in join
2025-01-01 08:14:50.719600500   File "/opt/victronenergy/dbus-serialbattery/bms/jkbms_brn.py", line 504 in monitor_scraping
2025-01-01 08:14:50.720536500   File "/usr/lib/python3.8/threading.py", line 870 in run
2025-01-01 08:14:50.722613500   File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2025-01-01 08:14:50.722617500   File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap
2025-01-01 08:14:50.722620500 
2025-01-01 08:14:50.722622500 Current thread 0xb6f25010 (most recent call first):
2025-01-01 08:14:50.722625500   File "/usr/lib/python3.8/site-packages/gi/overrides/GLib.py", line 497 in run
2025-01-01 08:14:50.722629500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 394 in main
2025-01-01 08:14:50.722883500   File "/opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py", line 400 in <module>
2025-01-01 08:14:50.936918500 ./run: line 12: 13215 Segmentation fault      python /opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py Jkbms_Ble 20:23:04:33:BE:BB

Still a segmentation fault, but different is some ways...
Im at the end of my ability on this - but I hope its helpful to someone...

@Felixrising
Copy link

Felixrising commented Jan 1, 2025

I may be chasing down a rabbit hole, but have come to the conclusion that use of raspberry pi ble and wifi seems problematic, potentially bluez issue compounding it, but use of external ble adapter is probably good advice [EDIT: confirmed still having frequent dropouts with two different usb adapters]. I'm still against the nuclear option of killing bluetoothd as a regular way of handling connection issues as bad practice, maybe include as a config option for people who are impacted and this is the only way to recover, still a band-aid for underlying bug/s in bluez/bleak.

Handling the connected state and riding out the issues with disconnect and reconnect seems more prudent, with a backoff mechanism, and retries (with logging). bluetooth isn't serial, disconnections/drop outs etc are part of life with BLE. Anecdotally the older logic in v1.3 does seem more forgiving and have had it continuing to work for 24+ hours... (confirmed updates coming into the gui - I didn't get into adding code to capture drop outs and recoveries, but this should ideally be logged).

https://bleak.readthedocs.io/en/latest/troubleshooting.html#occasional-not-connected-errors-on-raspberry-pi
project-chip/connectedhomeip#16178
hbldh/bleak#367

NB: the logic in enable.sh to disable internal BT depends on /u-boot/config.txt having "dtoverlay=miniuart-bt" somewhere, currently that is not the case, at least not on my fairly fresh install of venus os 3.5. Some enhancement to append the disable-bt to the overlay is required instead:

existing:

if [ -f "/u-boot/config.txt" ]; then
    if [[ $bluetooth_use_usb == *"True"* ]]; then
        if grep -q -r "miniuart-bt" /u-boot/config.txt; then
            sed -i 's/miniuart-bt/disable-bt/g' /u-boot/config.txt
            echo "ATTENTION! You have changed the bluetooth mode to USB! THIS NEEDS A MANUAL REBOOT!"
        fi
    elif [[ $bluetooth_use_usb == *"False"* ]]; then
        if grep -q -r "disable-bt" /u-boot/config.txt; then
            sed -i 's/disable-bt/miniuart-bt/g' /u-boot/config.txt
            echo "ATTENTION! You have changed the bluetooth mode to built in module! THIS NEEDS A MANUAL REBOOT!"
        fi
    fi
fi

proposed:

#!/bin/bash

# Check if /u-boot/config.txt exists
if [ -f "/u-boot/config.txt" ]; then
    if [[ $bluetooth_use_usb == *"True"* ]]; then
        # Disable built-in Bluetooth
        if grep -q "^dtoverlay=" "/u-boot/config.txt"; then
            # Check if 'disable-bt' is already in a dtoverlay line
            if grep -q "^dtoverlay=.*disable-bt" "/u-boot/config.txt"; then
                echo "Bluetooth is already disabled."
            else
                # Add 'disable-bt' to the end of an existing dtoverlay line
                sed -i '/^dtoverlay=/s/$/,disable-bt/' /u-boot/config.txt
                echo "ATTENTION! 'disable-bt' has been added to the dtoverlay list. THIS NEEDS A MANUAL REBOOT!"
            fi
        else
            # Add a new dtoverlay line for disable-bt if none exist
            echo "dtoverlay=disable-bt" >> /u-boot/config.txt
            echo "ATTENTION! 'disable-bt' has been added as a new dtoverlay. THIS NEEDS A MANUAL REBOOT!"
        fi
    elif [[ $bluetooth_use_usb == *"False"* ]]; then
        # Enable built-in Bluetooth
        if grep -q "^dtoverlay=.*disable-bt" "/u-boot/config.txt"; then
            # Remove 'disable-bt' from an existing dtoverlay line
            sed -i '/^dtoverlay=/s/disable-bt,//g' /u-boot/config.txt  # Case: disable-bt is not at the end
            sed -i '/^dtoverlay=/s/,disable-bt//g' /u-boot/config.txt  # Case: disable-bt is at the end
            sed -i '/^dtoverlay=/s/disable-bt//g' /u-boot/config.txt   # Case: disable-bt is the only entry
            echo "ATTENTION! 'disable-bt' has been removed from the dtoverlay list. THIS NEEDS A MANUAL REBOOT!"
        else
            echo "Bluetooth is already enabled."
        fi
    fi
else
    echo "Error: /u-boot/config.txt does not exist. This script only works on Raspberry Pi devices."
fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

6 participants