Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore on Startup not working for RRD4J with OH4.1M4 #16057

Closed
apella12 opened this issue Dec 11, 2023 · 15 comments · Fixed by #16059
Closed

Restore on Startup not working for RRD4J with OH4.1M4 #16057

apella12 opened this issue Dec 11, 2023 · 15 comments · Fixed by #16059
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@apella12
Copy link
Contributor

apella12 commented Dec 11, 2023

Expected Behavior

States restored

Current Behavior

This is mentioned in the community under 4.1 Milestones. Appears to be about three confirmations. Unrestored items are NULL and (for me) about a half a dozen rules bomb.
Also as noted in the forum, the items that are NULL are items created to count reports and lights (not connected to any binding). All my MQTT items were "restored", but I think that just happens when Mqtt is connected.

Possible Solution

Wild guess is that some checking protocol was introduced between M3 and M4 and because not all items (Dates & Images) are managed and the new check fails and the whole restore on startup process stops?.

Steps to Reproduce (for Bugs)

  1. First encounter with upgrade from M3 to M4
  2. Did setup Mapdb to restore on S/U
  3. Accidently did restart before all Mapdb states were persisted
  4. Several more items were not restored- So I think a simple restart on M4 should trigger it.

Context

Rules bomb on startup

Your Environment

Release = Raspbian GNU/Linux 11 (bullseye)
Kernel = Linux 6.1.21-v8+
Platform = Raspberry Pi 4 Model B Rev 1.1
Uptime = 61 day(s). 3:4:28
CPU Usage = 0.75% avg over 4 cpu(s) (4 core(s) x 1 socket(s))
CPU Load = 1m: 0.01, 5m: 0.03, 15m: 0.05
Memory = Free: 0.30GB (17%), Used: 1.54GB (83%), Total: 1.85GB
Swap = Free: 1.92GB (96%), Used: 0.07GB (4%), Total: 1.99GB
Root = Free: 41.24GB (74%), Used: 14.14GB (26%), Total: 58.38GB
Updates = 0 apt updates available.
Processes = 154 running processes of 32768 maximum processes

@apella12 apella12 added the bug An unexpected problem or unintended behavior of an add-on label Dec 11, 2023
@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-4-1-milestone-discussion/149502/135

@lolodomo
Copy link
Contributor

  • First encounter with upgrade from M3 to M4
  • Did setup Mapdb to restore on S/U
  • Accidently did restart before all Mapdb states were persisted
  • Several more items were not restored- So I think a simple restart on M4 should trigger it.

This is finally not clear. Is the problem with rrd4j or MapDB ?

@lolodomo
Copy link
Contributor

@openhab/add-ons-maintainers : looks like a critical bug for OH 4.1. Way to reproduce it is unfortunately not clear at all IMHO.

What change between M3 and M4 could have broken restore on startup feature ?

@J-N-K
Copy link
Member

J-N-K commented Dec 12, 2023

This could also be a core issue. But I need an easy way to reproduce it.

@apella12
Copy link
Contributor Author

apella12 commented Dec 12, 2023

Sorry for the confusion. Problem is with rrd4j. Mapdb just hadn't time to collect a value on those items, so still had NULL values because rrd4j wasn't restoring.

Anyway this should work to recreate.
0) Be on OH4.1M4

  1. have no other restore on startup persistence except rrd4j
  2. Create an item
  3. Initialize the item with value 1
  4. Restart OH
  5. Item value should be NULL after restart (or its just me- and a few others)

@michaeljoos72
Copy link

Hi all

I can confirm it is rrd4j. I did see this when upgrading from M3 to M4. Half of my 600 items did not restore on startup and rules did not work anymore (most probably because value was NULL). Unfortunately I don't have more details or any Logs because it was my productive system. Did immediately go back to stable 4.04 after that.

M3 was stable over weeks for me.

@lolodomo
Copy link
Contributor

I will try with my usual test environment this evening: last snapshot installed from scratch on Windows, just Basic UI and rrd4j as add-ons, few items from file and a DSL rule updating item states.
This will be very fast to see if it work or not.

@kaikreuzer
Copy link
Member

I can actually reproduce it on my production system. I'll try to have a look at it tonight or tomorrow and see if I can help.

@J-N-K
Copy link
Member

J-N-K commented Dec 13, 2023

As I already said: it could also be a core problem, related to the refactoring in openhab/openhab-core#3889. There has always been an issue with the loading of configuration files, because there was a short time after the start of a persistence service with the default configuration applied. It could be that the PR (or some other refactoring) changed the timing and therefore the restoreOnStartup policy gets applied earlier or later than before.

IMO the "default configuration" should be removed completely. In fact, since all persistence services are also OSGi components, the configuration should probably be set before the service is activated and then the persistence manager should request the configuration from the service. But that's not super-easy and should definitely happen after the release.

@J-N-K J-N-K transferred this issue from openhab/openhab-core Dec 13, 2023
@J-N-K
Copy link
Member

J-N-K commented Dec 13, 2023

I can explain the behavior, but I don't have a solution ATM. The case is indeed openhab/openhab-core#3889, but it's not a bug there.

Before that PR no end time was set in the filter criteria when querying the most recent state. This changed because we need to get the first state before now, not the state with the biggest timestamp. There is also no begin timestamp set, and the page size is 1.

Unfortunately "begin=null" and "end=null" and "pagesize=1" has a special handling in rrd4j:

            if (filterBeginDate == null) {
                // as rrd goes back for years and gets more and more
                // inaccurate, we only support descending order
                // and a single return value
                // if there is no begin date is given - this case is
                // required specifically for the historicState()
                // query, which we want to support
                if (filter.getOrdering() == Ordering.DESCENDING && filter.getPageSize() == 1
                        && filter.getPageNumber() == 0) {
                    if (filterEndDate == null) {
                        // we are asked only for the most recent value!
                        double lastValue = db.getLastDatasourceValue(DATASOURCE_STATE);
                        if (!Double.isNaN(lastValue)) {
                            HistoricItem rrd4jItem = new RRD4jItem(itemName, mapToState(lastValue, item, unit),
                                    ZonedDateTime.ofInstant(Instant.ofEpochMilli(db.getLastArchiveUpdateTime() * 1000),
                                            ZoneId.systemDefault()));
                            return List.of(rrd4jItem);
                        } else {
                            return List.of();
                        }
                    } else {
                        start = end;
                    }
                } else {
                    throw new UnsupportedOperationException(
                            "rrd4j does not allow querys without a begin date, unless order is descending and a single value is requested");
                }

because we now have an end-timestamp, the "get last value" part is no longer executed and we don't get a valid value because start=end and the last bin could be empty.

@J-N-K
Copy link
Member

J-N-K commented Dec 13, 2023

Maybe the filterEndDate == null check should be changed to filterEndDate == null || Duration.between(filterEndDate, ZonedDateTime.now()).getSeconds() < 10 or something similar. 10 could be replaced by some dynamic value, like db.getRrdDef().getStep() or half of that.

@kaikreuzer
Copy link
Member

Thanks for the analysis @J-N-K, I am now also back at my keyboard.
So if I get it right, the troublesome line is start = end, which we are now hitting - and this will simply have rrd4j return nothing. Do you actually see any situations where this setting makes any sense? If not, we might want to change this part of the code to somehow retrieve the best value at the moment of the given end time.

@J-N-K
Copy link
Member

J-N-K commented Dec 13, 2023

start == end is a valid selection, it selects exactly one bin, so as long as we know exactly what (=which point in time) we want (historicState) it's correct.

The issue only occurs if we don't know what the last valid bin is (restoreOnStartup requires "last valid value", not "value from last bin"). This special case was handled by start=null,end=null,pageSize=1. That's why I suggested to handle start=null,end=now,pageSize=1 the same way.

@kaikreuzer
Copy link
Member

Ok, got it. So we have to identify if end actually means now and should therefore check for a small duration as per your suggestion above.

@kaikreuzer
Copy link
Member

#16059 seems to nicely address the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
6 participants