
[neeo] Convert to OH jetty HttpClient #15571

Merged — 11 commits merged into openhab:main on Oct 8, 2023
Conversation

morph166955
Contributor

Resolves #15563

Replaces ClientBuilder/HttpClient with OH Jetty HttpClient.

Signed-off-by: Ben Rosenblum <[email protected]>
@morph166955
Contributor Author

morph166955 commented Sep 9, 2023

The only issue I have with this implementation is that it's not quite as responsive when there are multiple button clicks in a short period of time (e.g. moving around a menu). Each button click sends a ton of sensor updates to the brain, so it looks like it jams up. There is a thread pool which is supposed to be handling this, but I'm worried that we end up bottlenecked on the client side since we only have one client. It may be a better approach to roll a separate client per thread in the pool.

EDIT: A potential fix would be to use

https://github.com/openhab/openhab-core/blob/f6435ec132dd5b4bd6b58d0ad608d336b08e8ede/bundles/org.openhab.core.io.net/src/main/java/org/openhab/core/io/net/http/HttpClientFactory.java#L43

to generate a new HttpClient for each request. That said, the code comments explicitly say to not do this, probably to avoid the issues that this PR fixes. I think there needs to be a separate HttpClient per thread in the pool to avoid congestion, but they need to be in a pool and reused.

EDIT2: And just to keep track, this is where we probably need to maintain the pool as this is what's sending the messages to api.notify()

https://github.com/morph166955/openhab-addons/blob/746e10c121f6f3b2d39b9c76939cd52653b9dcda/bundles/org.openhab.io.neeo/src/main/java/org/openhab/io/neeo/internal/servletservices/NeeoBrainService.java#L551

@morph166955
Contributor Author

I found a happy medium. For the load-intensive threads (notifications), a Stack of HttpClients is maintained. As a thread needs one, it can pop one off the top; when it's done, it returns it to the stack. The stack should not get larger than the size of the thread pool (in my case that's usually about 5 threads). The remote is much more responsive now and the threads stay in control.
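The pop/return scheme described here can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: `ClientPool`, its factory, and the use of a plain type parameter in place of Jetty's `HttpClient` are all hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// Minimal sketch of a pop/return client stack. T stands in for the real
// Jetty HttpClient; all names here are illustrative, not from the PR.
class ClientPool<T> {
    private final ConcurrentLinkedDeque<T> stack = new ConcurrentLinkedDeque<>();
    private final Supplier<T> factory;

    ClientPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Pop a client off the top of the stack, or create a new one if empty.
    T acquire() {
        T client = stack.poll();
        return client != null ? client : factory.get();
    }

    // Return the client so the next thread can reuse it (LIFO).
    void release(T client) {
        stack.push(client);
    }

    int size() {
        return stack.size();
    }
}
```

Because threads return clients when done, the stack naturally tops out around the size of the notification thread pool, matching the behavior described above.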

@morph166955 morph166955 requested a review from a team September 10, 2023 00:50
@J-N-K
Member

J-N-K commented Sep 10, 2023

I didn't look into detail in every bit of code, but to me it looks like there is a general design issue. As an example:

NeeoApi.executeRecipe makes a synchronous HTTP request, but all the caller (NeeoBrainService.notifyState) does is log a warning if the call failed (by catching the exception thrown in executeRecipe). It's soft-asynchronous because the call to executeRecipe is scheduled, but I believe Jetty can handle that a lot better.

You can call Request.send(Response.CompleteListener listener) and move the response handling to the listener. The executeRecipe method then returns fast and you can omit scheduling the call to the method.

I believe this is also the case for other calls, and you can probably fix your performance issues without doing fancy "several HTTP clients" stuff.
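The asynchronous pattern suggested here maps onto Jetty's `Request.send(Response.CompleteListener)`. As a library-neutral sketch of the same idea (all names below are hypothetical, with `CompletableFuture` standing in for the Jetty API): the request is fired, the result handling moves into a listener, and the method returns immediately instead of blocking a scheduler thread.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of converting a scheduled blocking call into a fire-and-listen one.
// sendRequest() is a hypothetical stand-in for the HTTP call executeRecipe makes.
class AsyncRecipeSketch {
    static CompletableFuture<Integer> sendRequest() {
        // Pretend the brain answered; a real implementation would do the HTTP I/O here.
        return CompletableFuture.supplyAsync(() -> 200);
    }

    // Returns fast: the warning the caller used to log after catching an
    // exception now lives in the completion listener, so nothing blocks.
    static CompletableFuture<Integer> executeRecipe() {
        return sendRequest().whenComplete((status, error) -> {
            if (error != null) {
                System.err.println("Recipe call failed: " + error.getMessage());
            }
        });
    }
}
```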

@morph166955
Contributor Author

I would definitely agree there are some design issues here. The binding and the additional io addon seem overcomplicated. Unfortunately the NEEO remote has been a dead product for years now. Several of us still use them, so for the moment it's worth fixing. At some point this is going to end up falling off when the juice isn't worth the squeeze.

The real problem with this particular issue is the amount of data being sent per key press. I didn't count it exactly, but somewhere around 40 HTTP POST/GET sessions happen for every key press on the remote. Most of it is updates to some kind of sensor data which never seem to change. That huge burst is what's causing the logjam, because the single HTTP session with the actual key press gets buried in there. The more hits, the more delays. I watched last night as I rapidly flew through a menu, and the sensor hits went on for almost 15 seconds after I stopped because of how deep the queue was.

In my case also, I wasn't running any recipes at all, just pressing a button. I have no doubt that what you're saying is spot on; there are probably a dozen of those splashed around the code. I'm unfortunately out of time to keep hammering at this; yesterday was my only break for the next week or so, and this won because my OH was just unstable. IMHO, the real fix is finding out why we're flooding the brain with sensor updates that are likely unnecessary. I appreciate you giving it a look for those of us who still use this. My understanding of how this binding works is high level at best.

@morph166955
Contributor Author

morph166955 commented Sep 11, 2023

So I was REALLY wrong with my count. I just did a few tests. On average, EVERY BUTTON CLICK creates 244 sensor updates to the brain. This is causing a massive logjam. One thing of note: it looks like the actual doGet below isn't using the sensor-values httpClient, so in theory we could create one client just for the sensors, let it take the bulk of the load, and let everything else use the built-in Jetty client. That would eliminate the need for the stack and just slow down the sensor updates (whatever those are; I've never used them).

Also interesting to note, it looks like it runs through both of the Neeo remotes I have every time (122 per). You can see the neeo brain ID change between the top and bottom.

2023-09-11 17:08:53.360 [DEBUG] [hab.io.neeo.internal.AbstractServlet] - doGet: http://10.255.0.5:8080/neeo/NEEO-05dbea22/device/apt-neeo_io:virtual:nVYazPtU/NEEO_Command/none/16/button/LivingRoomTV_CursorUp/default
2023-09-11 17:08:53.360 [DEBUG] [nal.servletservices.NeeoBrainService] - handleSetValue PathInfo [thingUid=neeo_io:virtual:nVYazPtU, itemName=NEEO_Command, subType=none, channelNbr=16, componentType=button, componentSubType=actor, actionValue=LivingRoomTV_CursorUp]
2023-09-11 17:08:53.361 [DEBUG] [nal.servletservices.NeeoBrainService] - Posting item event: Item 'NEEO_Command' received command LivingRoomTV_CursorUp
2023-09-11 17:08:53.361 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 5
2023-09-11 17:08:53.361 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:53.361 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 4
2023-09-11 17:08:53.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none-2_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:53.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 3
2023-09-11 17:08:53.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none-3_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:53.362 [DEBUG] [al.handler.NeeoForwardActionsServlet] - handleForwardActions {"action":"CURSOR UP","device":"Livingroom TV","room":"Living Room"}
2023-09-11 17:08:53.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 2
2023-09-11 17:08:53.363 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none-4_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:53.363 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 1
2023-09-11 17:08:53.363 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none-5_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:53.378 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-713af1e8): 200 - {"success":true}
2023-09-11 17:08:53.378 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-713af1e8 depth 0
2023-09-11 17:08:53.378 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-713af1e8 depth 1
2023-09-11 17:08:53.378 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-713af1e8): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"7096150530719744000:NEEO_Command-none-6_SENSOR","sensorValue":"-"}}
...About 200 more lines like this...
2023-09-11 17:08:55.337 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 0
2023-09-11 17:08:55.338 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-05dbea22 depth 1
2023-09-11 17:08:55.338 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-05dbea22): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"6469589197639385088:NEEO_Command-none-28_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:55.354 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.354 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 0
2023-09-11 17:08:55.355 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient popped a client from the stack for brain NEEO-05dbea22 depth 1
2023-09-11 17:08:55.355 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Sending Notification to brain (NEEO-05dbea22): {"type":"DEVICE_SENSOR_UPDATE","data":{"sensorEventKey":"6469589197639385088:NEEO_Command-none-29_SENSOR","sensorValue":"-"}}
2023-09-11 17:08:55.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.362 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 0
2023-09-11 17:08:55.369 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.369 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 1
2023-09-11 17:08:55.377 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.377 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 2
2023-09-11 17:08:55.384 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.384 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 3
2023-09-11 17:08:55.391 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - Response from brain (NEEO-05dbea22): 200 - {"success":true}
2023-09-11 17:08:55.391 [DEBUG] [org.openhab.io.neeo.internal.NeeoApi] - getHttpClient returned a client for brain NEEO-05dbea22 depth 4
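The one-client-for-sensors split floated in the comment above could be sketched like this. This is only an illustration: `ClientRouter`, the request-kind string, and the use of a type parameter in place of real HttpClient instances are all hypothetical.

```java
// Sketch of routing the high-volume sensor updates through a dedicated
// client while all other requests share the common (core-provided) one.
// All names here are illustrative, not from the PR.
class ClientRouter<T> {
    private final T sharedClient;   // e.g. the core-provided Jetty client
    private final T sensorClient;   // dedicated client for sensor-update bursts

    ClientRouter(T sharedClient, T sensorClient) {
        this.sharedClient = sharedClient;
        this.sensorClient = sensorClient;
    }

    // Pick a client based on the kind of request being sent to the brain.
    T clientFor(String requestKind) {
        return "DEVICE_SENSOR_UPDATE".equals(requestKind) ? sensorClient : sharedClient;
    }
}
```

The idea is that the ~244 sensor updates per button press queue up behind the dedicated client, while key-press and recipe traffic stays responsive on the shared one.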

@morph166955
Contributor Author

morph166955 commented Sep 11, 2023

EDIT: This is a bug but not caused by this PR. If DEBUG logging is enabled, this exception happens when updating anything on the UI. Disabling debug for io.neeo resolves it.

This may have broken something else. I can't make changes on the UI now...

2023-09-11 17:47:50.573 [WARN ] [org.eclipse.jetty.server.HttpChannel] - /neeo/neeostatus/updatedevice
java.lang.IllegalStateException: READER
        at org.eclipse.jetty.server.Request.getInputStream(Request.java:922) ~[?:?]
        at javax.servlet.ServletRequestWrapper.getInputStream(ServletRequestWrapper.java:142) ~[bundleFile:4.0.4]
        at org.openhab.io.neeo.internal.AbstractServlet.doPost(AbstractServlet.java:140) ~[?:?]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:517) ~[bundleFile:4.0.4]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:584) ~[bundleFile:4.0.4]
        at org.ops4j.pax.web.service.spi.servlet.OsgiInitializedServlet.service(OsgiInitializedServlet.java:102) ~[?:?]
        at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1450) ~[?:?]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[?:?]
        at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) ~[?:?]
        at org.ops4j.pax.web.service.spi.servlet.OsgiFilterChain.doFilter(OsgiFilterChain.java:100) ~[?:?]
        at org.ops4j.pax.web.service.jetty.internal.PaxWebServletHandler.doHandle(PaxWebServletHandler.java:310) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[?:?]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[?:?]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[?:?]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[?:?]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[?:?]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[?:?]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[?:?]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[?:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[?:?]
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[?:?]
        at org.ops4j.pax.web.service.jetty.internal.PrioritizedHandlerCollection.handle(PrioritizedHandlerCollection.java:96) ~[?:?]
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:722) ~[?:?]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[?:?]
        at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[?:?]
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[?:?]
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[?:?]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[?:?]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[?:?]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[?:?]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[?:?]
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[?:?]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[bundleFile:9.4.50.v20221201]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[bundleFile:9.4.50.v20221201]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]

@mherwege
Contributor

I am running into the same issue with the OH 4.1 milestone. I never had the issue before. I had to disable the NEEO IO service to make my system stable again. I didn't have time to do further analysis on my own, but I would appreciate it if this fix gets included. I don't think it can make things worse, and the whole product is in maintenance mode anyway.

@mherwege
Contributor

Even after removing the Neeo transport I still have a growth in httpclient threads. So there still is something else going on.

@mherwege
Contributor

mherwege commented Sep 14, 2023

I think I found the reason I still have the problem. I am also using the Neeo binding (not transport). It also has a similar servlet, probably with the same problem: https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.neeo/src/main/java/org/openhab/binding/neeo/internal/handler/NeeoForwardActionsServlet.java

@morph166955
Contributor Author

Try this...

Uninstall both transport and binding. Wait for OH to dispose of everything. Shutdown OH.

Install https://github.com/morph166955/openhab-addons/releases/download/neeo/org.openhab.io.neeo-4.1.0-SNAPSHOT.jar into your addons folder.

Start OH. Only install the binding now.

My system stabilized with the updated io module.

@mherwege
Contributor

I will try that.
Do you use the forwardAction functionality in the binding? I believe that is where the binding has an issue. If you don't use it, I doubt you would be affected in the binding.

@morph166955
Contributor Author

Not sure. My usage is limited to creating a few virtual devices and pushing them down to the brain. I let my OH rules handle the rest. The remote is nothing more than a nice interface for me.

@mherwege
Contributor

Did as you requested, installed your version of the transport, all OK.
shell:threads | grep -i httpclient | grep -i runnable | grep -i selector | wc -l in the console gives a result of 0.
Immediately after installing the binding again, I get 13. Every 2 minutes, 2 more are added to this count, so it is increasing rapidly again.

@morph166955
Contributor Author

Mine leveled off after a while and always stayed under 100, so it never destabilized the system. I have been having problems with one of the brains; I think the instability caused something to go crazy. My remote won't connect anymore and says the data from the brain is bad somehow. I have noticed that every time I power cycled the brains, the threads went into a funky state and didn't release. I'll probably have to restart OH to clear those for now.

Signed-off-by: Mark Herwege <[email protected]>
@mherwege
Contributor

I just created a PR against your branch, applying the same changes you did for the transport to the binding. So far, that stabilizes my system.

@morph166955
Contributor Author

Awesome. It's very likely you don't need the whole stack part. You can probably just take the parts to convert to the built in httpClient from jetty. I didn't see anywhere on the binding side where it did the massive sensor bursts that the io did.

@mherwege
Contributor

It's very likely you don't need the whole stack part.

That's what I thought, so I didn't include that in the binding. So far it works for me. If you can test as well, the combination potentially stabilizes our Neeos for a while.
I still don't understand what changed from the OH release to the OH milestone, but something broke the way it worked before.

@morph166955
Contributor Author

I'm on my phone at the moment. Do you have a jar somewhere I can pull? Do you want to submit a PR to my branch and we can just do one PR here?

@mherwege
Contributor

mherwege commented Sep 14, 2023

I'm on my phone at the moment. Do you have a jar somewhere I can pull? Do you want to submit a PR to my branch and we can just do one PR here?

I already did a PR to your branch.

Here is a jar: https://www.dropbox.com/scl/fi/llh10txvry7ba6zknci9w/org.openhab.binding.neeo-4.1.0-SNAPSHOT.jar?rlkey=b5pr0qx6yqhkey482qsygjjle&dl=0

Applied transport fix to binding
@morph166955
Contributor Author

Merged your changes into this PR! Thank you! I also just finally fixed the brain that went sideways and it looks like things are working smoothly here.

@morph166955
Contributor Author

I restarted OH and this is now what I'm at:

"OH-httpClient-neeo-NEEO-713af1e8-1-2181" Id=2181 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-1-2203" Id=2203 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-2-2216" Id=2216 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-2-2233" Id=2233 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-3-2241" Id=2241 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-3-2257" Id=2257 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-4-2300" Id=2300 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-4-2308" Id=2308 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-1-2581" Id=2581 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-1-2583" Id=2583 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-2-2586" Id=2586 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-2-2590" Id=2590 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-3-2592" Id=2592 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-3-2596" Id=2596 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-4-2599" Id=2599 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-4-2603" Id=2603 in RUNNABLE (running in native)

This makes sense, I have 2 brains and each is building a thread pool that is 4 deep (I've seen 5 happen before but no larger). I'm "OK" with this because it's not really putting a huge strain on the system AND it's stable. While I do agree with @J-N-K that it's probably not the best fix, I think it's a good middle of the road that mixes stability with speed and most importantly gets those on snapshot (and milestone) back to a stable state.

@openhab-bot
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/neeo-remote-binding-transport/33408/427

@morph166955
Contributor Author

morph166955 commented Sep 15, 2023

The adjustments to the binding seem problematic @mherwege

My remote stopped sending commands to the channels after a few hours. I can't find an error, the logs rolled on me at some point. I had to restart OH to get it to come back. Not sure why it crashed after a few hours. I did notice that it started to throw messages of:

2023-09-14 21:03:35.096 [DEBUG] [al.handler.NeeoForwardActionsServlet] - handleForwardActions {"action":"CURSOR DOWN","device":"Livingroom TV","room":"Living Room"}

But then it doesn't send to the channel. When I restarted OH, I didn't get the handleForwardActions messages, but it did start to update the channel again.

@lolodomo lolodomo added the bug An unexpected problem or unintended behavior of an add-on label Sep 23, 2023
@morph166955
Contributor Author

I've got 1 week of uptime on my OH right now, and it looks like the slow creep has stopped with the last merge. It creates a max of 5 clients on the stack and disposes of any extras after that.

openhab> shell:threads | grep -i neeo
"OH-httpClient-neeo-NEEO-05dbea22-1-1588" Id=1588 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2b6c8ce1
"OH-httpClient-neeo-NEEO-05dbea22-1-1589" Id=1589 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-1-1590" Id=1590 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2b6c8ce1
"OH-httpClient-neeo-NEEO-05dbea22-1-1591" Id=1591 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-1-1592" Id=1592 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2b6c8ce1
"OH-httpClient-neeo-NEEO-05dbea22-2-1594" Id=1594 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@66dbcb2c
"OH-httpClient-neeo-NEEO-05dbea22-2-1595" Id=1595 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@66dbcb2c
"OH-httpClient-neeo-NEEO-05dbea22-2-1596" Id=1596 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-2-1597" Id=1597 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@66dbcb2c
"OH-httpClient-neeo-NEEO-05dbea22-3-1601" Id=1601 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-3-1602" Id=1602 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-3-1603" Id=1603 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1f7c92a
"OH-httpClient-neeo-NEEO-05dbea22-3-1604" Id=1604 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1f7c92a
"OH-httpClient-neeo-NEEO-05dbea22-3-1605" Id=1605 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1f7c92a
"OH-httpClient-neeo-NEEO-05dbea22-4-1610" Id=1610 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@9f4dfb1
"OH-httpClient-neeo-NEEO-05dbea22-4-1611" Id=1611 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@9f4dfb1
"OH-httpClient-neeo-NEEO-05dbea22-4-1612" Id=1612 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@9f4dfb1
"OH-httpClient-neeo-NEEO-05dbea22-4-1613" Id=1613 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-05dbea22-4-1614" Id=1614 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-1-2045" Id=2045 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-1-2046" Id=2046 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@229e71cd
"OH-httpClient-neeo-NEEO-713af1e8-1-2047" Id=2047 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@229e71cd
"OH-httpClient-neeo-NEEO-713af1e8-1-2048" Id=2048 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-1-2049" Id=2049 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@229e71cd
"OH-httpClient-neeo-NEEO-713af1e8-2-2052" Id=2052 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@30820e20
"OH-httpClient-neeo-NEEO-713af1e8-2-2054" Id=2054 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@30820e20
"OH-httpClient-neeo-NEEO-713af1e8-2-2055" Id=2055 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-2-2056" Id=2056 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@30820e20
"OH-httpClient-neeo-NEEO-713af1e8-2-2057" Id=2057 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-3-2059" Id=2059 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-3-2060" Id=2060 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-3-2061" Id=2061 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@406ff7e3
"OH-httpClient-neeo-NEEO-713af1e8-3-2062" Id=2062 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@406ff7e3
"OH-httpClient-neeo-NEEO-713af1e8-3-2063" Id=2063 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@406ff7e3
"OH-httpClient-neeo-NEEO-713af1e8-4-2065" Id=2065 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-4-2068" Id=2068 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@279a280a
"OH-httpClient-neeo-NEEO-713af1e8-4-2070" Id=2070 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-4-2071" Id=2071 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@279a280a
"OH-httpClient-neeo-NEEO-05dbea22-5-161390" Id=161390 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4c7bbda6
"OH-httpClient-neeo-NEEO-05dbea22-5-161391" Id=161391 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4c7bbda6
"OH-httpClient-neeo-NEEO-05dbea22-5-161392" Id=161392 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4c7bbda6
"OH-httpClient-neeo-NEEO-05dbea22-5-161393" Id=161393 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-5-161400" Id=161400 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@192de59a
"OH-httpClient-neeo-NEEO-713af1e8-5-161401" Id=161401 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-5-161402" Id=161402 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@192de59a
"OH-httpClient-neeo-NEEO-713af1e8-5-161403" Id=161403 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@192de59a
"OH-httpClient-neeo-NEEO-713af1e8-5-161404" Id=161404 in RUNNABLE (running in native)
"OH-neeoio-9" Id=1021904 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6063ffb1
"OH-httpClient-neeo-NEEO-05dbea22-5-1772017" Id=1772017 in RUNNABLE (running in native)
"OH-httpClient-neeo-NEEO-713af1e8-4-2110533" Id=2110533 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@279a280a
"OH-httpClient-neeo-NEEO-05dbea22-2-2133851" Id=2133851 in RUNNABLE (running in native)
"OH-neeoio-19" Id=3765853 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6063ffb1
"OH-neeoio-24" Id=4758445 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6063ffb1
"OH-neeoio-25" Id=4855842 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6063ffb1
"OH-neeoio-26" Id=4856538 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6063ffb1

@lolodomo lolodomo left a comment


In addition to my one review comment, I would like you to confirm that the HTTP client configuration is not changed now that you are using the common HTTP client from the core.

@wborn wborn mentioned this pull request Oct 2, 2023
@morph166955
Contributor Author

@openhab/add-ons-maintainers - With CODEOWNERS now updated, can we resolve the pending review request for tmrobert8 and merge this in (or provide additional review as necessary so we can close this)? 4.1 is currently broken for NEEO users and it's crashing systems.

@morph166955
Contributor Author

In addition to my one comment, I would like you to confirm that the HTTP client configuration is not modified now that you are using the common HTTP client from the core.

I can't find anything in the code that implies to me that we are making any changes to the common client. @mherwege can you confirm?

@mherwege
Contributor

mherwege commented Oct 3, 2023

I can't find anything in the code that implies to me that we are making any changes to the common client. @mherwege can you confirm?

I also don’t see anything that changes the common client.

@morph166955
Contributor Author

openhab/openhab-core#3826 may have resolved the underlying root cause.

That said, would it still be a good idea to commit this and migrate to the Jetty client, or should we leave it alone and close this?

@morph166955
Contributor Author

No dice: the system stayed stable, but the moment I went to use the remote it went dead again.

@lolodomo
Contributor

lolodomo commented Oct 7, 2023

So this PR remains useful because the change in core did not help? If you confirm, I will finish the review and merge.

@morph166955
Contributor Author

Correct. The core change maintains stability only while the remote is unused. The moment buttons are pushed, it all crashes. This is absolutely still necessary.

Contributor

@lolodomo lolodomo left a comment

LGTM, thank you

@lolodomo lolodomo merged commit 58d2083 into openhab:main Oct 8, 2023
2 checks passed
@lolodomo lolodomo added this to the 4.1 milestone Oct 8, 2023
@morph166955
Contributor Author

Thank you!

@morph166955 morph166955 deleted the neeo-thread branch October 8, 2023 14:02
pat-git023 pushed a commit to pat-git023/openhab-addons that referenced this pull request Oct 13, 2023
* Convert to OH jetty HttpClient
* Adds a Stack for HttpClient storage
* Add synchronized to prevent exception
* fix binding
* Stop stack on close
* Resolves exception on registering forward actions
* Reduces client count to 5 to align to expected thread pool limit of 5

---------

Signed-off-by: Ben Rosenblum <[email protected]>
Co-authored-by: Mark Herwege <[email protected]>
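The "Stack for HttpClient storage" with a client count of 5 that the commit messages describe can be sketched as a small blocking pool. The version below is an illustrative, generic sketch, not the binding's actual code: the real change hands out Jetty `HttpClient` instances, replaced here by a type parameter so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// A minimal blocking pool: a fixed number of clients are created up
// front, borrowed per request, and pushed back when the request ends.
public class ClientPool<T> {
    private final Deque<T> clients = new ArrayDeque<>();

    public ClientPool(int size, Supplier<T> factory) {
        for (int i = 0; i < size; i++) {
            clients.push(factory.get());
        }
    }

    // Blocks until a client is available, bounding concurrency to the
    // pool size (5 in the commit messages, matching the thread pool).
    public synchronized T borrow() {
        while (clients.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a client", e);
            }
        }
        return clients.pop();
    }

    public synchronized void release(T client) {
        clients.push(client);
        notifyAll();
    }
}
```

Borrowing and releasing around each outbound request reuses a bounded set of clients instead of creating one per request, which is the pattern that leaked threads in the first place.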
@lolodomo
Contributor

@jlaur : as this PR fixes a thread leak that quickly leads to an out-of-memory crash, it is, together with #15779, a candidate for cherry-picking into the 4.0 branch.

@morph166955
Contributor Author

Oddly enough, 4.0 didn't leak. We still never found the root cause, i.e. which dependency upgrade caused this.

@jlaur
Contributor

jlaur commented Oct 20, 2023

So it sounds like there is no reason to risk including it in 4.0.x?

@lolodomo
Contributor

Yes, sorry, I thought users were first complaining with 4.0. Apparently I was wrong.

@morph166955
Contributor Author

I don't have any reports from 4.0 that I can find and I don't remember having the issue until 4.1.

querdenker2k pushed a commit to querdenker2k/openhab-addons that referenced this pull request Oct 21, 2023
querdenker2k pushed a commit to querdenker2k/openhab-addons that referenced this pull request Oct 29, 2023
austvik pushed a commit to austvik/openhab-addons that referenced this pull request Mar 27, 2024
Labels
bug An unexpected problem or unintended behavior of an add-on

Successfully merging this pull request may close these issues.

[neeo] Excessive Thread Leak - HttpClient-*-SelectorManager
6 participants