Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[iCalendar] Fix httpclient getting stuck with broken requests #11866

Merged
merged 2 commits into from
Dec 31, 2021

Conversation

t2000
Copy link
Contributor

@t2000 t2000 commented Dec 27, 2021

Fixes #9827

Signed-off-by: Stefan Triller [email protected]

@t2000 t2000 requested a review from daMihe as a code owner December 27, 2021 00:55
@t2000 t2000 force-pushed the fixICalenderTimeout branch from 26ab7da to cdbc178 Compare December 27, 2021 01:29
@lolodomo lolodomo added the bug An unexpected problem or unintended behavior of an add-on label Dec 27, 2021
@daMihe
Copy link
Contributor

daMihe commented Dec 27, 2021

is request.abort(...) in any case allowed? Is it exception safe?

EDIT: abort seems to stop a currently sending request. As the exceptions are thrown when the request has failed, i'm not sure whether this fix fixes anything. I did not find a corresponding section in the Jetty HttpClient docs.

@t2000
Copy link
Contributor Author

t2000 commented Dec 27, 2021

@daMihe Thanks for having a look at this PR.

Please have a look at what the implementation of abort does. For sure it does not fix that the Exception occurs in the first place, like a time out... However, it for sure fixes what happens afterwards. If you do not abort the request it will fill up the queue and eventually block all further requests. And this is what is fixed in this PR.

@daMihe
Copy link
Contributor

daMihe commented Dec 27, 2021

I have to say, i was up to now not able to understand where the queue is filling up (just a guess: send(...) does not fail itself and so the request does not get aborted). Please point me to the right docs or relevant jetty code - i want to understand this.

However, the synchronous send-method also does the aborts, so i would accept this fix, as it's near the jetty code. Please just give me a chance to understand by pointing me to the right doc or source.

@t2000
Copy link
Contributor Author

t2000 commented Dec 28, 2021

As I mentioned in my #9827 (comment)

2021-12-22 18:13:47.571 [DEBUG] [g.icalendar.internal.handler.PullJob] - ExecutionException message is: Max requests queued per destination 1024 exceeded for HttpDestination[https://<HOST OF URL>]@7fc1029,queue=1024,pool=DuplexConnectionPool@50812a46[c=0/2/2,a=2,i=0,q=1024]

That is the evidence of that queue filling up for me.

The mentioned aborts stack trace to the queue clearing is as follows:
HttpConversation.abort()
then
HttpExchange.abort() and in there:

if (destination.remove(this)) where destination is the mentioned queue.

And yes, all I am fixing is that the queue doesn't get filled up, so whenever such a TimeoutException, ExecutionException or InteruptedException occurs, there is at least the chance that at the next attempt it can succeed. Without the clearing we will block the shared httpclient for everything in openHAB. And since that queue is based on hostnames there might even be nasty dependencies, like blocking a fully unrelated other binding, just because it also uses the httpclient to talk to the same hostname.

Let me know if you still have questions.

Copy link
Contributor

@daMihe daMihe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lolodomo
Copy link
Contributor

lolodomo commented Dec 28, 2021

I am very curious to know if this is properly managed by other bindings.
I find only 4 bindings using InputStreamResponseListener:

  1. icalendar
  2. mielecloud => call abort in the disconnect() method
  3. lutron => do not call abort
  4. remoteopenhab => call abort only if status code is not 200

They should be checked.

But this is not the good search key, other bindings are probably using send asynchronously with other listener class.

Are you really sure that calling abort is necessary in case of exceptions ?
I mean for the iCalendar binding, maybe the problem was only due to requests returning a non 200 code ?

@t2000
Copy link
Contributor Author

t2000 commented Dec 28, 2021

Are you really sure that calling abort is necessary in case of exceptions ?
I mean for the iCalendar binding, maybe the problem was only due to requests returning a non 200 code ?

From what I saw here:
#9827 (comment)

it occurred due to a ExecutionException. Or at least such an exception was thrown when we ran into the issue that the queue was full. So I am not entirely sure if one needs to abort on a "non 200 reply" or also on an exception. My approach is "better safe than sorry" :)

I integrated your changes @lolodomo, please have a look again.

Regarding the other bindings: I agree that its tricky to search for this particular case but IMHO we should take care of these error situations because in the end the shared httpclient will get stuck and block everyone (at least that hostname...).

@lolodomo
Copy link
Contributor

I believe we should have a clear vision of when calling abort() is necessary before starting updating many bindings.

@t2000
Copy link
Contributor Author

t2000 commented Dec 30, 2021

I believe we should have a clear vision of when calling abort() is necessary before starting updating many bindings.

If you want to do that, fine for me.

Regarding the icalendar binding and this PR I think #9827 (comment) clearly shows that it is necessary.

Do you have any other comments about the code? If not, I would appreciate if this gets merged so I can fix my openHAB installation. A not working calendar, or well one that stops working after a server hicup, is certainly nothing I would like to rely on for longer since the reason for it was found and is fixed in this PR. Otherwise, please comment what you still would like to have changed. Thanks.

@lolodomo
Copy link
Contributor

lolodomo commented Dec 30, 2021

As a temporary and fast fix, you can simply build the jar and install it in your addons folder in replacement of the embedded one.
You can even join a link to the jar in this PR for others.

If another maintainer has the correct knowledge of how it should be handled properly, it could be fixed very quickly. I do not know myself and I will have to study the stuff more in details.
@wborn @kaikreuzer @cweitkamp @fwolter : any of you know when calling the abort method ?

@kaikreuzer
Copy link
Member

I don't. I've so far never seen nor used the abort() method...

@cweitkamp
Copy link
Contributor

Me neither. But I do not see a show-stopper here. We have a lot of time to test it until the next release is coming and thus I vote for a quick merge too.

@cweitkamp cweitkamp requested a review from lolodomo December 31, 2021 10:33
@lolodomo
Copy link
Contributor

In this case, let's go for a merge, even if we are not fully sure now that it is the good implementation.

@lolodomo lolodomo merged commit a1df7a8 into openhab:main Dec 31, 2021
@lolodomo lolodomo added this to the 3.3 milestone Dec 31, 2021
@daMihe
Copy link
Contributor

daMihe commented Dec 31, 2021

Just for the record: calling abort(...) does not seem counterproductive as the synchronous send() does it in every case (see my previous comment). Let's see how this works.

@J-N-K
Copy link
Member

J-N-K commented Dec 31, 2021

Did this happen on OH3.2 or OH3.1? It looks similar to jetty/jetty.project#6323 but the Jetty in OH3.2 already contains the fixed version, wile OH3.1 contained an older one.

@t2000
Copy link
Contributor Author

t2000 commented Dec 31, 2021

For the record: This issue and my logs are taken with 3.2.0-SNAPSHOT - Build #2572 so definitely newer than 3.1, but not entirely sure if the jetty fixes you mention are in that snapshot (should be from october/november).

@t2000 t2000 deleted the fixICalenderTimeout branch December 31, 2021 12:06
mischmidt83 pushed a commit to mischmidt83/openhab-addons that referenced this pull request Jan 9, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
Signed-off-by: Michael Schmidt <[email protected]>
moesterheld pushed a commit to moesterheld/openhab-addons that referenced this pull request Jan 18, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
nemerdaud pushed a commit to nemerdaud/openhab-addons that referenced this pull request Jan 28, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/how-to-force-a-refresh-of-a-binding-scheduled-update-time-is-too-high/133991/9

NickWaterton pushed a commit to NickWaterton/openhab-addons that referenced this pull request Apr 27, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
Signed-off-by: Nick Waterton <[email protected]>
andan67 pushed a commit to andan67/openhab-addons that referenced this pull request Nov 6, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
andrasU pushed a commit to andrasU/openhab-addons that referenced this pull request Nov 12, 2022
…b#11866)

* [iCalendar] Fix httpclient getting stuck with broken requests

Fixes openhab#9827

Signed-off-by: Stefan Triller <[email protected]>
Signed-off-by: Andras Uhrin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[iCalendar] suddenly stops working
7 participants