Since upgrading icinga-powershell-framework to 1.12.3, timeouts occur more often #765

Open
TheCry opened this issue Dec 4, 2024 · 10 comments
Labels: Bug (There is an issue present), Investigation (The team is looking into the cause of the issue)


@TheCry
TheCry commented Dec 4, 2024

In response to CVE-2024-49369, we updated all Windows servers to the latest icinga-powershell-framework (1.12.3) and Icinga client (2.14.3). Since then, various servers intermittently run into a timeout during the simple check "Invoke-IcingaCheckTimeSync".
We tested two different servers (both Windows 2019, serving the same function).
On one of them the check completes within 2-3 seconds; on the other it takes at least 20 seconds and at times runs into the 30-second timeout.
This is what the event log says:
warning/Process: Killing process group 6444 ('C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoProfile -NoLogo -ExecutionPolicy ByPass -C "try { Use-Icinga -Minimal; } catch { Write-Output 'The Icinga PowerShell Framework is either not installed on the system or not configured properly. Please check https://icinga.com/docs/windows for further details'; Write-Output 'Error:' $($_.Exception.Message)Components:rn$( Get-Module -ListAvailable 'icinga-powershell-*' )rn'Module-Path:'rn$($Env:PSModulePath); exit 3; }; Exit-IcingaExecutePlugin -Command 'Invoke-IcingaCheckTimeSync' " -Server '***.***.***.***' -Warning -1s:1s -Critical -2s:2s -Verbosity 2') after timeout of 30 seconds
These problems never occurred before the update. I hope you can find something, because we have no idea how to approach this.
Thank you
Regards
Sascha
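
For anyone who wants to compare a fast and a slow host as described above, a rough timing harness could look like the sketch below. It is not part of Icinga or the Icinga PowerShell Framework; the function name `time_command` and its parameters are made up for illustration. It runs an arbitrary command several times under a timeout and records how long each run took and whether it was cut off.

```python
import subprocess
import time


def time_command(cmd, runs=5, timeout=30.0):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns a list of (elapsed_seconds, timed_out) tuples, one per run.
    `timeout` mirrors the 30-second check timeout mentioned in the report;
    adjust it to match your own check definition.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            # Output is discarded; only the duration is of interest here.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            timed_out = False
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            timed_out = True
        results.append((time.monotonic() - start, timed_out))
    return results


# Hypothetical usage: pass the check command exactly as it appears in your
# own event log, split into an argument list, for example
#   time_command(["powershell.exe", "-NoProfile", "-NoLogo", "-C", "..."])
# where "..." stands for the plugin call on your system.
```

Running this on both an affected and an unaffected server gives a simple per-run duration list to attach to the issue.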

@bfenda
bfenda commented Dec 11, 2024

Hi all,

I would like to add to this. Since updating to 2.14.3 in particular, we see various checks on Icinga agents frequently run into timeouts. The timeout is set to 60 seconds. Normally the checks take about 10 seconds, but as mentioned they time out very often, which leaves the service in a flapping state:

image

I may have found a related message in the Windows event log.
image

Thank you,

Best regards,
Brian

@BTT-Monitoring

We're encountering this issue across multiple production environments affecting several of our customers since upgrading IFW.
Given the widespread impact on our production systems, we would greatly appreciate any insights or potential solutions. We're happy to provide additional details or assist with debugging if needed.
Thank you for looking into this!

@Raupueppi

Since the update of icinga-powershell-framework to 1.12.3 we observe the same problem! The checks run into timeouts.
Can you please take a look at this?

@bieberjz

Ever since updating the Icinga PowerShell Framework to version 1.12.3, we've been encountering the same issue: our checks are timing out.

@BTMichel

We are also encountering this issue, with lots of timeouts since we updated the Icinga PowerShell Framework.
The only viable fix/workaround currently is a rollback to the previous version, which we definitely would like to avoid.
@LordHepipud are there any potential other fixes or workarounds which we could at least manually apply?

@dbecker1234

Same here!
It might be a coincidence, but at least in our environment only two machines are affected, both running Server 2019 with domain controller services (one host physical and the other virtual on vSphere).

LordHepipud self-assigned this Jan 29, 2025
LordHepipud added the Bug and Investigation labels Jan 29, 2025
@LordHepipud
Collaborator

Thank you for all the reports. I have already tried to take a look at this case and can't really reproduce the issue. Based on the provided reports, all of these events happened after upgrading to v1.12.3. Is this correct?

I assume the REST API of Icinga for Windows is being used on all those machines (as mentioned in the logs).

To me it seems weird that the internal threads are being terminated because of a timeout. In general, this would mean the thread hung for more than 3 minutes (the internal Icinga for Windows threshold for determining whether a thread is still working or frozen).

I had a similar occurrence in a customer environment this week, although there no packets were transmitted on the Icinga for Windows socket. Instead of the thread being killed, the socket reader terminated the connection after 5 seconds because no packets were sent.

I'm not sure if these errors are related, but something strange seems to be going on.
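
To illustrate the behaviour described above (a reader that gives up after 5 seconds without packets), here is a minimal sketch of an idle read timeout on a socket. It is a generic illustration only and not the Icinga for Windows implementation; the function name `read_until_idle` and the return format are assumptions made for the example.

```python
import socket


def read_until_idle(sock, idle_timeout=5.0, bufsize=4096):
    """Read from `sock` until the peer closes or the line goes quiet.

    Returns (data, reason), where reason is "closed" if the peer shut the
    connection down, or "idle" if no bytes arrived for `idle_timeout`
    seconds. The 5-second default only mirrors the figure quoted above.
    """
    sock.settimeout(idle_timeout)
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except socket.timeout:
            # Nothing was received within the idle window: stop waiting.
            return b"".join(chunks), "idle"
        if not chunk:
            # An empty read means the peer closed the connection cleanly.
            return b"".join(chunks), "closed"
        chunks.append(chunk)
```

The distinction between the two reasons matters for debugging: "closed" means the data was read to the end, while "idle" means the sender stopped transmitting without finishing, which is the situation described for the customer environment.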

@LordHepipud
Collaborator

Just another question: Which version of Icinga for Windows was working properly previously?

@LordHepipud
Collaborator

The only real change we made with v1.12 was ensuring that network data is read properly to the end in all cases:

#706

The remaining topics around certificate handling can be ignored, I assume, as we already establish successful connections.

Can someone please try reverting the changes from PR #706 on an affected machine and check the results?

@TheCry
Author

TheCry commented Jan 30, 2025

For us, it was definitely various versions older than 1.12.3.
In some cases, the client had to be completely reinstalled. As I wrote above, this only occurred after we had brought everything up to date because of the CERT advisory. Downgrading PowerShell didn't have any effect here.

8 participants