-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible confusing logging on ProxyOperationManager.cs #2379
Comments
Why famous 100ms are there other issues connected to this? |
The main issue to move to collectors from msbuild/.net tool version of coverlet is that issue https://github.com/tonerdo/coverlet/blob/master/Documentation/KnownIssues.md#1-vstest-stops-process-execution-earlydotnet-test |
ah gotcha, it only applies to .net fmw. :) Do you want to PR this change? |
What do you mean?The issue happens also on .net core afaik. |
@nohwnd @AbhitejJohn I have also another proposal, can we increase the timeout to 30 seconds? |
@jakubch1 could you take this on? I am not that aware of what are the possibilities around coverlet and in-proc out-proc data collectors. I don't think increasing the timeout to 30s is the right way to go. But maybe to 1s (or max 3s) would mitigate the issue a bit, and would not add too much teardown time. I thought this was solved for dotnet core based on the link above, but I probably misunderstood it. |
@MarcoRossignoli did you try to use MemoryMappedFile for msbuild case? We discussed this already. This is the best approach here and by this we don't need to increase timeout. |
Not yet, discussed yesterday with @tonerdo, the work will be huge because I think we cannot use the default memory mapped file class provided with .net runtime but to support We put it on roadmap on low priority for now https://github.com/coverlet-coverage/coverlet/blob/master/Documentation/Roadmap.md |
I just wasted four days and found all sorts of issues just because of this 100ms timeout. @nohwnd I want to dig a bit deeper into your reservations around increasing the timeout to 30 seconds. In the success case, when the process finishes quickly enough, no problem, we're not waiting long. But suppose the process is doing some expensive teardown, or there's just a fluke (like hard drive/antivirus delay, or random noise from other processes on the machine). Suppose the short (100ms, or 1s) timeout expires. Let's look carefully at what happens. We kill the process. The message that we killed the process is logged into I see absolutely no harm about increasing the timeout to 30 seconds. It will make the worst case successful, and won't impact the regular success case. Remember that books are written about timeouts in computers, and how you can never rely on 100ms actually being 100ms. Your entire process may be paused by the OS for any amount of time randomly. Heck, a random GC can easily create a timeout longer than that. Additionally, this bug should track better user-facing logging in case that timeout does expire. See related: It should be immediately obvious to the user that what happened isn't that a particular test failed, but we timed out waiting for the host process to terminate. Actually the best thing would be to use procdump in that case after the timeout, forcibly grab a dump of the process and make sure it's published as an AzDO artifact, so users can see what the process was doing that caused a 30 seconds timeout. At the risk of repeating myself, I can't underscore enough how important it is that we make this set of improvements to the dotnet test experience on CI. It will result in more stable pipelines, remove the flakiness and noise and make failures easier to diagnose. It's hard to measure the amount of pain the current unfortunate constellation of defects is causing to engineers using dotnet test in CI. @nohwnd if you feel like any of this work doesn't fit in your current schedule, please ping me and I will ensure this gets escalated to whatever level is necessary to prioritize this as an absolute top priority. |
I also think that this line of code will be run if the process has exited before the timeout, which seems wrong: vstest/src/Microsoft.TestPlatform.CrossPlatEngine/Client/ProxyOperationManager.cs Line 228 in d10bcbb
|
It should check the result of the |
Assuming the worst case actually is 30 seconds. 🙂 I am aware of 100 ms not being exactly 100 ms for a thread. And I even agree with this in a pipeline scenario. There it might be reasonable to wait 30s for a process to terminate, to give it a better chance of terminating. Unfortunately 30s is forever in interactive scenario like when you click Stop button in VS TestExplorer. There you don't want to be waiting 30s before seeing anything happen. That is why I was recommending to increase to 1-3s. This gives the process a lot of time to teardown, while keeping the interactive user on the verge of "this is stuck". It would also be nice to give the extensions a better infra where they could report what is happening, and we would show it in case of abort. Problem is how do we get it from the process without spending a lot of resources on "just" logging. Would be awesome if we could have a post-mortem view, that would be more verbose than what you would normally be interested in. This 100ms should also not be problem anymore for coverlet (which I think was crashing your build as well, together I think with the Debug.Assert coming from the code) because newer coverlet is now an in-proc data collector to avoid this problem.
You are right. Building a fix. |
Test host process crashed <- yeah that is a whole another story, both blame and vstest console should report back to the reporter that they killed testhost. I originally attempted to get this fixed in the blame, but doing it would have required more changes that I was comfortable with at that time. IIRC. |
I see, now that I realize that this runs in interactive and IDE scenarios, 30 seconds is of course, way too much. I was missing this detail. Still, the CI case needs to be absolutely robust. You saw how flaky the CI can get because of this issue. Perhaps this should be configurable and CI should pass a custom timeout like 1 minute. |
Usually when we've some issue with coverlet msbuild or .NET tool driver we enable logging to understand if vstest plat killed process after famous 100ms.
BTW seem that logging is wrong and also in case of clean shutdown timeout log is emitted
vstest/src/Microsoft.TestPlatform.CrossPlatEngine/Client/ProxyOperationManager.cs
Lines 202 to 236 in d10bcbb
I think that log emitted in line 228 should be emitted only if
testHostExited
event was not signaled.The text was updated successfully, but these errors were encountered: