Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI testing randomly getting stuck in CI (OSOE-464) #228

Open
7 of 9 tasks
Piedone opened this issue Nov 11, 2022 · 92 comments
Open
7 of 9 tasks

UI testing randomly getting stuck in CI (OSOE-464) #228

Piedone opened this issue Nov 11, 2022 · 92 comments
Assignees
Labels
bug Something isn't working

Comments

@Piedone
Copy link
Member

Piedone commented Nov 11, 2022

See e.g. this build. The NuGetTest UI testing of OSOCE got stuck and thus timed out in 30 minutes, while all the other builds worked fine. Rerunning just that build fixed the issue. This problem started way before these recent examples and the Orchard Core 1.5 upgrade. Reruns resolve the issue every time.

Other examples unrelated to the 1.5 upgrade: here, here, here, here.

Troubleshooting this is made harder by it apparently not being possible to finish the test-dotnet action and upload artifacts when the build is canceled, see Lombiq/GitHub-Actions#77.

It seems this bug didn't solve itself in the end: Lombiq/Open-Source-Orchard-Core-Extensions#126

Jira issue

See comments for notes.

To be done:

Stuck tests outside the scope of blame-hang

Comment OS Runner Build
#228 (comment) Windows standard NuGet
#228 (comment), #228 (comment) Windows standard root
#228 (comment) Windows standard NuGet
@Piedone Piedone added the bug Something isn't working label Nov 11, 2022
@github-actions github-actions bot changed the title OSOCE NuGetTest UI testing randomly getting stuck under Windows OSOCE NuGetTest UI testing randomly getting stuck under Windows (OSOE-464) Nov 11, 2022
@Piedone Piedone self-assigned this Nov 12, 2022
@Piedone
Copy link
Member Author

Piedone commented Nov 12, 2022

I couldn't reproduce this yet with 140 builds, see the https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/workflows/build-and-test.yml?query=branch%3Aissue%2FOSOE-464. However, the currently added logging shows the details of UI test execution, even if the build is canceled due to a timeout, see here.
image

So, I think we should merge the new logging, and wait for the issue to randomly arise again. At that point, we'll be able to determine the cause better.

@Piedone
Copy link
Member Author

Piedone commented Nov 16, 2022

Notes:

Until now, This happened exclusively with the NuGetTest build under Windows. This is a new case that might be the same: With the new, streaming test logging, this failed for the root solution and under Ubuntu (i.e. the polar opposite...).

The log here shows that out of the 6 test projects, only tests for 2 were run: Lombiq.TrainingDemo.Tests and Lombiq.Tests.UI.Samples. There, apparently, all 26 tests have run:

2022-11-16T11:34:32.5767718Z Results File: /home/runner/work/Open-Source-Orchard-Core-Extensions/Open-Source-Orchard-Core-Extensions/test/Lombiq.UITestingToolbox/Lombiq.Tests.UI.Samples/TestResults/test-results.trx
2022-11-16T11:34:32.5768672Z 
2022-11-16T11:34:32.5816191Z Test Run Successful.
2022-11-16T11:34:32.5816903Z Total tests: 26
2022-11-16T11:34:32.5818566Z      Passed: 26
2022-11-16T11:34:32.5820511Z  Total time: 4.9606 Minutes

The following tests I checked and indeed have run (and they all passed) before the build timed out. There are actually 28 "Finishing execution of" runs for the supposedly 26 tests!

  • AccessibilityTest.FrontendPagesShouldBeAccessible
  • BasicVisualVerificationTests.VerifyHomePage x2, for Chrome and Edge
  • BasicTests.LoginShouldWork
  • ErrorHandlingTests.ServerSideErrorOnLoadedPageShouldHaltTest
  • EmailTests.SendingTestEmailShouldWork
  • MonkeyTests.TestAdminPagesAsMonkeyRecursivelyShouldWorkWithAdminUser
  • SqlServerTests.TogglingFeaturesShouldWorkWithSqlServer
  • TenantTests.CreatingTenantShouldWork
  • AzureBlobStorageTests.AnonymousHomePageShouldExistWithAzureBlobStorage
  • BasicTests.TogglingFeaturesShouldWork
  • ErrorHandlingTests.BrowserLogsShouldPersist
  • MonkeyTests.TestAdminBackgroundTasksAsMonkeyRecursivelyShouldWorkWithAdminUser
  • BasicOrchardFeaturesTests.BasicOrchardFeaturesShouldWork - Two "Finishing execution" messages since it failed once but was retried.
  • SqlServerTests.AnonymousHomePageShouldExistWithSqlServer
  • BasicTests.ShortcutsShouldWork
  • AzureBlobStorageTests.TogglingFeaturesShouldWorkWithAzureBlobStorage
  • ErrorHandlingTests.ErrorDuringSetupShouldHaltTest
  • BasicVisualVerificationTests.VerifyNavbar x2, for Chrome and Edge
  • BasicTests.AnonymousHomePageShouldExist
  • MonkeyTests.TestCurrentPageAsMonkeyRecursivelyShouldWorkWithAnonymousUser
  • ErrorHandlingTests.ClientSideErrorOnLoadedPageShouldHaltTest
  • DatabaseSnapshotTests.BasicOrchardFeaturesShouldWorkWithExistingDatabase - Two "Finishing execution" messages but not because fails and retries. It was actually started twice, 1:15 mins apart. This is because to test resume from an existing DB, it starts a test within a test.
  • MonkeyTests.TestCurrentPageAsMonkeyShouldWorkWithConfiguredRandomSeed
  • BasicVisualVerificationTests.VerifyBlogImage

In the end, all tests have run from the project, just a couple of them more than once. The above list contains 24 tests + 2 Edge executions (normal until now) + 2 duplicated "Finishing execution" messages.

Log:
9_Tests.txt

So, nothing strange here. It's just that nothing happens after the "Total time" line. Then we should have a "Test successful" or "Test failed". But, this never arrives, the script is stuck between this line and this one. I guess it doesn't get stuck on if ($?) but somewhere in dotnet test.

@Piedone
Copy link
Member Author

Piedone commented Nov 16, 2022

dotnet test hanging is not an issue just for us.

BTW @dministro since this hanging seems to correlate with #186, does something ring a bell?

@dministro
Copy link
Member

Yes, i faced deadlock in NuGetTests some times on TC, but newer GHA. I investigated it, and replaced all lock() statements inside async methods using SemaphorSlim and thread safe collections. I can collect all relevant commits.
It was really simple to reproduce locally, I started the NuGetTests like TC does, and usually the deadlock come up after 2-3 runs. After I removed lock() the way as I mentioned above, the issue has gone locally and on TC too.

@Piedone
Copy link
Member Author

Piedone commented Nov 16, 2022

So you could repro by basically running dotnet test for all test DLLs (what TC did)? Simply repeated runs of the testing script running in GHA locally doesn't do anything strange on my machine.

I'm testing disabling node reuse under Lombiq/Open-Source-Orchard-Core-Extensions#273. We'll see if it reliably fixes the issue, but what's for sure, is that for the root solution, build takes about twice as much (~4 minutes vs ~8)... If it helps, then we can check if removing any of the switches makes it faster but still retains the fix (or perhaps only leave the env var for the test-dotnet action.

@Piedone
Copy link
Member Author

Piedone commented Nov 17, 2022

I just remembered about dotnet build-server shutdown. We don't need no-node reuse throughout, it's enough to just shut down the build servers after dotnet build. So, builds are not slower.

@dministro
Copy link
Member

So you could repro by basically running dotnet test for all test DLLs (what TC did)?

Yes exactly.

@Piedone
Copy link
Member Author

Piedone commented Nov 17, 2022

OK, thanks.

@Piedone
Copy link
Member Author

Piedone commented Nov 28, 2022

Happened again here.

So, until now, disabled node reuse (or rather, shutting down build servers) and input redirection didn't help.

@Piedone
Copy link
Member Author

Piedone commented Nov 30, 2022

Perhaps the problem will get some attention from Microsoft now? microsoft/vstest#2080 (comment)

@Piedone
Copy link
Member Author

Piedone commented Dec 5, 2022

ui-test-parallelism: 0 didn't help, since the issue happened again here. I'll try xUnit "maxParallelThreads": -1 though that needs some restructuring of how our GHA workflows handle configuration.

@Piedone
Copy link
Member Author

Piedone commented Dec 8, 2022

Apparently, the xUnit "maxParallelThreads": -1 trick didn't make the issue go away, but it did change it. This run and even its rerun timed out, during Tests as usual. However, while formerly the step got stuck after finishing all tests of Lombiq.OSOCE.NuGet.Tests.UI\, now it was in the middle of the test execution:

Timing out after starting executing tests in the project but no output from the actual tests:

image

Getting stuck between tests:

image

Interesting, and this makes a case for the "fix for snyc-over-async in WebApplicationFactory" attempt, since this can be related to what the tests do (as opposed to just what dotnet test does).

I'll wait for some more data to come in.

@Piedone
Copy link
Member Author

Piedone commented Dec 9, 2022

@0liver you told about this one happening, thanks.

Since this wasn't a NuGetTest build, nor did it use the -1 xUnit config (but 0, as before, since that came with ui-test-parallelism before too; see here), it also suggests that we need the WebApplicationFactory fix. However, it didn't start test execution at all, not even reaching this line. Most possibly it got stuck here, which we can do nothing about, since it's not running tests yet at all...

@Piedone
Copy link
Member Author

Piedone commented Dec 11, 2022

This run got stuck between two tests. This is suggestive of the WebApplicationFactory fix.

@Piedone
Copy link
Member Author

Piedone commented Dec 13, 2022

Now we wait to see if the WebApplicationFactory fix solves the issue.

@Piedone
Copy link
Member Author

Piedone commented Dec 14, 2022

And it doesn't, this build failed. It hung between two tests. So, the xUnit "maxParallelThreads": -1 config apparently didn't cause this, i.e. to hang between tests as opposed to between test projects.

I'm out of ideas since I've tried everything that I could think of and that everyone suggested. Perhaps any other suggestions from you @BenedekFarkas @DAud-IcI @dministro @0liver?

@0liver
Copy link
Contributor

0liver commented Dec 14, 2022

This has been beyond me for quite some time. So, unfortunately, no.

@BenedekFarkas
Copy link
Member

BenedekFarkas commented Dec 14, 2022

  1. I've started a build in TC, just in case it yields any useful information about timings: https://ci.lombiq.com/buildConfiguration/OrchardExtensions_OSOCE_Developer_PullAndBuild/1143340
  2. We can build an updated runner image and spin up a VM based on it, so you can monitor the runner machine real-time.
  3. For local testing, try changing your BIOS settings to match the core/thread count of the runner machines.
  4. .NET SDK downgrade? We could stress-test against multiple different versions to see which one(s) produce this problem.

@dministro
Copy link
Member

The only thing that comes to my mind, running the tests locally in a loop until the error comes, then attaching the debugger to the running process. It looks like the issue affects windows runner which means for me, we can reproduce it on windows(locally) so this simplifies it.

What about the distribution between standard and larger runners? Is there a significant difference?
As I see the linked failed builds, most of them are on standard-runner, except the OSOE-84 one.

@Piedone
Copy link
Member Author

Piedone commented Dec 15, 2022

Thank you for your tips! I'll follow up with these if the issue still surfaces. Because I just noticed that the NuGetTest build, that recently failed, wasn't actually using the new code, since I didn't NuGet-publish and update the UI Testing Toolbox... I just did that now, it's in dev, so waiting for results.

Yes, this is by far the most of an issue with the Windows NuGetTest build, which happens on the 2-core standard runners, It much more rarely happens, but sometimes it does, with the full solution builds, also under Ubuntu (I don't know it every happening for the NuGetTest build under Ubuntu though). So yeah, we should be able to reproduce if constrained to a 2-core CPU.

@0liver
Copy link
Contributor

0liver commented Dec 15, 2022

I’ve had Tests hanging in a root build just last night, on the dev branch (Windows 2core): https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/3700330664/jobs/6268653535#step:10:3968

And another root build timed out just now, on a Ubuntu-22.04-4core runner: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/3704714311/jobs/6277678502#step:10:63

@Piedone
Copy link
Member Author

Piedone commented Dec 15, 2022

Thank you. This makes it clear that the WebApplicationFactory change didn't fix this. I'll look into this more, then.

@Piedone
Copy link
Member Author

Piedone commented Dec 21, 2022

@0liver
Copy link
Contributor

0liver commented Dec 27, 2022

Let's hit this:
image

I think we could become a GitHub sponsor for this guy - we're using his Analyzers in our daily work already, and he provides some really valuable content.

@Piedone
Copy link
Member Author

Piedone commented Dec 27, 2022

We'll definitely consider him as well during our yearly review of OSS projects/contributors to sponsor.

@Piedone
Copy link
Member Author

Piedone commented Jan 3, 2023

@meziantou would you be open to investigating and fixing this dotnet test hang as contract work? I don't see a way to get in touch with you privately, so if you would, then please write me an e-mail, it's visible under my profile.

@Piedone
Copy link
Member Author

Piedone commented Jan 24, 2024

So, what's the plan @dministro @sarahelsaig? Builds continue to randomly fail in dev.

@Piedone
Copy link
Member Author

Piedone commented Feb 4, 2024

There's a hang here with a huge hang dump @dministro: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/7776721615?pr=691 Download it before it gets cleaned up.

@Piedone
Copy link
Member Author

Piedone commented Feb 4, 2024

BTW now I expect more of these, since we'll have to run issue branch Windows builds on standard runners too after GitRunners shutting down and us having to use Standard Runners for issue branch builds too: Lombiq/GitHub-Actions#320

@Piedone
Copy link
Member Author

Piedone commented Feb 4, 2024

@Piedone
Copy link
Member Author

Piedone commented Feb 4, 2024

Nah, these are red herrings, and the build now runs to completion after changing dotnet-test-process-timeout to 720000.

@dministro
Copy link
Member

Surprising: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/7791752953

The only running process: C:\Program Files\PowerShell\7\pwsh.exe -c "dotnet test --configuration Debug --nologo --logger 'trx;LogFileName=test-results.trx' --logger 'console;verbosity=detailed' --verbosity quiet --filter FullyQualifiedName!~SecurityScanningTests test\Lombiq.UITestingToolbox\Lombiq.Tests.UI.Samples\Lombiq.Tests.UI.Samples.csproj 2>&1"
Running time: 00:12:00.1811181
image

We will see.

@dministro
Copy link
Member

This was finished in 7 min the dotnet-test-process-timeout: 360000 which is 6 min.

@Piedone
Copy link
Member Author

Piedone commented Feb 12, 2024

@dministro
Copy link
Member

Looks like an actual hang: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/7877646651?pr=699

Not really, this is monkey test again:
image

The chrome mem dump is extremely big.
image

@Piedone
Copy link
Member Author

Piedone commented Feb 13, 2024

I see. I made that test inspect only a single page under https://github.com/Lombiq/UI-Testing-Toolbox/pull/342/files#diff-56d06a46558b30038bf02e6c2e823aa208dc42bd2cb8125aab592bf4a8ef67a9 as opposed to randomly potentially the whole admin, so that shouldn't happen anymore.

@Piedone
Copy link
Member Author

Piedone commented Mar 9, 2024

Invoke-SolutionTests buffers the test output, so no output only means that the test process didn’t finish, not that it didn’t produce any output while running, but we don’t see that. This causes an issue for me in another project, where I try to debug a test-running workflow timing out, and I don't see what's happening after the tests start (only that there's no output before the workflow times out).

Can the output be streaming instead?

@dministro
Copy link
Member

I'm not sure. I'll check it in the upcoming days.

@Piedone
Copy link
Member Author

Piedone commented Mar 9, 2024

Thanks!

@dministro
Copy link
Member

@Piedone
Copy link
Member Author

Piedone commented Mar 19, 2024

Awesome!

@Piedone
Copy link
Member Author

Piedone commented Mar 28, 2024

@Piedone
Copy link
Member Author

Piedone commented Mar 28, 2024

So, I guess this can be closed, but please see my previous message.

@dministro
Copy link
Member

Does this add new info? https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/8473959367?pr=720

Not really, here are the execution times per method from 3_Build and Test - root solution (larger runners) _ Build and Test (buildjet-4vcpu-ubuntu-2204).txt.

Method name Execution time
Lombiq.OSOCE.Tests.UI.Tests.UITestingToolboxTests.SecurityShortcutsTests.AddUserToFakeRoleShouldThrow 0:00:07
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorUIKitShowcaseTests.UIKitShowcasePageShouldBeCorrect 0:00:23
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorEmailQuotaTests.EmailQuotaShouldBlockEmailsOverLimitAndWarn 0:00:18
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorVueTests.RecipeDataShouldBeDisplayedCorrectly 0:00:26
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorAsyncEditorTests.DemoContentItemAsyncEditorShouldWork 0:00:31
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureContentSets 0:00:05
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorApplicationInsightsTests.ApplicationInsightsTrackingInOfflineOperationShouldWork 0:00:02
Lombiq.OSOCE.Tests.UI.Tests.ThemeTests.BehaviorBaseThemeTests.ThemeFeaturesShouldWork 0:00:08
Lombiq.OSOCE.Tests.UI.Tests.UITestingToolboxTests.WorkflowShortcutsTests.GenerateHttpEventUrlShouldWork 0:00:00
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorMediaStorageManagementTests.MediaQuotaShouldWork 0:00:02
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorTenantManagementTests.TenantShellSettingsEditorShouldSaveSettings 0:00:05
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorMaintenanceTests.AddSiteOwnerPermissionToRoleMaintenanceTaskShouldBeExecutedSuccessfully 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkAnonymous 0:00:02
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorChartJsTests.RecipeDataShouldBeDisplayedCorrectly 0:00:04
Lombiq.OSOCE.Tests.UI.Tests.VisualVerificationTests.VisualVerificationTests.VerifyHomePageAndLayout 0:00:06
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorLoginAsAnybodyTests.SwitchingUserShouldWorkCorrectly 0:00:13
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBePresent 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorFeaturesGuardTests.ConditionallyEnabledFeaturesShouldWorkCorrectlyOnTenants 0:00:21
Lombiq.OSOCE.Tests.UI.Tests.UITestingToolboxTests.CustomAdminPrefixTests.NavigationWithCustomAdminPrefixShouldWork 0:00:02
Lombiq.OSOCE.Tests.UI.Tests.ThemeTests.BehaviorMediaThemeTests.MediaThemeShouldWorkWhenDeployed 0:00:04
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorDataTablesTests.RecipeDataShouldBeDisplayedCorrectly 0:00:05
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.RetrievalBuildVersionDisplayTests.BuildVersionShouldBeBeDisplayedCorrectly 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorJsonEditorTests.JsonEditorShouldWorkCorrectly 0:00:10
Lombiq.OSOCE.Tests.UI.Tests.UITestingToolboxTests.SecurityShortcutsTests.AddUserToRoleShouldWork 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.LibraryTests.BehaviorOrchardCoreApiClientTests.OrchardCoreApiClientShouldWork 0:00:07
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorEmailQuotaTests.EmailQuotaShouldNotBlockEmailsWhenDifferentHostIsUsedThanOriginalFromConfig 0:00:05
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorVueTests.QrCardScanShouldWorkAsync 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureCodeGeneration 0:00:11
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorAsyncEditorTests.DemoFrontEndAsyncEditorShouldWork 0:00:22
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkAdmin 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorMaintenanceTests.ChangeUserSensitiveContentMaintenanceTaskShouldBeExecutedSuccessfully 0:00:15
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBeMissingWithoutConfiguration 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorFeaturesGuardTests.ForbiddenFeaturesShouldNotBeActivatableOnTenants 0:00:05
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorLoginAsAnybodyTests.PermissionCheckShouldWorkCorrectly 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.UITestingToolboxTests.SecurityShortcutsTests.AllowFakePermissionToRoleShouldThrow 0:00:00
Lombiq.OSOCE.Tests.UI.Tests.ThemeTests.BehaviorMediaThemeTests.MediaThemeShouldWorkLocally 0:00:04
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorVueTests.QrCardScanShouldReportNotFoundAsync 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.FormConsentCheckboxShouldWork 0:00:03
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorMaintenanceTests.MaintenanceTaskShouldBeExecutedSuccessfully 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBeMissing 0:00:01
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.RegistrationConsentCheckboxShouldWork 0:00:04
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorIdleTenantsTests.ShuttingDownIdleTenantsShouldWork 0:02:16
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureWidgets 0:00:35
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkWithRazorAndLiquidBasedThemes 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkWithRazorAndLiquidBasedThemes 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureFlow 0:00:12
Lombiq.OSOCE.Tests.UI.Tests.ModuleTests.BehaviorWalkthroughsTests.WalkthroughsShouldWorkCorrectly 0:07:57
Lombiq.OSOCE.Tests.UI.Tests.ThemeTests.BehaviorBlogBaseThemeTests.ThemeWithoutSetupShouldWork 0:00:06
Sum 0:16:55

Maybe we can increase the timeout again.

@sarahelsaig
Copy link
Member

Maybe we can increase the timeout again.

That failure was from just before I merged the latest PR, so probably not yet.

@Piedone
Copy link
Member Author

Piedone commented Mar 30, 2024

Ah OK, thanks.

Do we still need the build-create-binary-log and dotnet-test-process-timeout in both OSOCE workflows? Please also notify the team about what to do with these for other projects (at least Lombiq Tenants is affected).

@Piedone
Copy link
Member Author

Piedone commented Apr 2, 2024

@dministro
Copy link
Member

Almost finished. Using sed -En "s/(.*Finishing execution of )(([0-9a-zA-Z_]+[\.]{0,1})*), total time: (([0-9]{2}[:]{0,1}){3})(.*)/\2,\4/p" '.\4_Build and Test Windows - NuGetTest solution _ Build and Test (windows-2022).txt'

Method name Execution time
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ThemeTests.BehaviorThemeTests.BaseThemeShouldWork 0:03:44
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ThemeTests.BehaviorMediaThemeTests.MediaThemeShouldRenderTemplatesFromMediaLibrary 0:00:07
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.RetrievalBuildVersionDisplayTests.BuildVersionShouldBeBeDisplayedCorrectly 0:00:04
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorVueTests.RecipeDataShouldBeDisplayedCorrectly 0:04:00
Lombiq.OSOCE.NuGet.Tests.UI.Tests.BehaviorFileUploadTests.SampleFilesShouldBeAccessible 0:00:05
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorFeaturesGuardTests.ConditionallyEnabledFeaturesShouldWorkCorrectlyOnTenants 0:04:04
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorChartJsTests.RecipeDataShouldBeDisplayedCorrectly 0:00:11
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorMaintenanceTests.ChangeUserSensitiveContentMaintenanceTaskShouldBeExecutedSuccessfully 0:00:14
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBeMissing 0:00:05
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorApplicationInsightsTests.ApplicationInsightsTrackingInOfflineOperationShouldWork 0:00:07
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorTenantManagementTests.TenantShellSettingsEditorShouldSaveSettings 0:00:19
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureFlow 0:00:20
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorJsonEditorTests.JsonEditorShouldWorkCorrectly 0:00:20
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorLoginAsAnybodyTests.SwitchingUserShouldWorkCorrectly 0:00:09
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorDataTableTests.DataTableShouldWork 0:00:12
Lombiq.OSOCE.NuGet.Tests.UI.Tests.BasicOrchardFeaturesTests.BasicOrchardFeaturesShouldWork 0:00:20
Lombiq.OSOCE.NuGet.Tests.UI.Tests.LibraryTests.BehaviorOrchardCoreApiClientTests.OrchardCoreApiClientShouldWork 0:00:16
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorEmailQuotaTests.EmailQuotaShouldBlockEmailsOverLimitAndWarn 0:01:59
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.IdleTenantTests.ShuttingDownIdleTenantsShouldWork 0:06:00
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorAsyncEditorTests.DemoFrontEndAsyncEditorShouldWork 0:00:21
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkAnonymous 0:00:10
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorFeaturesGuardTests.ForbiddenFeaturesShouldNotBeActivatableOnTenants 0:00:11
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBeMissingWithoutConfiguration 0:00:09
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorLoginAsAnybodyTests.PermissionCheckShouldWorkCorrectly 0:00:06
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorAsyncEditorTests.DemoContentItemAsyncEditorShouldWork 0:00:41
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureWidgets 0:00:54
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorEnvironmentRobotsTests.RobotsMetaTagShouldBePresent 0:00:05
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorEmailQuotaTests.EmailQuotaShouldNotBlockEmailsWhenDifferentHostIsUsedThanOriginalFromConfig 0:00:55
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorHelpfulExtensionsTests.FeatureCodeGeneration 0:00:04
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkAdmin 0:00:16
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.RegistrationConsentCheckboxShouldWork 0:00:09
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorWalkthroughsTests.WalkthroughsShouldWorkCorrectly 0:02:35
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkWithRazorAndLiquidBasedThemes 0:00:15
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.ConsentBannerShouldWorkWithRazorAndLiquidBasedThemes 0:00:15
Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.BehaviorPrivacyTests.FormConsentCheckboxShouldWork 0:00:28
Sum 0:30:10

I would highlight this one: Lombiq.OSOCE.NuGet.Tests.UI.Tests.ModuleTests.IdleTenantTests.ShuttingDownIdleTenantsShouldWork -> 0:06:00
In this test happends nothing, only waits 1 or 2 mins for a log message. No client activities, no calculations...

Do we have any hisorical data about execution time for each methods?

@dministro
Copy link
Member

dministro commented Apr 2, 2024

Ah OK, thanks.

Do we still need the build-create-binary-log and dotnet-test-process-timeout in both OSOCE workflows? Please also notify the team about what to do with these for other projects (at least Lombiq Tenants is affected).

Lets keep the current configuration for a whilw (1-2 months). I created an issue for this to keep track. Lombiq/Open-Source-Orchard-Core-Extensions#736

@Piedone
Copy link
Member Author

Piedone commented Apr 3, 2024

Here's the log of ShuttingDownIdleTenantsShouldWork (from the workflow output):

ShuttingDownIdleTenantsShouldWork output.txt

Note these lines:

2024-04-02T16:58:03.2535678Z  2024-04-02 16:48:55.0193 - Starting waiting for the setup operation.
2024-04-02T16:58:03.2537135Z  2024-04-02 16:52:09.3170 - Finished waiting for the setup operation.

(the first timestamp is when the test's log was flushed to the output, the second timestamp matters)

I.e., the test started together with all the others, then waited ~3 minutes for the common setup to be finished, by the test that happened to start that the first time (ExecuteTestAfterSetupAsync makes sure that the setup is only run once for matching tests, then snapshot, and re-used, but this async waiting adds to the wall clock time of the other tests). The test, after setting up another tenant and waiting for it to idle shutdown, finished at 16:54:55, i.e. within ~3,5 minutes, which is expected.

So, this looks normal, though the setup itself was slow: it should finish within seconds, just the tenant setup finished within 5s. However, I'd chalk this up to a random slowness of the GitHub-hosted runner, which is a random Azure VM running Windows (which is like two times slower for these already).

@Piedone Piedone changed the title OSOCE NuGetTest UI testing randomly getting stuck under Windows (OSOE-464) UI testing randomly getting stuck in CI (OSOE-464) Jul 17, 2024
@Piedone
Copy link
Member Author

Piedone commented Jul 17, 2024

So, this doesn't seem to be fixed. @dministro let's continue the investigation here instead of Lombiq/Open-Source-Orchard-Core-Extensions#736 since this is where the rest of the stuff is.

As mentioned under Lombiq/Open-Source-Orchard-Core-Extensions#80, Lombiq/GitHub-Actions#373 can help you diagnose this.

This run here has the same issue, but with test diag logs: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9980235741

This looks telling:

TpTrace Verbose: 0 : 19163, 10, 2024/07/17, 19:33:42.634, 1134212959530, vstest.console.dll, TestRequestSender.EndSession: Sending end session.
TpTrace Verbose: 0 : 19163, 10, 2024/07/17, 19:33:42.634, 1134213101092, vstest.console.dll, ProxyOperationManager.Close: waiting for test host to exit for 100 ms
TpTrace Warning: 0 : 19163, 10, 2024/07/17, 19:33:42.734, 1134313269943, vstest.console.dll, ProxyOperationManager: Timed out waiting for test host to exit. Will terminate process.

Looks related: microsoft/vstest#2952. Especially microsoft/vstest#2952 (comment) and the "infamous 100ms bug". That looks like this line.

Setting the VSTEST_TESTHOST_SHUTDOWN_TIMEOUT to 10s (like this to 60s, see this run) still causes the process to hang, despite the config actually being uses:

TpTrace Verbose: 0 : 19393, 7, 2024/07/17, 20:18:39.861, 790229310102, vstest.console.dll, ProxyOperationManager.Close: waiting for test host to exit for 10000 ms
TpTrace Warning: 0 : 19393, 7, 2024/07/17, 20:18:49.861, 800229492443, vstest.console.dll, ProxyOperationManager: Timed out waiting for test host to exit. Will terminate process.

With 60s it still hangs. So, it seems that this is an actual hang. The last message is always "Finishing execution of Lombiq.OSOCE.Tests.UI.Tests.ThemeTests.BehaviorBlogBaseThemeTests.ContentMenuItemShouldWorkCorrectly".

@sarahelsaig check this out, it might help you too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants