COMException occurring when kernel events are collected #1723

Closed
rbanks54 opened this issue Oct 5, 2022 · 15 comments

Comments

@rbanks54

rbanks54 commented Oct 5, 2022

Up until today, perfview has been working fantastically well, but today I can't get it to start collecting data and I can't diagnose why.

Here's what I'm seeing in the log

Started: Running: C:\src\PdfPig\src\PerformanceTester\bin\Release\net6.0\PerformanceTester.exe...  See log for output.
[Kernel Log: C:\src\perfView\PerfViewData.kernel.etl]
Kernel keywords enabled: Default
Completed: Running: C:\src\PdfPig\src\PerformanceTester\bin\Release\net6.0\PerformanceTester.exe...  See log for output.   (Elapsed Time: 0.185 sec)
Exception Occurred: System.Runtime.InteropServices.COMException (0x800700AA): The requested resource is in use. (Exception from HRESULT: 0x800700AA)
   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.EnableKernelProvider(Keywords flags, Keywords stackCapture)
   at PerfView.CommandProcessor.Start(CommandLineArgs parsedArgs)
   at PerfView.CommandProcessor.Run(CommandLineArgs parsedArgs)
   at PerfView.MainWindow.<>c__DisplayClass20_0.<ExecuteCommand>b__0()
   at PerfView.StatusBar.<>c__DisplayClass22_0.<StartWork>b__0()
An exceptional condition occurred, see log for details.

The command line is

PerfView.exe  "/DataFile:PerfViewData.etl" /BufferSizeMB:256 /StackCompression /Process:"PerformanceTester" /ClrEvents:GCSampledObjectAllocationHigh,Default /NoGui /NoNGenRundown /Merge:False /Zip:False run C:\src\PdfPig\src\PerformanceTester\bin\Release\net6.0\PerformanceTester.exe

Adding /KernelEvents:None to the command line will allow the profiling to work, but without being able to get CPU samples it's not much use running things that way.
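
For anyone who wants to script the comparison between the failing run and the workaround, here is a minimal Python sketch. It is not part of PerfView: `build_perfview_args` is a hypothetical helper, and it only reuses the switches already quoted in this issue.

```python
from typing import List, Optional


def build_perfview_args(exe_path: str, kernel_events: Optional[str] = None) -> List[str]:
    """Assemble the PerfView arguments quoted in this issue (hypothetical helper).

    kernel_events=None leaves PerfView's default kernel keywords in place.
    kernel_events="None" appends /KernelEvents:None, the workaround that
    avoids the exception at the cost of CPU samples.
    """
    args = [
        "PerfView.exe",
        "/DataFile:PerfViewData.etl",
        "/BufferSizeMB:256",
        "/StackCompression",
        "/Process:PerformanceTester",
        "/ClrEvents:GCSampledObjectAllocationHigh,Default",
        "/NoGui",
        "/NoNGenRundown",
        "/Merge:False",
        "/Zip:False",
    ]
    if kernel_events is not None:
        args.append(f"/KernelEvents:{kernel_events}")
    # The "run" verb and the target executable come last.
    args += ["run", exe_path]
    return args
```

The resulting list can be passed to `subprocess.run` on a machine where PerfView.exe is on the PATH; running it once with `kernel_events=None` and once with `kernel_events="None"` reproduces the two cases described here.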

I've tried rebooting and running perfview from different folders, but to no avail.

It's happening on 3.0.3 thru 3.0.5

About the only thing I can think of is a Windows update that was installed overnight:
Update Stack Package - (Version 1022.921.2011.0) (I'm on the Win 11 beta channel)
I can't see why something related to the Windows update process would cause problems, so I'm doubtful it's the problem. I just can't think of anything else to try.

@rbanks54
Author

rbanks54 commented Oct 6, 2022

I've done a little more investigating to try and work out what's going on, but I'm left confused.

Here's a sequence of events...

  1. Reboot the machine
  2. Start v3.0.5 and run a command. Exception occurs. ❌
  3. Start VS2022 (16.4 preview) with the debug configuration selected
  4. Start without debugging. Exception occurs. ❌
  5. Run it again with the debugger attached. It works. ✅
  6. Close VS2022
  7. Start v3.0.5 and run a command. It works. ✅ 😮

I have no idea why a debugger session would set things right.

I did a little more digging to try and work out where the exception is coming from and it's happening here:

dwErr = ETWKernelControl.StartKernelSession(out m_SessionHandle, properties, PropertiesSize, stackTracingIds, numIDs);

I might be wrong, but from what I see in the decompiled method, the only native calls made inside StartKernelSession are to StartTraceW, LoadLibrary, and StartKernelTrace, none of which throw an ERROR_BUSY (0xAA) result according to the docs I looked at.
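
As a sanity check on the error code, the HRESULT in the log can be split into its fields. The sketch below is plain Python and assumes nothing beyond the standard HRESULT layout; `decode_hresult` is a name made up for illustration.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    hresult &= 0xFFFFFFFF  # normalise values that arrive as negative Int32
    return {
        "failure": bool(hresult >> 31),       # top bit set means failure
        "facility": (hresult >> 16) & 0x1FFF,  # 7 is FACILITY_WIN32
        "code": hresult & 0xFFFF,              # low 16 bits: the Win32 error code
    }


fields = decode_hresult(0x800700AA)
print(fields)
```

For 0x800700AA this gives facility 7 (a wrapped Win32 error) and code 170 (0xAA), which is ERROR_BUSY, matching the "requested resource is in use" message.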

@adamsitnik
Member

cc @brianrob

@brianrob
Member

brianrob commented Oct 6, 2022

Hmmm... this is not something that I've seen before. It's super weird to me that running VS with the debugger would fix it, though there are some actions that VS takes that impact ETW, so I guess anything is possible. A couple of questions:

  1. Have you seen this on multiple machines?
  2. Does this repro on a non-Insider build? It is possible that there is a bug in the OS.

@rbanks54
Author

rbanks54 commented Oct 6, 2022

I ran it on a spare machine that had the same Insiders build on it, and saw the same issue.

On the spare, I rolled back to Win 11 21H2 and everything now works as expected. I have another machine with 21H2 still on it and it also works correctly.

It's definitely feeling like something in the OS.

@brianrob
Member

brianrob commented Oct 6, 2022

That it does. Would you mind filing a feedback ticket in Windows for this (hit Windows-F to open the feedback hub)? This is the best way to route this issue. If you let me know what it's called, I'll do my best to see that it's reviewed by the right folks.

@rbanks54
Author

rbanks54 commented Oct 7, 2022

Feedback item is titled "COMException when starting a kernel level trace session" (https://aka.ms/AAi9xqw)

For reference, I bumped the spare machine up to the 22H2 GA release. I thought things were working as expected, but just ran into the problem again.

@brianrob
Member

brianrob commented Oct 7, 2022

Thanks much @rbanks54! I am going to close this issue in favor of the Feedback ticket. I am working to get eyes on it.

@brianrob brianrob closed this as completed Oct 7, 2022
@rbanks54
Author

rbanks54 commented Oct 8, 2022

You'll like this @brianrob, @adamsitnik

I noticed on the spare machine that while it wasn't crashing, I was also regularly seeing no data in the CPU traces.

Digging around online I found an old MS Answers thread where there was a version of the virus definition package causing issues. In the same thread, I noticed some people still had issues after updating but found success by disabling real time protection.

On a hunch, I tried this on the spare machine and the CPU sample data appeared correctly.

I then went to the main machine and did the following:

  1. Ran perfview as per usual, and saw the crash
  2. Disabled real time protection
  3. Restarted perfview and collected data. It worked!
  4. Turned on real time protection again
  5. Ran perfview and the collection still worked!
  6. Ran a "quick scan" in Defender
  7. Tried perfview one more time, and it still worked
  8. I then rebooted and ran the whole sequence again... same result!

I suspect Windows Defender is blocking the kernel mode trace, and that on Win 11 21H2 the result is CPU samples with no data, while on 22H2 it throws the COMException and causes the crash.

I left the machine alone for a good while (about 6 hours) after re-enabling real time protection and it started failing again.

@brianrob
Member

@rbanks54 can you please share the Windows build numbers where things failed and where you saw them succeed? Thanks.

@rbanks54
Author

rbanks54 commented Oct 12, 2022

@brianrob Here you go:

Failing build:

  • Win 11 Pro, 22H2,
  • Build 22623.730 (Insiders beta channel)
  • Security Intelligence Version: 1.375.1811.0

Semi-broken build (no data in stacks):

  • Win 11 Pro, 21H2
  • Build 22000.1042
  • Security intelligence version: 1.375.1710.0

Working build:

  • Win 11 Business, 21H2
  • Build 22000.978
  • Security intelligence version: 1.377.80.0

@brianrob
Member

Thanks @rbanks54!

@evgn

evgn commented Dec 21, 2022

Hi folks!

I've experienced the same problem on Win 11 22H2: if I run PerfView with the CPU Samples checkbox set, I get the "The requested resource is in use" error. However, on another machine running Win 10 22H2 the situation differs: the profile session starts, but there are no sample events in the resulting ETL file.

Here is what I've managed to figure out about the problem.

  • Visual Studio Performance Profiler doesn't have any problems with kernel sampling and works well now. It did hit this problem earlier, but that was fixed. Here are the tickets:
    VS_issue_1.
    VS_issue_2.

  • As described in VS_issue_2, Windows Defender breaks ETW kernel sampling. It uses Intel Threat Detection Technology, which works with the hardware performance counters (more information in the issue). As a result, the counters become unavailable for kernel sampling, and either events are missing from the resulting ETL snapshot (Win 10 22H2) or the kernel profile session fails to start with the error (Win 11 22H2).

  • The simplest solution is either to turn off Real Time Protection or to disable the TDT feature via PowerShell, as mentioned here.

VS_issue_2 suggests that updating Windows Defender fixes the problem. However, that fix only works for the VS profiler and not for any other ETW-based profiler such as Xperf or PerfView, which is rather strange.

So I went to figure out what the VS Profiler does and found that after I run a profiling session in VS, any other profiler starts working and can collect kernel samples. But this effect magically disappears after some time, and I need to start VS profiling again to get PerfView working again.

As it turned out, the VS Profiler uses a helper service, StandardCollector.Service.exe, that somehow helps other profilers collect samples: while the service is running, kernel sampling works everywhere; once it stops, kernel sampling breaks.

As I later figured out, the process StandardCollector.Service.exe does nothing special on its own, whereas Windows Defender does! It detects the service by name and resets the performance counter control register so that the counters can be used for kernel sampling. In fact, you can launch any process with the exact name StandardCollector.Service.exe, no matter what it does, and kernel sampling will start working!
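
To check whether a process with that name is currently running before starting a collection, something like the following Python sketch could be used. It assumes the standard Windows `tasklist /FO CSV /NH` output format; the function names are made up for illustration, and it only detects the process, it does not touch Defender or the counters.

```python
import csv
import io
import subprocess


def process_names(tasklist_csv: str) -> set:
    """Extract lower-cased image names from `tasklist /FO CSV /NH` output."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return {row[0].lower() for row in rows if row}


def is_running(image_name: str, tasklist_csv: str) -> bool:
    """Return True if image_name appears in the given tasklist output."""
    return image_name.lower() in process_names(tasklist_csv)


def query_tasklist() -> str:
    """Windows only: run tasklist and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On Windows, `is_running("StandardCollector.Service.exe", query_tasklist())` reports whether such a process exists at that moment. Based on the behaviour described above, a `False` result would suggest kernel sampling may fail, but that link is this thread's observation rather than documented behaviour.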

One last thing worth mentioning: there is a tool, Counter Control, that lets you reset the performance counter control register manually and make kernel sampling work. Details about the tool and about how Windows Defender works with the performance counter control register can be found here.

Further details about the performance counters and control registers can be found in the official Intel Performance Monitoring Unit programming guide.

@Xhanti
Contributor

Xhanti commented Feb 2, 2023

In the same boat here. Turning off real-time protection worked, but it's a real pity that this is not working anymore. Windows 11 is the gift that keeps on giving :( . Windows details:

Windows 11 Pro 22H2
OS build 22621.1105
Windows Feature Experience Pack 1000.22638.1000.0

@evgn

evgn commented Feb 2, 2023

@Xhanti have you tried resetting the performance counters manually? There is the Counter Control tool that I mentioned above. It lets you reset the counters without having to write your own driver. At least for me it's the best option so far.

@kzu

kzu commented Jan 29, 2024

The counter control tool worked great for me. Now I can use ETWProfiler with BenchmarkDotNet again.
