WcfEnabledSubProcess::OnBrowserDestroyed crashes render process when channelFactory null #2839

Closed
jcolag opened this issue Jul 17, 2019 · 16 comments

@jcolag

jcolag commented Jul 17, 2019

  • What version of the product are you using?

    • 73.1.130, installed via the Visual Studio solution package manager
  • What architecture, x86 or x64?

    • x86
  • On what operating system?

    • Win10
  • Are you using WinForms, WPF or OffScreen?

    • WPF
  • What steps will reproduce the problem?

    • Note: We only seem to be able to reproduce this in our pre-existing application, where we're replacing WPF's WebBrowser control with CefSharp. Within that application:
    • Navigate the browser to a page that uses JavaScript's window.open() to create a pop-up window.
    • Close the pop-up window.
    • The main browser is no longer functioning. The browser console reports "DevTools was disconnected from the page," and can only be used to refresh the browser to restart the process.
  • What is the expected output? What do you see instead?

    • The browser should always be available to respond to user interaction.
  • Please provide any additional information below.

    • A colleague noted that one thread is stalled on this line:
      var task = queue.Take(cancellationTokenSource.Token);
    • Since the Take() method will only ever return on success, it's possible that it might be more useful to call TryTake(), here, whether or not that solves our problem.
  • Does the cef log provide any relevant information? (By default there should be a debug.log file in your bin directory)

    • No.
  • Any other background information that's relevant? Are you doing something out of the ordinary? 3rd party controls?

    • This is an older, proprietary application for a client that makes many JavaScript-to-C# (and C#-to-JavaScript) calls to connect the web page with hardware. There may be any number of contributing factors, but unfortunately disclosing them could be problematic.
    • As far as we can tell, however, the path of opening a pop-up and closing it doesn't need to pass through any of those libraries or features to trigger the problem.
    • It is possibly noteworthy that this occurs whether using the default Popup Lifespan Handler, the empty version used in the sample application, or the experimental code commented out in the sample. The only difference between the three is that the experimental code exposes a minimal exception:
at System.Threading.CancellationToken.ThrowOperationCanceledException()

...at this point in the call stack:

 	mscorlib.dll!System.Threading.CancellationToken.ThrowOperationCanceledException() Line 482	C#
 	mscorlib.dll!System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(int millisecondsTimeout, uint startTime, System.Threading.CancellationToken cancellationToken) Line 459	C#
 	mscorlib.dll!System.Threading.SemaphoreSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 439	C#
 	System.dll!System.Collections.Concurrent.BlockingCollection<System.Threading.Tasks.Task<CefSharp.Internals.MethodInvocationResult>>.TryTakeWithNoTimeValidation(out System.Threading.Tasks.Task<CefSharp.Internals.MethodInvocationResult> item, int millisecondsTimeout, System.Threading.CancellationToken cancellationToken, System.Threading.CancellationTokenSource combinedTokenSource) Line 712	C#
 	System.dll!System.Collections.Concurrent.BlockingCollection<System.Threading.Tasks.Task<CefSharp.Internals.MethodInvocationResult>>.TryTake(out System.Threading.Tasks.Task<CefSharp.Internals.MethodInvocationResult> item, int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 667	C#
 	System.dll!System.Collections.Concurrent.BlockingCollection<System.__Canon>.Take(System.Threading.CancellationToken cancellationToken) Line 578	C#
 	CefSharp.dll!CefSharp.Internals.MethodRunnerQueue.ConsumeTasks() Line 89	C#
 	mscorlib.dll!System.Threading.Tasks.Task.InnerInvoke() Line 2884	C#
 	mscorlib.dll!System.Threading.Tasks.Task.Execute() Line 2498	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecutionContextCallback(object obj) Line 2861	C#
 	mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 954	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 902	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot) Line 2827	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecuteEntry(bool bPreventDoubleExecution) Line 2756	C#
 	mscorlib.dll!System.Threading.Tasks.ThreadPoolTaskScheduler.LongRunningThreadWork(object obj) Line 49	C#
 	mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state) Line 74	C#
 	mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 954	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 902	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 891	C#
>	mscorlib.dll!System.Threading.ThreadHelper.ThreadStart(object obj) Line 93	C#

The exception occurs at least twice, seemingly identically. Note the ConsumeTasks() call in the trace.
Impressively, this is the only part of the application that we weren't able to get working.

@amaitland
Member

  • The main browser is no longer functioning. The browser console reports "DevTools was disconnected from the page," and can only be used to refresh the browser to restart the process.

MethodRunnerQueue is unlikely to have any influence on what you are seeing. It sounds like your render process has crashed: with the current process model, browsers and popups all share the same render process. Refreshing from DevTools should spawn a new render process. Check to see if the process has crashed. You can look in Task Manager or implement http://cefsharp.github.io/api/73.1.x/html/M_CefSharp_IRequestHandler_OnRenderProcessTerminated.htm

  • It is possibly noteworthy that this occurs whether using the default Popup Lifespan Handler

What is the default popup lifespan handler exactly? The default value for LifeSpanHandler is null.

System.Threading.CancellationToken.ThrowOperationCanceledException()

Throwing a First Chance exception is expected, and should be caught in the catch that wraps Take. Do you have First Chance exceptions enabled in Visual Studio? You shouldn't see this exception otherwise.

  • Since the Take() method will only ever return on success, it's possible that it might be more useful to call TryTake(), here, whether or not that solves our problem.

Blocking until there is something in the queue to process is how the class is designed. TryTake is non-blocking, so the consumer would then need to loop unnecessarily. Again, I find it very unlikely that MethodRunnerQueue has anything to do with what you are seeing.
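To make the design described above concrete, here is a minimal sketch in standard C++ (CefSharp's MethodRunnerQueue is C#; this is not its code). BlockingQueue, OperationCanceled and ConsumeTasks are hypothetical names loosely modeled on BlockingCollection<T>.Take(CancellationToken): Take() sleeps on a condition variable until work arrives, so the consumer never polls, and cancellation surfaces as an exception that the consumer loop catches as its normal exit path. A debugger set to break on first-chance exceptions would stop at that throw even though nothing has gone wrong.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <utility>

// Thrown by Take() once cancellation has been requested; plays the role of
// .NET's OperationCanceledException in this sketch.
struct OperationCanceled : std::runtime_error {
    OperationCanceled() : std::runtime_error("operation canceled") {}
};

template <typename T>
class BlockingQueue {
public:
    void Add(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    void Cancel() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            canceled_ = true;
        }
        cv_.notify_all();
    }

    // Blocks (without spinning) until an item is available or Cancel() is
    // called. In this sketch cancellation wins over any items still queued.
    T Take() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return canceled_ || !items_.empty(); });
        if (canceled_) throw OperationCanceled();
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    // Non-blocking variant: returns false at once if the queue is empty, so a
    // consumer built on it would have to poll in a loop to wait for work.
    bool TryTake(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool canceled_ = false;
};

// Long-running consumer: the cancellation exception is its normal exit path
// and is caught here, so it never escapes to the caller.
int ConsumeTasks(BlockingQueue<std::function<void()>>& queue) {
    int processed = 0;
    try {
        for (;;) {
            std::function<void()> task = queue.Take();
            task();
            ++processed;
        }
    } catch (const OperationCanceled&) {
        // Expected at shutdown; nothing to do.
    }
    return processed;
}
```

In use, ConsumeTasks() would run on a dedicated background thread and Cancel() would be called during shutdown. TryTake() is included only for contrast: a consumer built on it would need its own loop-and-sleep to wait for work.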

@jcolag
Author

jcolag commented Jul 18, 2019

You can look in Task Manager or implement http://cefsharp.github.io/api/73.1.x/html/M_CefSharp_IRequestHandler_OnRenderProcessTerminated.htm

Thanks! I do now see the crash. Any thoughts on how to track it down, given that this site has worked in Chrome basically since Chrome was released? Or is that a question for another project?

What is the default popup lifespan handler exactly? The default value for LifeSpanHandler is null.

I'm referring to whatever behavior occurs when it's null. It's just a matter of semantics, but that behavior clearly comes from some default handler (though probably not one running on .NET), since some code must run to open the pop-ups.

Throwing a First Chance exception is expected, and should be caught in the catch that wraps Take. Do you have First Chance exceptions enabled in Visual Studio? You shouldn't see this exception otherwise.

Yes, exactly that; we're grasping at straws to find the source of this one last problem.

@amaitland
Member

@jcolag
Author

jcolag commented Jul 19, 2019

I started looking at Crashpad, but following the minimal example provided by the CEF project produced these log entries:

[0719/084015.097:ERROR:crashpad_client_win.cc(505)] CreateProcess: The parameter is incorrect. (0x57)
[0719/084015.107:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: browser
[0719/084015.225:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0719/084015.227:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: gpu-process
[0719/084015.816:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0719/084015.820:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: renderer
[0719/084030.202:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0719/084030.204:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: gpu-process

I also don't get any evidence of the crash. Here's the crash_reporter.cfg, in case I'm doing something obviously wrong...

[Config]
# Product information.
ProductName=cefclient
ProductVersion=1.0.0

# Required to enable crash dump upload.
ExternalHandler=CefSharp.BrowserSubprocess.exe
ServerURL=http://localhost:18080

# Disable rate limiting so that all crashes are uploaded.
RateLimitEnabled=false
MaxUploadsPerDay=0

The content comes from the CEF example, plus the ExternalHandler line recommended by your link. And yes, I have crash_server.py running on port 18080; I double-checked that one a lot.

@amaitland
Member

@jcolag
Author

jcolag commented Jul 21, 2019

No luck, unfortunately. After updating the config to match the version installed via NuGet (and leaving out the commented lines), I have:

[Config]
ProductName=CefSharp
ProductVersion=73.1.13
AppName=CefSharp
ExternalHandler=CefSharp.BrowserSubprocess.exe

But I still get pretty much the same log:

[0721/141118.798:ERROR:crashpad_client_win.cc(505)] CreateProcess: The parameter is incorrect. (0x57)
[0721/141118.807:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: browser
[0721/141118.938:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0721/141118.940:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: gpu-process
[0721/141119.601:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0721/141119.608:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: renderer
[0721/141133.908:ERROR:registration_protocol_win.cc(56)] CreateFile: The system cannot find the file specified. (0x2)
[0721/141133.910:INFO:crash_reporting.cc(219)] Crash reporting enabled for process: gpu-process

And there's nothing resembling a dump or crash report after the crash, even though the log shows crash reporting enabled for the renderer. The newest files in the binary folder are the configuration file, the log, and the executable. The "The system cannot find the file specified" lines seem suspicious. It's not a permission problem (I checked quickly by running Visual Studio as administrator), so do I perhaps need to create a folder in advance to receive the report? I don't see anything about that in the documentation, but it sounds like the sort of issue I used to create with C/C++ file I/O...

@amaitland
Member

You can try attaching the debugger to the render process. Use the --renderer-startup-dialog command-line argument if you need to attach before any processing occurs; see https://github.com/cefsharp/CefSharp/blob/cefsharp/75/CefSharp.Example/CefExample.cs#L66 for an example.

Make sure you have native code debugging enabled and that the pdb (symbols) file is placed next to libcef.dll. You can download the pdb from http://opensource.spotify.com/cefbuilds/cef_binary_75.1.4%2Bg4210896%2Bchromium-75.0.3770.100_windows32_release_symbols.tar.bz2

@jcolag
Author

jcolag commented Jul 23, 2019

Progress! Of a sort, at least. It took a while to get the hang of the settings object, but I now see a null reference exception ("Object reference not set to an instance of an object") in WcfEnabledSubProcess::OnBrowserDestroyed() in ...\CefSharp.BrowserSubprocess.Core\WcfEnabledSubProcess.cpp. It appears to trip over the check of channelFactory->State because channelFactory is nullptr, and then crashes in the exception handler when it tries to call channelFactory->Abort().

Here's the call stack:

>	CefSharp.BrowserSubprocess.Core.dll!CefSharp::BrowserSubprocess::WcfEnabledSubProcess::OnBrowserDestroyed(CefSharp::CefBrowserWrapper^ browser) Line 70	C++
 	CefSharp.BrowserSubprocess.Core.dll!CefSharp::CefAppUnmanagedWrapper::OnBrowserDestroyed(scoped_refptr<CefBrowser>* browser) Line 63	C++
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	CefSharp.BrowserSubprocess.Core.dll!CefSharp::BrowserSubprocess::SubProcess::Run() Line 56	C++
 	CefSharp.BrowserSubprocess.exe!CefSharp.BrowserSubprocess.Program.Main(string[] args) Line 52	C#

I don't think I'm doing anything special with respect to closing the popups. I've removed the popup-lifespan handler to confirm it's nothing in that experimental code.

@amaitland
Member

ChannelFactory should never be null; you can step through OnBrowserCreated to see what's happening. We can add some null checking to avoid the crash, but without an example that reproduces the issue there's not much else I can do.
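The null checking mentioned here could look roughly like the following standard C++ sketch. It is an analogue, not CefSharp's C++/CLI code or the fix that was eventually committed: ChannelFactory, BrowserWrapper and CommunicationState are simplified, hypothetical stand-ins (the enum mirrors only a subset of WCF's CommunicationState values). The point it illustrates is that the null check has to come before both the State check and the Abort() call in the catch block, since either one would dereference a null pointer.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Subset of WCF's CommunicationState values, mirrored for this sketch.
enum class CommunicationState { Created, Opened, Closed, Faulted };

// Hypothetical stand-in for a channel factory; only what teardown needs.
struct ChannelFactory {
    CommunicationState state = CommunicationState::Opened;
    bool failOnClose = false;  // simulate Close() throwing
    bool aborted = false;

    void Close() {
        if (failOnClose) throw std::runtime_error("Close() failed");
        state = CommunicationState::Closed;
    }
    void Abort() {
        aborted = true;
        state = CommunicationState::Closed;
    }
};

struct BrowserWrapper {
    // Left null when the channel never opened during browser creation.
    std::shared_ptr<ChannelFactory> channelFactory;
};

// Null-safe teardown. Returns true if there was a channel to tear down.
bool OnBrowserDestroyed(BrowserWrapper& browser) {
    std::shared_ptr<ChannelFactory> factory = browser.channelFactory;
    if (!factory) {
        return false;  // nothing was wired up, so there is nothing to close
    }
    try {
        if (factory->state == CommunicationState::Opened) {
            factory->Close();
        }
    } catch (const std::exception&) {
        factory->Abort();  // safe: factory is known to be non-null here
    }
    browser.channelFactory.reset();
    return true;
}
```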

It's recommended that anyone creating a new application use the Async version as it's under active development.

As per https://github.com/cefsharp/CefSharp/wiki/General-Usage#sync-javascript-binding-jsb

I need to make it clearer that the sync JavaScript Binding should be avoided in new development; it remains only as a legacy option for those upgrading.

@jcolag
Author

jcolag commented Jul 24, 2019

For some reason I wasn't able to get the synchronous binding to work at all, so I moved to the asynchronous binding early on. For what it's worth, documentation-wise, the only point of confusion I had was that the JavaScript would need to asynchronously bind the object and asynchronously make the calls into C# code.

I'm definitely aware of how frustrating this has to be on your side, too, and very much appreciate your time on this. Before opening this issue, I spent a few days removing things from our application and adding them to a clean example to isolate the problem, but nothing changed the outcome. Unfortunately, this application is too central to the business and too big for me to recommend rewriting it, especially after only barely selling the client on replacing IE to sidestep its ongoing decay.

I don't think it's relevant, but on the way to looking at OnBrowserCreated, Visual Studio also prompted me for f:\dd\ndp\clr\src\BCL\system\runtime\remoting\realproxy.cs and C:\projects\cefsharp\CefSharp.BrowserSubprocess\Program.cs. That seemed out of place until I hit a "The pipe endpoint 'net.pipe://localhost/CefSharpSubProcessProxy/17760/1' could not be found on your local machine" exception while stepping through the creation code. I believe the failing line is:

browser->ChannelFactory = channelFactory;

But I haven't figured out which specific commit I should be using (currently using feb1bce from May 3rd, which seems like it should at least be close), so Visual Studio is acting a little bit fussy. It definitely seems to be tripping up in that last try block, though, and the Open() call is the more intuitive culprit. Regardless, an exception thrown before the assignment would certainly explain why ChannelFactory is null.

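try
{
    clientChannel->Open();
    // Only reached if Open() succeeds
    browser->ChannelFactory = channelFactory;
    browser->BrowserProcess = browserProcess;
}
catch (Exception^)
{
    // Exception swallowed: browser->ChannelFactory is never assigned
}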

You probably already see this from context, but process ID #17760 is the application itself.

Now, it's been a good fifteen years since I've worked with named pipes, and I've never used them on Windows, but is there perhaps a configuration option to set the pipe's location so it can be inspected? Or is that just a red herring because you're using something higher-level to coordinate? It definitely sounds like the problem is back in the main process failing to "listen," at least, and that might help explain why nobody else is having this problem.

@jcolag
Author

jcolag commented Jul 24, 2019

I forgot to include the stack trace for the communication error, though I assume there aren't many paths there:

 	System.ServiceModel.dll!System.ServiceModel.Channels.PipeConnectionInitiator.GetPipeName(System.Uri uri, System.ServiceModel.Channels.IPipeTransportFactorySettings transportFactorySettings)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.NamedPipeConnectionPoolRegistry.NamedPipeConnectionPool.GetPoolKey(System.ServiceModel.EndpointAddress address, System.Uri via)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.CommunicationPool<string, System.ServiceModel.Channels.IConnection>.TakeConnection(System.ServiceModel.EndpointAddress address, System.Uri via, System.TimeSpan timeout, out string key)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(System.TimeSpan timeout)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(System.TimeSpan timeout)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.CommunicationObject.Open(System.TimeSpan timeout)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.ServiceChannel.OnOpen(System.TimeSpan timeout)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.CommunicationObject.Open(System.TimeSpan timeout)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.CommunicationObject.Open()	Unknown
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	System.ServiceModel.dll!System.ServiceModel.Channels.ServiceChannelProxy.ExecuteMessage(object target, System.Runtime.Remoting.Messaging.IMethodCallMessage methodCall)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.ServiceChannelProxy.InvokeChannel(System.Runtime.Remoting.Messaging.IMethodCallMessage methodCall)	Unknown
 	System.ServiceModel.dll!System.ServiceModel.Channels.ServiceChannelProxy.Invoke(System.Runtime.Remoting.Messaging.IMessage message)	Unknown
 	mscorlib.dll!System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(ref System.Runtime.Remoting.Proxies.MessageData msgData, int type) Line 823	C#
>	CefSharp.BrowserSubprocess.Core.dll!CefSharp::BrowserSubprocess::WcfEnabledSubProcess::OnBrowserCreated(CefSharp::CefBrowserWrapper^ browser) Line 49	C++
 	CefSharp.BrowserSubprocess.Core.dll!CefSharp::CefAppUnmanagedWrapper::OnBrowserCreated(scoped_refptr<CefBrowser>* browser) Line 50	C++
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	CefSharp.BrowserSubprocess.Core.dll!CefSharp::BrowserSubprocess::SubProcess::Run() Line 56	C++
 	CefSharp.BrowserSubprocess.exe!CefSharp.BrowserSubprocess.Program.Main(string[] args) Line 52	C#

And the top-level exception (with a Source of System.ServiceModel) is:

There was no endpoint listening at net.pipe://localhost/CefSharpSubProcessProxy/19972/1 that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.

With

The pipe endpoint 'net.pipe://localhost/CefSharpSubProcessProxy/19972/1' could not be found on your local machine.

...being the inner exception, of course.
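The endpoint addresses in these messages appear to follow the pattern net.pipe://localhost/CefSharpSubProcessProxy/<process id>/<n>, where, per the earlier comment, the process ID belongs to the host application. The meaning of the final segment isn't stated in this thread; the sketch below assumes it is a per-browser identifier. As a hypothetical diagnostic helper (not part of CefSharp), this C++ function splits such an address so the process ID can be checked against Task Manager:

```cpp
#include <cassert>
#include <string>

// Components of a subprocess pipe address, as observed in this thread.
struct PipeEndpoint {
    unsigned long hostProcessId = 0;  // PID of the application hosting the browser
    unsigned long lastSegment = 0;    // assumed: a per-browser identifier
};

// Accepts 1 to 9 decimal digits, so std::stoul below cannot overflow.
// (Longer values are rejected in this sketch.)
static bool IsSmallNumber(const std::string& s) {
    return !s.empty() && s.size() <= 9 &&
           s.find_first_not_of("0123456789") == std::string::npos;
}

// Parses "net.pipe://localhost/CefSharpSubProcessProxy/<pid>/<n>" into `out`.
// Returns false (leaving `out` untouched) if the address has any other shape.
bool ParsePipeEndpoint(const std::string& address, PipeEndpoint& out) {
    const std::string prefix = "net.pipe://localhost/CefSharpSubProcessProxy/";
    if (address.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    const std::string rest = address.substr(prefix.size());
    const std::string::size_type slash = rest.find('/');
    if (slash == std::string::npos) {
        return false;
    }
    const std::string pid = rest.substr(0, slash);
    const std::string id = rest.substr(slash + 1);
    if (!IsSmallNumber(pid) || !IsSmallNumber(id)) {
        return false;  // also rejects extra path segments such as "/1/2"
    }
    out.hostProcessId = std::stoul(pid);
    out.lastSegment = std::stoul(id);
    return true;
}
```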

@amaitland
Member

If you are not using the sync JavaScript Binding then you don't need WCF. It's disabled by default, so somewhere you are enabling it; set it back to disabled and this all goes away.

http://cefsharp.github.io/api/73.1.x/html/P_CefSharp_CefSharpSettings_WcfEnabled.htm
It's possible you are calling RegisterJsObject (http://cefsharp.github.io/api/73.1.x/html/M_CefSharp_WebBrowserExtensions_RegisterJsObject.htm), which would also enable it.

I'll go through your lengthy response later when I have more time.

@amaitland
Member

I'm definitely aware of how frustrating this has to be on your side, too, and very much appreciate your time on this

I appreciate the effort you've put into tracking this down. It's not particularly frustrating for me; plenty of people put in much less effort and provide far fewer details than you have 👍 There's only so much I can do without an example that reproduces the problem.

But I haven't figured out which specific commit I should be using (currently using feb1bce from May 3rd, which seems like it should at least be close), so Visual Studio is acting a little bit fussy

Releases are tagged; you can get the exact commit that corresponds to a release from https://github.com/cefsharp/CefSharp/tags

Or is that just a red herring because you're using something higher-level to coordinate? It definitely sounds like the problem is back in the main process failing to "listen," at least, and that might help explain why nobody else is having this problem.

The pipe likely failed to open, which in the normal course of events should never happen; it's an extremely rare event. The exceptions for this are currently swallowed because they cause false positives when a browser is opened and then disposed of in rapid succession, which is an expected use case. I'm aware of one case that WCF handles particularly poorly (#915); perhaps you are experiencing something similar.

I'll add some additional null checking and hopefully some logging (we have to make sure that we only log when there's an actual error, not when the browser is simply disposed of soon after it's created).
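The logging described here has to separate real failures from the expected create-then-dispose race. One way to do that, sketched in standard C++ with hypothetical names (TryOpenChannel, an isDisposed callback, and a plain vector standing in for a logger), is to consult the browser's disposed state only after Open() throws: if the browser is already gone, stay silent; otherwise record the error. Whether such a disposed flag is reliably available at that point in CefSharp is an assumption, and this is not what the closing commit did (it added null checks without logging).

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct OpenResult {
    bool opened;  // the channel is usable
    bool logged;  // a genuine failure was recorded
};

// Tries to open a channel. A failure is swallowed silently only when the
// browser was already disposed (the expected create-then-dispose race);
// any other failure is appended to `errorLog` instead of being discarded.
OpenResult TryOpenChannel(const std::function<void()>& open,
                          const std::function<bool()>& isDisposed,
                          std::vector<std::string>& errorLog) {
    try {
        open();
        return {true, false};
    } catch (const std::exception& e) {
        if (isDisposed()) {
            return {false, false};  // expected: browser went away during setup
        }
        errorLog.push_back(std::string("Failed to open channel: ") + e.what());
        return {false, true};
    }
}
```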

@amaitland changed the title from "Main Browser Hangs on Closing Popup" to "WcfEnabledSubProcess::OnBrowserDestroyed crashes render process when channelFactory null" on Jul 26, 2019
@amaitland added this to the 75.0.0 milestone on Jul 26, 2019
@amaitland
Member

For what it's worth, documentation-wise, the only point of confusion I had was that the JavaScript would need to asynchronously bind the object and asynchronously make the calls into C# code.

Anyone with a GitHub account can edit the wiki. If you'd like to take a stab at making the documentation clearer, be my guest 😄 Just let me know so I can review the changes.

@jcolag
Author

jcolag commented Jul 26, 2019

Ah, I've been using GitHub for years, but only as a dumping ground for code, and I haven't looked into editing things. Once things settle down, I'll make some time to go through the wiki. We're in the middle of a release (unrelated to this issue), so if I go quiet for a few days, that's why...

Meanwhile, for whatever reason (old code from trying to get the synchronous JavaScript binding to work? It doesn't seem like something I'd play with), it looks like this was all down to the WcfEnabled setting, so mystery solved. Thanks for your help!

@jcolag closed this as completed on Jul 26, 2019
@amaitland
Member

Reopening, as we'll add an additional set of null checks at a minimum.

@amaitland reopened this on Jul 26, 2019
amaitland added a commit that referenced this issue Jul 27, 2019
… if WCF was null

It's likely the WCF host didn't start; breakpoints should be added in ManagedCefBrowserAdapter::InitializeBrowserProcessServiceHost
to catch the actual exception. We're not logging, as it causes too many false positives: errors are expected
when a browser is created and then rapidly disposed.

Resolves #2839