Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracted Serializer to better manage message contract types #136

Merged
merged 2 commits into from
Aug 30, 2021

Conversation

andrew-at-octopus
Copy link
Contributor

@andrew-at-octopus andrew-at-octopus commented Aug 29, 2021

Once the recent Halibut changes were made live on customer instances, two issues were noticed on a customer instance: https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1630005868012600

The two exceptions were:

Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
at System.Collections.Generic.HashSet`1.AddIfNotPresent(T value, Int32& location)
at Halibut.HalibutRuntime.CreateClient[TService](ServiceEndPoint endpoint, CancellationToken
...

and:

Type specified in JSON 'Octopus.Shared.Contracts.ScriptTicket, Octopus.Shared' was not resolved. Path 'Message.result.$type'.
Server exception:
Newtonsoft.Json.JsonSerializationException: Type specified in JSON 'Octopus.Shared.Contracts.ScriptTicket, Octopus.Shared' was not resolved. Path 'Message.result.$type'.
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.ResolveTypeName(JsonReader reader, Type& objectType, JsonContract& contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, String qualifiedTypeName)
...

The first appears to be caused by multiple tentacle clients being instantiated at the same time (different threads), which caused the non-thread-safe HashSet error. This would generally only happen at start up, as once the client types are known the collection would not change, and not throw the error. I tried to write a test to cause this exception, and while I did manage to reproduce the exception, it was very much a random thread-sync issue. I did not keep this test as I felt if would be of low value, but did use it to "test" the fixes, which was simply to protect the HashSet from concurrent use.

The second was harder to understand, but from what I could determine, it was caused by the Halibut Listener being started, and then a message being received before the client type was registered on the serializer. In general this should not be a problem, as we cannot receive a message from a tentacle instance before the client is ready to receive such messages. This is because Services take messages at any time, but Clients only receive messages in response to a request. So any message received before a client sends it should be from a client talking to the previous instance of the server, so the message should be dropped anyway (no client would receive the message). However, the issue was in the sequence:

When the listener receives a message it starts a message loop to receive all inbound messages (until terminal message, error, or timeout). The listener will then decode the message, and pass it on to the right client. But, if that message loop was started before the client type was registered (on client instantiation), then even when the client sends a message and gets the response, if the message loop has not re-started, then the type is not known, the message is not de-serialized, and the communication fails. I enhanced a test to create this situation. I found that if the second client was created before the first one sent it's message, , or if the message loop timed out (by reducing the PollingQueueWaitTimeout in App.config, and introducing a thread.sleep between the two message sends), then the test passed (prior to the fix). With the fix, the test now passes.

The first issue could be "resolved" by re-starting the server, as it is unlikely for the thread contention to re-occur on restart.
The second issue could be "resolved" by re-running the task, as the error in the first attempt would cause the message loop to restart. Then for the re-run, it would have all the types correctly registered, so the messages would be sent & received correctly.

This change fixes both issues by extracting the message serialization into a new thread-safe class, which ensures that all known service and client types are known by the type binder used to serialize/de-serialize the messages

Copy link

@MissedTheMark MissedTheMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Added a couple of comments - nothing major, feel free to do or not do them.

for (var i = 0; i < 2000; i++)
echo.SayHello("Deploy package A").Should().Be("Deploy package A" + "..."); // This must come before CreateClient<ISupportedServices> for the situation to occur

//Thread.Sleep(TimeSpan.FromSeconds(12));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed

@@ -73,10 +73,16 @@ public void OctopusCanSendMessagesToPollingTentacle()

tentaclePolling.Poll(new Uri("poll://SQ-TENTAPOLL"), new ServiceEndPoint(new Uri("https://localhost:" + octopusPort), Certificates.OctopusPublicThumbprint));

// This is here to exercise the path there the Listener's (web socket) handle loop has the protocol (with type serializer) built before the type is registered

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there ➡ where

var echo = octopus.CreateClient<IEchoService>("poll://SQ-TENTAPOLL", Certificates.TentaclePollingPublicThumbprint);
for (var i = 0; i < 2000; i++)
echo.SayHello("Deploy package A").Should().Be("Deploy package A" + "..."); // This must come before CreateClient<ISupportedServices> for the situation to occur

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a easy call either way, but I wonder if it's worth moving this into its own test?

var echo = octopus.CreateClient<ISupportedServices>("poll://SQ-TENTAPOLL", Certificates.TentaclePollingPublicThumbprint);
// This is here to exercise the path there the Listener's (web socket) handle loop has the protocol (with type serializer) built before the type is registered
var echo = octopus.CreateClient<IEchoService>("poll://SQ-TENTAPOLL", Certificates.TentaclePollingPublicThumbprint);
echo.SayHello("Deploy package A").Should().Be("Deploy package A" + "...");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above regarding possibly making another test for this.

@@ -100,11 +106,15 @@ public void OctopusCanSendMessagesToWebSocketPollingTentacle()

tentaclePolling.Poll(new Uri("poll://SQ-TENTAPOLL"), new ServiceEndPoint(new Uri($"wss://localhost:{octopusPort}/Halibut"), Certificates.SslThumbprint));

var echo = octopus.CreateClient<ISupportedServices>("poll://SQ-TENTAPOLL", Certificates.TentaclePollingPublicThumbprint);
// This is here to exercise the path there the Listener's (web socket) handle loop has the protocol (with type serializer) built before the type is registered

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there ➡ where

}

public int Listen(IPEndPoint endpoint)
{
var listener = new SecureListener(endpoint, serverCertificate, ExchangeProtocolBuilder(), HandleMessage, IsTrusted, logs, () => friendlyHtmlPageContent, () => friendlyHtmlPageHeaders, HandleUnauthorizedClientConnect);
listeners.Add(listener);
lock (listeners)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that any calls to listeners need to be locked (as well as serviceTypes, below), is it worth creating a ConcurrentHashSet class that wraps a HashSet and makes the calls thread-safe? You could just expose the methods you need here. Just spitballing here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the time frame needed for this fix though, it might be worth just adding a comment saying why the locks are needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principle, and yes it shouldn't be too hard... but for now, will skip

@@ -24,8 +27,7 @@ public IServiceLease CreateService(string serviceName)

Func<object> GetService(string name)
{
Func<object> result;
if (!services.TryGetValue(name, out result))
if (!services.TryGetValue(name, out var result))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleaner... 👍

readonly Version currentVersion = new Version(1, 0);

public MessageExchangeStream(Stream stream, IEnumerable<Type> registeredServiceTypes, ILog log)
public MessageExchangeStream(Stream stream, IMessageSerializer serializer, ILog log)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how much cleaner this class is with the IMessageSerializer 👍 SOLID for the win! 🥳

@andrew-at-octopus andrew-at-octopus merged commit d4ba489 into master Aug 30, 2021
@andrew-at-octopus andrew-at-octopus deleted the andrew-w/extract-serializer branch August 30, 2021 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants