
Threading review of SqsNotificationListener and Throttled Message Processing Strategy #422

Closed
slang25 opened this issue Nov 7, 2018 · 6 comments

@slang25 (Member) commented Nov 7, 2018

This has come up in recent discussions, so here is a braindump so that we don't lose this.

Looking at IMessageProcessingStrategy, it has the following members:

public interface IMessageProcessingStrategy
{
    int MaxWorkers { get; }
    int AvailableWorkers { get; }
    void StartWorker(Func<Task> action);
    Task WaitForAvailableWorkers();
}

Imagine we have the default Throttled implementation, with 10 max workers, and 6 workers in-flight (invoking handlers).

When SqsNotificationListener wants to get messages, it goes through the following process (sketched in code after the list):

  1. Is AvailableWorkers > 0? (If not, await WaitForAvailableWorkers.)
    Here, yes, we have 4 available. We can continue.
  2. Long poll SQS for a maximum of 4 messages. We will either receive 4 messages, or wait a maximum of 20 seconds to receive up to 4 messages.
  3. For each message, call StartWorker with the handler (wrapped in a few bits and bobs).
  4. Go to step 1 immediately (while the handlers are in flight).
  5. (Step 1 again) Is AvailableWorkers > 0? No, so now await WaitForAvailableWorkers.
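
A minimal sketch of that loop, for clarity. ReceiveMessagesAsync and HandleMessageAsync are hypothetical stand-ins for the real SQS and handler plumbing, which wraps handlers with logging and so on:

// Rough sketch of the listen loop described above, not the real
// SqsNotificationListener code.
private async Task ListenLoopAsync(IMessageProcessingStrategy strategy)
{
    while (true)
    {
        // Step 1: wait until at least one worker slot is free
        if (strategy.AvailableWorkers == 0)
        {
            await strategy.WaitForAvailableWorkers();
        }

        // Step 2: long poll for up to AvailableWorkers messages
        var messages = await ReceiveMessagesAsync(
            maxMessages: strategy.AvailableWorkers, waitTimeSeconds: 20);

        // Step 3: hand each message to a worker; StartWorker returns quickly
        foreach (var message in messages)
        {
            strategy.StartWorker(() => HandleMessageAsync(message));
        }

        // Step 4: loop back to step 1 while those handlers are still in flight
    }
}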

In isolation, this process is alright, and seems sensible. Where it gets tricky is when you share the IMessageProcessingStrategy and have multiple SqsNotificationListener instances going through this same workflow.
However, it is quite likely you will want to share an instance of IMessageProcessingStrategy, as you might want a single concurrency level across multiple queues, or for your whole application.

There are a few problems with the IMessageProcessingStrategy interface as it stands when you have multiple listeners interacting with it:

  • When all workers are busy, you will likely have multiple listeners all waiting at the WaitForAvailableWorkers step.
    They will all be released simultaneously and race (see the sketch after this list).
    They will all read the AvailableWorkers value, and likely see the same number (let's say 1).
    Once they receive their messages, they will all try to call StartWorker, with only the first succeeding.
    In an old version of JustSaying (early 4.x), this race condition would cause Throttled to lose count, which wouldn't end well.
    In recent versions, subsequent attempts to call StartWorker will block until there are available workers. (Blocking thread-pool threads is bad.)
  • There is a related race condition: listeners are released from awaiting WaitForAvailableWorkers, then they check the AvailableWorkers count to know how many messages to request from SQS.
    By the time AvailableWorkers is read, the value could have changed and might now be 0, in which case an exception is thrown and logged, and the loop continues.
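
To make the first race concrete, here is a deterministic toy version of the interleaving. A bare SemaphoreSlim stands in for Throttled, with CurrentCount playing the role of AvailableWorkers:

using System;
using System.Threading;

var semaphore = new SemaphoreSlim(1, 1); // one worker slot left

// Both listeners are released from WaitForAvailableWorkers and snapshot the count:
int seenByA = semaphore.CurrentCount; // 1
int seenByB = semaphore.CurrentCount; // also 1: the snapshot is already stale

// ...both long poll SQS, and each receives a message...

// Both now call StartWorker, but only one can actually claim the slot:
bool aStarted = semaphore.Wait(0); // true: A wins the slot
bool bStarted = semaphore.Wait(0); // false: B lost; the old code blocked here

Console.WriteLine($"A saw {seenByA}, started: {aStarted}");
Console.WriteLine($"B saw {seenByB}, started: {bStarted}");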

The Throttled implementation uses a SemaphoreSlim internally; this lets us guarantee the count is maintained in a thread-safe way, and we can wait on it both synchronously and asynchronously.
StartWorker uses a Task constructor to wrap the async work, and the task is then started. This runs the task on the thread pool. It behaves the same as Task.Run, just less familiar.
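
For illustration, the two spellings below queue the same work to the thread pool; DoWorkAsync is a hypothetical async method:

Func<Task> action = DoWorkAsync; // any async method matching Func<Task>

// What Throttled does today: a cold Task<Task>, started manually...
var cold = new Task<Task>(() => action());
cold.Start(); // schedules on TaskScheduler.Current (normally the thread pool)

// ...which is nearly equivalent to the more familiar:
Task hot = Task.Run(() => action());
// (One nuance: Task.Run unwraps the inner Task; the cold version
// would need cold.Unwrap() to await the handler's completion.)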

My view is that the StartWorker implementation behaves as we desire (with the exception of the blocking scenario); it may just allocate more than it needs to.
The throttling logic is a bit broken in its design and could do with some drastic rethinking and simplifying.

@shaynevanasperen (Contributor) commented:

Another concern is that if you have scaled out your application to run on multiple machines, each instance runs in its own "world" and doesn't know about the other instances, so you could have one instance being too greedy and fetching all the messages from a queue, leaving the other instances "starved".

@slang25 (Member, Author) commented Nov 7, 2018

Yeah, we pretty much have to trust that SQS distributes the messages fairly. A very similar issue is being able to throttle across multiple machines to prevent some downstream service from getting too much load. Consul has the concept of a distributed semaphore that could be used for this, and you could build your own IMessageProcessingStrategy on top of it (there are still problems to solve there, though).
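
For the record, such a strategy might be shaped roughly like this. This is a hypothetical sketch only: IDistributedSemaphore is an imagined abstraction over something like Consul's semaphore recipe, not a real JustSaying or Consul API.

// Hypothetical sketch; IDistributedSemaphore is an imagined wrapper around
// a distributed semaphore (e.g. Consul's), not a real type.
public class DistributedThrottle : IMessageProcessingStrategy
{
    private readonly IDistributedSemaphore _semaphore;

    public DistributedThrottle(IDistributedSemaphore semaphore, int maxWorkers)
    {
        _semaphore = semaphore;
        MaxWorkers = maxWorkers;
    }

    public int MaxWorkers { get; }

    // Cluster-wide count: stale by definition, one of the problems to solve.
    public int AvailableWorkers => _semaphore.AvailablePermits;

    public void StartWorker(Func<Task> action)
    {
        Task.Run(async () =>
        {
            await _semaphore.AcquireAsync(); // cross-machine acquire
            try { await action(); }
            finally { await _semaphore.ReleaseAsync(); }
        });
    }

    public Task WaitForAvailableWorkers() => _semaphore.WhenPermitAvailableAsync();
}

public interface IDistributedSemaphore
{
    int AvailablePermits { get; }
    Task AcquireAsync();
    Task ReleaseAsync();
    Task WhenPermitAvailableAsync();
}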

@AnthonySteele (Contributor) commented Nov 7, 2018

IMHO, it is not worth trying to achieve accurate distributed consensus on which machine has the most resources to process the next message; in the time taken, you could probably have processed the messages instead.

What we do have to ensure is that, statistically, over time, the instances distribute the load fairly evenly. This is especially important under high load.

@slang25 (Member, Author) commented Nov 13, 2018

I'd like to address the issue where we block under throttling and contention; this will only happen when you share an instance of IMessageProcessingStrategy across multiple subscriptions (I've seen this used a lot).

See this code:
https://github.com/justeat/JustSaying/blob/1aab3228bea66035cda5b22ac3acba8a8dbd1bc7/JustSaying/Messaging/MessageProcessingStrategies/Throttled.cs#L21-L38

We basically want line 24 to be awaited. This could be done by making the method async and returning a Task, which is a breaking change but retains the current behaviour.

Option 1

public async Task StartWorker(Func<Task> action) // breaking change: now returns Task
{
    var messageProcessingTask = new Task<Task>(() => ReleaseOnCompleted(action));
    await _semaphore.WaitAsync(); // wait for a free worker slot without blocking a thread
    messageProcessingTask.Start();
}

private async Task ReleaseOnCompleted(Func<Task> action)
{
    try
    {
        await action();
    }
    finally
    {
        _semaphore.Release();
    }
}

One easy non-breaking alternative would be to move this wait into the started task, like this:

Option 2

public void StartWorker(Func<Task> action)
{
    var messageProcessingTask = new Task<Task>(() => ReleaseOnCompleted(action));
    messageProcessingTask.Start(); // Could now be Task.Run
}

private async Task ReleaseOnCompleted(Func<Task> action)
{
    await _semaphore.WaitAsync(); // the wait has moved inside the started task
    try
    {
        await action();
    }
    finally
    {
        _semaphore.Release();
    }
}

However, this would now let the listen loop progress to the WaitForAvailableWorkers await inside step 1 of the original post. To retain the current behaviour, you'd want the await inside ReleaseOnCompleted to always take priority over the await in step 1, which would require a ManualResetEventSlim or maybe a larger rework of the internals of this class, so it's maybe not ideal; let's call this option 3.

Going back to option 1, we could add a new interface so as not to break the old one, but what would we call it? Maintaining two versions gets messy. I also feel like we would be building on top of a shaky foundation, because the interface is fundamentally flawed. It also seems wrong when you look at the signature: why should StartWorker need to be asynchronous? Here it's just a hack to fix the bad design.

I'd welcome some thoughts on this, and other suggestions.

Update: I've just noticed that the StartWorker signature is in the process of being changed as part of #403, so maybe this is the perfect opportunity to make this change in the coming release.

@shaynevanasperen (Contributor) commented:

@stuart-lang We've already decided that the next version of JustSaying will have some breaking changes, so this should be fine.

@slang25 (Member, Author) commented Nov 25, 2018

The blocking in IMessageProcessingStrategy has been removed, so closing.
