Workflow Stuck Waiting for Greylisted Event #722
Comments
Hi, I think I encountered the same problem. Do you see in logs "Workflow locked 'some-id'" from The reason why this does not always happen: sometimes EventConsumer successfully acquires a lock and the event is processed as expected; sometimes EventConsumer fails to acquire a lock, and then the event will be processed only after Also would like to note that tuning the I will send a PR soon to fix this.
Also instead of removing an event from the greylist, we could just use
Yeah, this lines up with what I'm seeing. In ours, we have a pretty short poll interval and are getting this somewhat frequently. Good catch! Looking forward to the PR.
I think adding to the back of the queue again could cause a poison message, but I'm not sure why we're renewing the grey list time for events and not workflow.
I think it could cause a poison message only when we are adding to the queue at failed event lock, but not when we are adding at failed workflow lock. The reason for renewing the grey list for events is a mystery to me.
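To make the failure mode discussed in these comments concrete, here is a minimal Python model of it. It is not WorkflowCore code: the names `GreyList`, `poll`, `consume`, and `release_on_lock_failure` are hypothetical, and the model assumes that a greylist entry expires after a fixed time-to-live and that the poller greylists an event when it queues it. Under those assumptions, an event whose workflow lock could not be acquired stays invisible to the poller until the entry expires, unless it is released from the greylist on failure.

```python
import time
from collections import deque


class GreyList:
    """In-memory greylist whose entries expire after `ttl` seconds (assumed behaviour)."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> time the key was greylisted

    def add(self, key):
        self._entries[key] = self._clock()

    def remove(self, key):
        self._entries.pop(key, None)

    def contains(self, key):
        added = self._entries.get(key)
        if added is None:
            return False
        if self._clock() - added > self._ttl:
            del self._entries[key]  # entry has expired
            return False
        return True


def poll(pending_events, greylist, queue):
    """Poller: queue every pending event that is not greylisted, then greylist it."""
    for event_id in pending_events:
        if not greylist.contains(event_id):
            greylist.add(event_id)
            queue.append(event_id)


def consume(queue, greylist, locked_workflows, subscriptions, release_on_lock_failure):
    """Consumer: handle one queued event; return True if it was delivered."""
    event_id = queue.popleft()
    workflow_id = subscriptions[event_id]
    if workflow_id in locked_workflows:
        # The workflow lock could not be acquired, so nothing was delivered.
        if release_on_lock_failure:
            greylist.remove(event_id)  # the next poll may pick the event up again
        return False
    return True
```

With `release_on_lock_failure=False`, a second `poll` shortly after a failed `consume` queues nothing, which matches the "stuck until later" symptom described above. With `True`, the next `poll` queues the event again. Whether releasing or re-queueing is the safer fix with respect to poison messages is the open question in the comments.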
Hi, we have an interesting scenario where we need to coordinate events across multiple workflows and we are having some issues where events are getting greylisted and sometimes never get processed.
First, let me describe our setup:
We have 2 different types of workflows. Let's say the first one is called `CoordinatorWorkflow` and the second is called `SubTaskWorkflow`. For a given task, there will be exactly 1 `CoordinatorWorkflow` spun up and N `SubTaskWorkflow`s.

When `CoordinatorWorkflow` is spun up, it knows how many `SubTaskWorkflow`s correspond to it and has a unique identifier for each `SubTaskWorkflow` in the "set".

As an example, let's say we have a set of 2 `SubTaskWorkflow`s identified by `SubTaskA` and `SubTaskB` that both start around the same time, and `CoordinatorWorkflow` is passed a set of `["SubTaskA", "SubTaskB"]`.

The very first thing that `CoordinatorWorkflow` does is go into a for loop on `["SubTaskA", "SubTaskB"]` and wait for an event of type `SubTaskFinished` with the key being each of the identifiers `SubTaskA` and `SubTaskB`.

Eventually each of the `SubTaskWorkflow`s gets to a certain point and publishes an event of type `SubTaskFinished` with its identifier (`SubTaskA` or `SubTaskB`) as the key. At this point they wait for an event of type `CoordinationFinished`.

Once `CoordinatorWorkflow` has received the events from each `SubTaskWorkflow`, it proceeds to do some work, fires off a `CoordinationFinished` event, and completes.

Each `SubTaskWorkflow` then gets the `CoordinationFinished` event, proceeds to do some more work, and completes.

This is roughly the flow we are trying to achieve and have gotten most of the way there (Note: These aren't the actual names of the workflows, but I've tried to make them somewhat generic and domain-agnostic to simplify it).
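The intended handshake can be sketched as a small `asyncio` model. This is only an illustration of the ordering described above, not WorkflowCore code: `EventBus`, `coordinator`, `subtask`, and `run` are hypothetical names, and using a task identifier as the key for `CoordinationFinished` is an assumption, since the key for that event is not stated.

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Toy stand-in for publish/wait, keyed by (event name, event key)."""

    def __init__(self):
        self._published = defaultdict(asyncio.Event)

    def publish(self, name, key):
        self._published[(name, key)].set()

    async def wait_for(self, name, key):
        await self._published[(name, key)].wait()


async def coordinator(bus, task_id, subtask_ids, log):
    # Wait for one SubTaskFinished event per subtask identifier.
    for subtask_id in subtask_ids:
        await bus.wait_for("SubTaskFinished", subtask_id)
    log.append("coordinator: all subtasks reported")
    # Assumption: CoordinationFinished is keyed by the task identifier.
    bus.publish("CoordinationFinished", task_id)


async def subtask(bus, task_id, subtask_id, log):
    log.append(f"{subtask_id}: first phase done")
    bus.publish("SubTaskFinished", subtask_id)
    await bus.wait_for("CoordinationFinished", task_id)
    log.append(f"{subtask_id}: second phase done")


async def run(task_id, subtask_ids):
    bus = EventBus()
    log = []
    await asyncio.gather(
        coordinator(bus, task_id, subtask_ids, log),
        *(subtask(bus, task_id, s, log) for s in subtask_ids),
    )
    return log
```

Calling `asyncio.run(run("task-1", ["SubTaskA", "SubTaskB"]))` returns a log in which the coordinator entry appears before either subtask's second phase. In this toy bus a published event is never lost, which is exactly the property the real setup appears to be missing when events are greylisted.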
The problem we are getting, however, is that at some point in the process, we get stuck waiting for events that never arrive. The events do indeed get published but we see a bunch of messages in the logs like the following and the workflows waiting for the events never have a chance to process them.
Any ideas as to what we may be doing wrong? Is this a known issue? If so, are there any workarounds?
Here are a few other observations that my team saw while troubleshooting this:
`IGreyList` is registered as an in-memory singleton and the other workflow processes don't have it grey listed so they are free to pick up those events.
`IGreyList` that basically ignored any items prefixed with `evt`.
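Both observations can be illustrated with a short Python sketch. It is a model only: `InMemoryGreyList` and `PrefixIgnoringGreyList` are hypothetical classes, the `add`/`remove`/`contains` shape is an assumption about what a greylist offers, and the `evt:1` key format is made up for the example (the thread only says the prefix is `evt`).

```python
class InMemoryGreyList:
    """Per-process greylist; each process holds its own independent instance."""

    def __init__(self):
        self._keys = set()

    def add(self, key):
        self._keys.add(key)

    def remove(self, key):
        self._keys.discard(key)

    def contains(self, key):
        return key in self._keys


class PrefixIgnoringGreyList:
    """Wraps a greylist and never greylists keys that start with `ignored_prefix`."""

    def __init__(self, inner, ignored_prefix="evt"):
        self._inner = inner
        self._ignored_prefix = ignored_prefix

    def add(self, key):
        # Event keys are silently skipped, so they are always eligible for pickup.
        if not key.startswith(self._ignored_prefix):
            self._inner.add(key)

    def remove(self, key):
        self._inner.remove(key)

    def contains(self, key):
        return self._inner.contains(key)
```

Two separate `InMemoryGreyList` instances stand in for two processes: a key added in one is not visible in the other, so the second process is free to pick the event up. The wrapper trades the greylist's protection against repeated pickup of events for never leaving an event stuck behind a greylist entry.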