Describe the bug
The recrawler service has been switched off since early January due to a lack of query results; that lack of results will be opened and tracked as a separate issue for that service.
If no recrawler pods are available, requests to that service fail with connection errors -- after a considerable timeout -- as visible here in the backend-worker deployment logs:
[2021-01-27 18:28:19,290: WARNING/ForkPoolWorker-2] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:28:19,291: WARNING/ForkPoolWorker-3] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:30:30,362: WARNING/ForkPoolWorker-1] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:30:30,366: WARNING/ForkPoolWorker-3] Recrawling failed due to "ConnectionError" exception
This causes the throughput of the backend-worker instances to drop dramatically since most of the task worker time is spent attempting to make a connection.
It may be useful to consider both a short-term and longer-term fix here. Since we are not currently receiving results from the recrawler service, a patch would involve re-deploying that service to respond with empty results (effectively a no-op). Longer-term we likely want to isolate the queue workers that handle event logs, and perhaps add circuit breakers and/or adjust the connection timeouts they use.
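As a rough illustration of the timeout/circuit-breaker idea: the sketch below assumes the backend-worker calls the recrawler over HTTP with `requests` (consistent with the `ConnectionError` in the logs); the URL, payload shape, and cooldown values are placeholders, not the actual RecipeRadar configuration.

```python
# Hedged sketch: assumes the backend-worker uses `requests` to reach the
# recrawler; RECRAWLER_URL and the cooldown are illustrative placeholders.
import time
import requests

RECRAWLER_URL = "http://recrawler/recrawl"  # hypothetical endpoint
_BREAKER_OPEN_UNTIL = 0.0                   # module-level breaker state
_BREAKER_COOLDOWN = 300                     # seconds to skip calls after a failure


def request_recrawl(payload):
    global _BREAKER_OPEN_UNTIL
    # Circuit breaker: if a recent call failed, skip the request entirely
    # rather than blocking the task worker on another long connection attempt.
    if time.monotonic() < _BREAKER_OPEN_UNTIL:
        return None
    try:
        # Short connect/read timeouts so an unavailable service costs
        # seconds of worker time, not minutes.
        response = requests.post(RECRAWLER_URL, json=payload, timeout=(1, 5))
        response.raise_for_status()
        return response.json()
    except requests.RequestException as exc:
        _BREAKER_OPEN_UNTIL = time.monotonic() + _BREAKER_COOLDOWN
        # Same warning as seen in the logs, but subsequent calls fail fast.
        print(f'Recrawling failed due to "{type(exc).__name__}" exception')
        return None
```

Either measure alone (shorter timeouts or the breaker) would reduce the per-task cost of an outage; together they bound it to roughly one failed connection per cooldown window.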
Expected behavior
Throughput for the majority of the RecipeRadar message queues should not be adversely affected by outages in a minor service.
Isn't the solution for this to add worker processes to the recrawler service? We shouldn't permit one service to get backlogged as a result of taking on the work requested for another service to perform.
I think that separating the worker queues is likely a better idea here. Recrawling shouldn't be in the capacity path of crawling/reindexing, for example.
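A minimal sketch of what queue separation could look like, assuming the backend-worker is a Celery app (the `ForkPoolWorker` log lines suggest Celery); the task and queue names below are illustrative, not the actual RecipeRadar identifiers:

```python
# Hedged sketch: route recrawl tasks to a dedicated queue so that a
# recrawler outage only backs up 'recrawl' workers, not crawl/reindex.
from celery import Celery

app = Celery("backend_worker", broker="amqp://localhost")

app.conf.task_routes = {
    "tasks.recrawl": {"queue": "recrawl"},
    "tasks.crawl": {"queue": "crawl"},
    "tasks.reindex": {"queue": "index"},
}
```

With routing like this, separate worker pools can be started per queue (e.g. `celery -A backend_worker worker -Q recrawl` and `celery -A backend_worker worker -Q crawl,index`), so slow connection attempts to the recrawler never occupy the crawl/reindex workers.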