Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: load pipeline configurations before connecting #2034

Closed
wants to merge 1 commit into from

Conversation

monoidic
Copy link
Contributor

This appears to fix an issue where the host attribute of the pipeline is not set before being accessed while connecting pipelines, causing a bot to fail to start, as shown in this stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 208, in __init__
    self.__connect_pipelines()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 579, in __connect_pipelines
    self.__destination_pipeline.connect()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/pipeline.py", line 204, in connect
    if self.host.startswith("/"):
AttributeError: 'Redis' object has no attribute 'host'

@monoidic monoidic force-pushed the pipeline branch 2 times, most recently from 2368151 to 1bd8955 Compare August 11, 2021 12:23
@codecov-commenter
Copy link

Codecov Report

Merging #2034 (1bd8955) into develop (8b2d9d8) will decrease coverage by 0.04%.
The diff coverage is 80.00%.

@@             Coverage Diff             @@
##           develop    #2034      +/-   ##
===========================================
- Coverage    76.00%   75.96%   -0.05%     
===========================================
  Files          425      425              
  Lines        22845    22842       -3     
  Branches      3039     3039              
===========================================
- Hits         17364    17352      -12     
- Misses        4775     4780       +5     
- Partials       706      710       +4     
Impacted Files Coverage Δ
intelmq/lib/pipeline.py 64.39% <66.66%> (-0.32%) ⬇️
intelmq/lib/bot.py 62.10% <100.00%> (+0.04%) ⬆️
intelmq/bots/experts/modify/expert.py 92.00% <0.00%> (-2.67%) ⬇️
intelmq/tests/lib/test_pipeline.py 97.61% <0.00%> (-2.39%) ⬇️
intelmq/bots/collectors/tcp/collector.py 82.00% <0.00%> (-2.32%) ⬇️
intelmq/lib/splitreports.py 97.82% <0.00%> (-2.18%) ⬇️
intelmq/lib/harmonization.py 87.21% <0.00%> (-0.31%) ⬇️
intelmq/bin/intelmqdump.py 18.36% <0.00%> (+0.06%) ⬆️
... and 3 more

@ghost ghost added this to the 3.0.1 milestone Aug 11, 2021
@ghost ghost self-assigned this Aug 13, 2021
@ghost ghost self-requested a review August 13, 2021 09:06
@ghost
Copy link

ghost commented Aug 13, 2021

Under which circumstances did you encounter this bug?

@monoidic
Copy link
Contributor Author

I'm not fully certain at the moment, actually.
It was while running IntelMQ 3.0 with the Redis backend, and seemingly only manifested itself in output bots. Manually setting the Host parameter in the manager or setting it to a different value (such as swapping between 127.0.0.1 and 127.0.0.2) was a temporary workaround, which lasted for some indefinite period of time.

@ghost
Copy link

ghost commented Aug 16, 2021

PipelineFactory.create calls set_queues, which then calls load_configurations.

A comment in intelmq.tests.lib.test_pipeline_TestRedis class reminded me of #1875

@ghost
Copy link

ghost commented Aug 16, 2021

Still debugging...

In 56df092 I rewrote the redis test to better match the behavior in Bot, using the same methods and arguments. If I set a different host (e.g. with INTELMQ_PIPELINE_HOST the test times out), so that seems to work in principle.

@monoidic
Copy link
Contributor Author

In case you are unable to reproduce it, I can test out whatever you want.
After applying this patch:

diff --git a/intelmq/lib/pipeline.py b/intelmq/lib/pipeline.py
index 6c276b0f..e0933856 100644
--- a/intelmq/lib/pipeline.py
+++ b/intelmq/lib/pipeline.py
@@ -57,6 +57,8 @@ def create(logger, broker=None, direction=None, queues=None, pipeline_args=None,
                 broker = broker.title()
             else:
                 broker = "Redis"
+
+        logger.info(f'{queues=} {direction=}')
         pipe = getattr(intelmq.lib.pipeline, broker)(logger=logger, pipeline_args=pipeline_args, load_balance=load_balance, is_multithreaded=is_multithreaded)
         if queues and not direction:
             raise ValueError("Parameter 'direction' must be given when using "

I got the following lines in a bot's log file, when attempting to start it:

2021-08-16 16:47:37,029 - elasticsearch-output - INFO - ElasticsearchOutputBot initialized with id elasticsearch-output and intelmq 3.0.0 and python 3.8.10 (default, Jun  2 2021, 10:49:15) as process 551040.
2021-08-16 16:47:37,029 - elasticsearch-output - INFO - Bot is starting.
2021-08-16 16:47:37,586 - elasticsearch-output - INFO - queues='elasticsearch-output-queue' direction='source'
2021-08-16 16:47:37,586 - elasticsearch-output - INFO - queues={} direction='destination'
2021-08-16 16:47:37,586 - elasticsearch-output - ERROR - Bot initialization failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 208, in __init__
    self.__connect_pipelines()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 581, in __connect_pipelines
    self.__destination_pipeline.connect()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/pipeline.py", line 208, in connect
    if self.host.startswith("/"):
AttributeError: 'Redis' object has no attribute 'host'
2021-08-16 16:47:37,587 - elasticsearch-output - INFO - Bot stopped.

So it appears to happen with the destination queue specifically, and retrying it a couple of times gives identical log lines.

@ghost
Copy link

ghost commented Aug 16, 2021

I just got it reproduced locally, with an output bot, as you said.

@ghost
Copy link

ghost commented Aug 16, 2021

set_queues and load_configurations is correctly called for the source pipeline. I think the primary problem is, that the code also loads the destination pipeline which doesn't make sense and obviously fails.

In __connect_pipelines:

if self.destination_queues is not None:

the check is is not None. The empty default value is:
destination_queues: Optional[dict] = None

also None. However, something between sets the variable to an empty dict, which is not None and then the bot tries to connect to nothing.
Haven't found that code yet, but maybe an empty dict is a better default value anyway, also because it is assumed in the code, e.g. here:
if self.destination_queues and '_on_error' in self.destination_queues:

@ghost ghost added bug Indicates an unexpected problem or unintended behavior component: core labels Aug 16, 2021
@ghost
Copy link

ghost commented Aug 16, 2021

Haven't found that code yet,

Ah, that's the loading of the runtime configuration. That makes sense. If the variable is not defined, the error does not occur, but if the configuration has for example:

    destination_queues: {}

Then it occurs.

@monoidic
Copy link
Contributor Author

Sounds like it's related to some of the changes I've made in the manager then.

@monoidic
Copy link
Contributor Author

monoidic commented Aug 17, 2021

And, indeed, I have destination_queues: {} in the runtime.yaml on those bots, due to some of my new code in intelmq-manager setting the field to an empty object by default in intelmq-manager so its existence does not have to be checked for when adding/checking edges.
Now the question seems to be what the solution should be.

  • intelmq-manager should clear empty destination_queues from the configuration sent to the backend,
  • destination_queues: {} should be handled as a valid option indicating no output queues,
  • both?

@ghost
Copy link

ghost commented Aug 17, 2021

I favour the second option, which is also simple fixable. The core should handle any valid configuration, independent of the tool which created the config. Such a config could also be written by a human.

@schacht-certat What's your opinion?

@ghost
Copy link

ghost commented Aug 18, 2021

I favour the second option, which is also simple fixable. The core should handle any valid configuration, independent of the tool which created the config. Such a config could also be written by a human.

@schacht-certat What's your opinion?

Yes, handling this in intelmq itself is the better option, destination_queues: {} should be handled the same as if it was None (or we simply set it to an empty dict by default and remove the checks for None)

ghost pushed a commit that referenced this pull request Aug 18, 2021
the manager (or any user in the file directly) sets destination_queues
to an empty dict. the check `is not None` did not catch empty dicts,
initializing the destination pipeline even when no queues were defined.
The check is now more lax and the code handles empty dicts normally,
making them also the default value instead of None.

fixes #2034
@ghost
Copy link

ghost commented Aug 18, 2021

based on what we found out here in this thread, i created #2051 with a smaller change

@monoidic
Copy link
Contributor Author

Fixed properly by #2051

@monoidic monoidic closed this Aug 19, 2021
ghost pushed a commit that referenced this pull request Aug 19, 2021
the manager (or any user in the file directly) sets destination_queues
to an empty dict. the check `is not None` did not catch empty dicts,
initializing the destination pipeline even when no queues were defined.
The check is now more lax and the code handles empty dicts normally,
making them also the default value instead of None.

fixes #2034
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior component: core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants