HPHosts Collector Error #1017

Closed
mf370 opened this issue Jun 26, 2017 · 7 comments
Labels: bug, component: bots

mf370 commented Jun 26, 2017

I configured the collector for the feed HPHosts with the following settings:

"hphosts-collector": {
        "description": "Generic URL Fetcher is the bot responsible to get the report from an URL.",
        "group": "Collector",
        "module": "intelmq.bots.collectors.http.collector_http",
        "name": "Generic URL Fetcher",
        "parameters": {
            "error_log_message": false,
            "feed": "HPHosts",
            "http_password": null,
            "http_proxy": null,
            "http_url": "http://hosts-file.net/download/hosts.txt",
            "http_username": null,
            "https_proxy": null,
            "provider": "HPHosts",
            "rate_limit": 3600,
            "ssl_client_certificate": null
        }
    }

I believe the configuration is correct, but every time I run the bot it throws the following error:

2017-06-26 14:41:11,214 - hphosts-collector - INFO - Downloading report from http://hosts-file.net/download/hosts.txt
2017-06-26 14:41:11,519 - hphosts-collector - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/intelmq/lib/bot.py", line 145, in start
    self.process()
  File "/usr/lib/python3.4/site-packages/intelmq/bots/collectors/http/collector_http.py", line 54, in process
    ''.format(resp.status_code))
ValueError: HTTP response status code was 999.
2017-06-26 14:41:11,520 - hphosts-collector - INFO - Bot will continue in 15 seconds.

Does anyone know how to solve this?


ghost commented Jun 26, 2017

Status code 999 is interesting. Is it reproducible? Did you check the proxy settings? Can you get the page with other tools from this host (e.g. curl, wget)?

Please consider using the intelmq-users list for support, not the bug tracker. Thanks!

ghost added the support label Jun 26, 2017

mf370 commented Jun 26, 2017

Yes, I can download the report using curl and wget. I have used the same bot configuration in different operating systems and they all return the same error.

> Please consider using the intelmq-users list for support, not the bug tracker. Thanks!

Alright!

ghost added the bug and component: bots labels and removed the support label Jun 26, 2017
ghost added this to the v1.0 Stable Release milestone Jun 26, 2017
ghost added the support label and removed the bug label Jun 26, 2017

ghost commented Jun 26, 2017

Some sources suggest that status code 999 means "Request denied" and associate it with scraping, i.e. too many requests. Besides querying less often, they suggest setting a user agent.


navtej commented Jun 27, 2017

The problem is "http_password": null, which the UI adds automatically. The empty auth header triggers the HPHosts firewall rules, and the server responds with 999. @wagner-certat perhaps this should be fixed on the UI side: if no auth is supplied, don't add it to the JSON. Alternatively, the HTTP collector can be modified to handle it.

As a temporary workaround, remove the parameter manually by editing /opt/intelmq/etc/runtime.conf
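The manual edit described above could also be scripted. The following is only a rough sketch: the helper name `strip_null_auth` is hypothetical, it assumes runtime.conf is JSON shaped like the configuration in the first comment, and it drops `http_username` as well as `http_password` when they are null, which goes slightly beyond what the comment asks for. Loading and saving the file itself (e.g. with `json.load` and `json.dump`) is left out on purpose.

```python
import json

# Assumption: these are the only auth-related parameters worth dropping when null.
AUTH_KEYS = ("http_username", "http_password")


def strip_null_auth(runtime):
    """Return a copy of the runtime config without null HTTP auth parameters."""
    cleaned = {}
    for bot_id, bot in runtime.items():
        bot = dict(bot)  # shallow copy so the input stays untouched
        if "parameters" in bot:
            bot["parameters"] = {
                key: value
                for key, value in bot["parameters"].items()
                if not (key in AUTH_KEYS and value is None)
            }
        cleaned[bot_id] = bot
    return cleaned


# Trimmed-down version of the configuration from the first comment.
example = {
    "hphosts-collector": {
        "module": "intelmq.bots.collectors.http.collector_http",
        "parameters": {
            "feed": "HPHosts",
            "http_password": None,
            "http_proxy": None,
            "http_url": "http://hosts-file.net/download/hosts.txt",
            "http_username": None,
            "rate_limit": 3600,
        },
    }
}

print(json.dumps(strip_null_auth(example), indent=4))
```

Other null parameters such as `http_proxy` are kept as they are; only the two auth keys are removed.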


ghost commented Jun 28, 2017

This is either a bug in requests or in the server at hosts-file.net:

>>> import requests
>>> r = requests.get('http://hosts-file.net/download/hosts.txt')
>>> r.status_code
200
>>> r = requests.get('http://hosts-file.net/download/hosts.txt', auth=None)
>>> r.status_code
200
>>> r = requests.get('http://hosts-file.net/download/hosts.txt', auth=(None, None))
>>> r.status_code
999
>>> r = requests.get('http://example.com/', auth=(None, None))
>>> r.status_code
200

As a workaround, we can check in Bot.set_request_parameters that http_username and http_password are not null before setting the auth.
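The check proposed above might look roughly like this. It is a minimal sketch, not the actual IntelMQ change: `build_auth` is a hypothetical helper, and the real logic would live inside Bot.set_request_parameters. The idea is to hand requests `auth=None` unless both credentials are set, because the session above shows that `auth=None` gets a 200 while `auth=(None, None)` gets a 999.

```python
from typing import Optional, Tuple


def build_auth(username: Optional[str],
               password: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return an auth tuple only when both credentials are configured.

    Hypothetical helper: returning None means no Authorization header
    should be sent at all.
    """
    if username is None or password is None:
        return None
    return (username, password)


print(build_auth(None, None))        # None
print(build_auth("user", "secret"))  # ('user', 'secret')
```

Requiring both values is an assumption; a deployment that uses a username with an empty password would need an empty string rather than null.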


navtej commented Jun 28, 2017

>>> r = requests.get('http://hosts-file.net/download/hosts.txt', auth=(None, None))
>>> r.status_code
999

This is a firewall rule at hosts-file.net being triggered. It is aggressive, yes, but not a bug.


ghost commented Jul 3, 2017

Using auth=(None, None) produces this header:

Authorization: Basic Tm9uZTpOb25l

Tm9uZTpOb25l is the Base64 encoding of None:None.
Fixed in 1e75ef0
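The header value can be reproduced with the standard library alone. This sketch follows the HTTP Basic scheme (Base64 of `username:password`) and assumes that None values are simply formatted as the string "None", which matches the header shown above. `basic_auth_header` is a hypothetical name, not a requests function.

```python
import base64


def basic_auth_header(username, password) -> str:
    """Build a Basic Authorization value the way the thread's header was formed."""
    # Formatting None yields the literal text "None" (assumption about the coercion).
    credentials = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header(None, None))  # Basic Tm9uZTpOb25l
print(base64.b64decode("Tm9uZTpOb25l").decode("ascii"))  # None:None
```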

ghost closed this as completed in 1e75ef0 Jul 3, 2017
ghost added the bug label and removed the support label Jul 3, 2017
ghost self-assigned this Jul 3, 2017
This issue was closed.