AlienVault OTX API #298
It's not compatible with PEP8.
OK, I will study and check that ... it should be OK by now.
raw_report = utils.base64_decode(report.value("raw"))

for pulse in json.loads(raw_report):
    comment = "author: " + pulse['author_name'] + "; name: " + \
Using string formatting is usually clearer. Example:
comment = ("author: {}; name: {}; description: {}"
"".format(pulse['author_name'], pulse['name'],
pulse['description']))
The resulting string can be seen immediately.
Several lines are longer than 80 characters. You can wrap an expression in parentheses for line continuation (strictly speaking this is not a tuple, since there is no trailing comma); the parentheses evaluate to the expression itself, e.g.:
In function calls, this is not necessary as the ending scope is clearly defined.
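A minimal sketch of the wrapping described in this comment. The sample pulse values are made up for illustration; only the key names come from the reviewed diff.

```python
# Hypothetical sample data; only the key names come from the reviewed code.
pulse = {
    "author_name": "alice",
    "name": "example pulse",
    "description": "demo",
}

# Parentheses around a single expression (no trailing comma) do not create a
# tuple. Adjacent string literals inside them are joined into one string
# before .format() is applied.
comment = ("author: {}; name: {}; description: {}"
           "".format(pulse["author_name"], pulse["name"],
                     pulse["description"]))

# Inside a function call, the call's own parentheses already allow wrapping,
# so no extra pair is needed.
print(comment,
      type(comment).__name__)
```

The whole template is visible in one place, which is the readability argument made for string formatting over chained `+` concatenation.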
For the license of OTX-Python-SDK, I opened an issue there: AlienVault-OTX/OTX-Python-SDK#5. I think the collector is fine.
OTX-Python-SDK is now licensed under Apache License 2.0: https://github.com/AlienVault-Labs/OTX-Python-SDK/blob/master/LICENSE
@robcza is it finished?
It should at least be mentioned in the COPYRIGHT(.md) files that the included file is Apache licensed.
programmers@SE: How to use apache license in my project which will be LGPL
def process(self):
    report = self.receive_message()
    if (report is None or not report.contains("raw") or
            len(report.value("raw").strip()) == 0):
The last check should probably be done in Bot.receive_message() and result in None. Was there any special message that caused problems which you fixed that way?
Not really, will get rid of that check
An empty raw (after strip) will be rejected anyway: https://github.com/certtools/intelmq/blob/master/intelmq/lib/harmonization.py#L59
…ienvault-otx Conflicts: intelmq/bots/BOTS
def process(self):
    report = self.receive_message()
    if (report is None or not report.contains("raw"):
This is a SyntaxError
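For illustration, a small sketch of why the line above fails and what a balanced version could look like. `FakeReport` and `should_skip` are hypothetical stand-ins written for this example, not IntelMQ classes or functions.

```python
# The reviewed line, kept as a string so it can be compiled in isolation.
BROKEN = 'if (report is None or not report.contains("raw"):\n    pass\n'

try:
    compile(BROKEN, "<reviewed line>", "exec")
    broken_compiles = True
except SyntaxError:
    # The parenthesis opened after "if" is never closed.
    broken_compiles = False


class FakeReport(dict):
    """Hypothetical stand-in for a report object, for illustration only."""

    def contains(self, key):
        return key in self

    def value(self, key):
        return self[key]


def should_skip(report):
    # Without the outer parentheses there is nothing left to balance.
    return report is None or not report.contains("raw")
```

With this sketch, `should_skip(None)` and `should_skip(FakeReport())` are both true, while a report carrying a `raw` field is processed.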
Forgot that we have a surveillance state? I will watch at every commit you do!!!!!1!111
Here is a good summary with all steps needed in a development process in git (branches, rebasing etc): https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
Well, OK, now it seems a lot better. However, I'm aware of some issues:
You are using the wrong format to add grouped fields:
should be:
The same applies to source, classification, feed and all others.
@sebix can you explain a little bit more?
Have a look at the test code: https://github.com/robcza/intelmq/blob/alienvault-otx/intelmq/tests/bots/parsers/alienvault/test_parser_otx.py#L32
yap, i got it! ;)
I've fixed the issue regarding the grouped attributes and added the "additional" attribute to the event generated by the parser. I see no easy way to make the classification more precise at the moment; however, I will keep an eye on changes introduced by AlienVault, and hopefully they will come up with something to narrow it. It feels complete right now, but feel free to comment further. And btw, @sebix, those tests are pretty useful, thanks for them.
from OTXv2 import OTXv2
import json

from intelmq.bots.collectors.http.lib import fetch_url
fetch_url is not used.
The hash type is usually stored along with the hash value as a prefix. E.g.
@sebix so, malware.hash will also include the hash_type?!?!?!
On 09/09/2015 04:46 PM, Tomás Lima wrote:
On Sep 9, 2015, at 4:51 PM, Sebastian [email protected] wrote:
Agreed, it makes more sense to rely on that.
Agreed. It will require adding a utility that stores the hash type as a prefix of the hash value.
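Such a utility could look roughly like the sketch below. Everything in it is an assumption made for illustration: `prefix_hash` is a hypothetical helper, not an IntelMQ function, and the crypt-style prefix strings are only one possible convention that the thread does not confirm.

```python
# Assumed, illustrative prefixes; the thread does not specify the format.
HASH_PREFIXES = {
    "md5": "$1$",
    "sha1": "$sha1$",
    "sha256": "$5$",
}


def prefix_hash(hash_type, value):
    """Return the hash value with its type stored as a prefix.

    Hypothetical helper: raises ValueError for hash types that are not
    listed in HASH_PREFIXES.
    """
    try:
        prefix = HASH_PREFIXES[hash_type.lower()]
    except KeyError:
        raise ValueError("unknown hash type: {!r}".format(hash_type))
    # Normalise whitespace and case so equal digests compare equal.
    return prefix + value.strip().lower()
```

Taking the hash type as an explicit argument avoids guessing it from the digest length, since the feed already labels each indicator with its type.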
I propose to replace this code:
with this:
Included proposed changes, adjusted the description of malware.hash and made the parser code a bit more readable.
I just checked out your code locally. I had to rebase on master and fix several merge conflicts (please do not use merge to get changes from upstream ...) and adapted the files to current changes. I also fixed some style issues and made sure the JSON dumps are always equal (the order of items in a dictionary is not always the same, thus we sort by keys now; this also fixes the Python 3 tests). Have a look here: sebix@8e2e225
Thank you @sebix, I have not used merge to get changes from upstream ever since you told me, so it shouldn't happen again. Key sorting makes sense, I'm totally OK with the changes you introduced.
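The key sorting mentioned above can be sketched with the standard library's `json.dumps(..., sort_keys=True)`. The two dictionaries are made-up sample data and `stable_dump` is a hypothetical helper name.

```python
import json

# Same content, inserted in a different order (hypothetical sample data).
first = {"name": "pulse", "author": "alice"}
second = {"author": "alice", "name": "pulse"}


def stable_dump(obj):
    # sort_keys=True makes the output independent of insertion order,
    # so serialized results can be compared as plain strings in tests.
    return json.dumps(obj, sort_keys=True)


print(stable_dump(first) == stable_dump(second))
```

Comparing the sorted dumps keeps test expectations stable across Python versions whose dictionary ordering differs.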
event.add('comment', pulse['description'])
event.add('additional', additional, sanitize=True)
event.add('classification.type', 'blacklist', sanitize=True)
event.add('time.observation', report.value(
Use the following example to fill the event with time.observation, feed.url and feed.name:
https://github.com/certtools/intelmq/blob/master/intelmq/bots/parsers/autoshun/parser.py#L38
@robcza You can cherry-pick my commit after adding my repo as a remote to your local repository:
git remote add sebix git@github.com:sebix/intelmq.git
git remote update
git cherry-pick 8e2e225beda1b0ff5b417649f44396aca1e962d8
@SYNchroACK there seems to be a typo in that example: reports instead of report.
@sebix Your proposal also takes the "raw" field from the report, which makes it impossible to add the "raw" record for the event itself later. It fails like this:
ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/lib/bot.py", line 117, in start
    self.process()
  File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/bots/parsers/alienvault/parser_otx.py", line 75, in process
    event.add("raw", json.dumps(indicator), sanitize=True)
  File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/lib/message.py", line 77, in add
    raise exceptions.KeyExists(key)
KeyExists: key u'raw' already exists
Should we replace the raw field instead of adding it (copying the whole raw report again and again does not feel like a proper way), or change the "constructor" to ignore the "raw" attribute from the report (sorry for the C++ terminology)?
See the code in lib/message.py: https://github.com/certtools/intelmq/blob/master/intelmq/lib/message.py#L181
The Event constructor differentiates between a Report and other parameter types like dict. Report is derived from Message, which is in turn derived from dict. For a Report instance, it only takes feed.(name|url) and time.observation, not raw. For everything else, like dicts and messages, the parameter is given to the constructor of the parent class, which is in the end dict, and all fields are copied. If raw really is copied from the report, it means that receive_message / the pipeline does not correctly deserialize the message!
So first please check the type of report, e.g. self.logger.debug('report type {!r}'.format(type(report))), and/or some issubclass calls.
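The behaviour described in this comment can be sketched with simplified stand-in classes. These are not the real classes from intelmq/lib/message.py; they only mirror the inheritance chain and the field selection as described here, and the sample values are made up.

```python
class Message(dict):
    """Simplified stand-in: a message is, in the end, a dict."""


class Report(Message):
    """Simplified stand-in for a report."""


class Event(Message):
    # Fields taken over from a Report, as described in the comment.
    REPORT_FIELDS = ("feed.name", "feed.url", "time.observation")

    def __init__(self, message=()):
        if isinstance(message, Report):
            # Only the template fields are copied; "raw" is left out.
            template = {key: message[key]
                        for key in self.REPORT_FIELDS if key in message}
            super().__init__(template)
        else:
            # Plain dicts and other messages are copied field by field.
            super().__init__(message)


# Hypothetical sample data.
report = Report({"feed.name": "example feed",
                 "time.observation": "2015-09-09T16:46:00+00:00",
                 "raw": "e30="})

from_report = Event(report)      # built from a real Report instance
from_dict = Event(dict(report))  # what happens if the type was lost
print(type(report).__name__, "raw" in from_report, "raw" in from_dict)
```

In this sketch, a KeyExists-style conflict on `raw` can only arise in the second case, which is why checking the type of the received report is the first debugging step suggested.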
Signed-off-by: Sebastian Wagner <[email protected]>
Well, well, the problem was actually in my outdated repo; tests are passing now with the cherry-picked changes from @sebix.
Yes. I hope today's merge commit 95afec8 does not break anything....
I'm not aware of any problems right now. I should focus on the classification later, but merging this PR is OK.
OK. Thank you.
import urlparse
import urllib
import urllib2
import simplejson as json
One last question: why did you choose simplejson over json? To cite the release notes of Python 2.7:
Updated module: The json module was upgraded to version 2.0.9 of the simplejson package, which includes a C extension that makes encoding and decoding faster. (Contributed by Bob Ippolito; issue 4136.)
Also see SO: "What are the differences between json and simplejson Python modules?", which links a unicode bug in simplejson and states:
json is simplejson, added to the stdlib.
Creating this pull request to discuss several points:
Thanks for your comments