-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Open URL without Certificate Validation #3878
Conversation
python/ql/src/experimental/Security/CWE-295/UrlopenWithoutValidation.ql
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/Security/CWE-295/UrlopenWithoutValidation.ql
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/Security/CWE-295/UrlopenWithoutValidation.ql
Outdated
Show resolved
Hide resolved
…dation.ql Co-authored-by: intrigus-lgtm <[email protected]>
Hi Rasmus, apologies in advance for the long follow up, but I have a question unrelated to this query and am unsure how else to contact you (since our discussion thread from earlier is now closed). This may seem a little silly, but the trace-command you showed my for python won't work due to the simple error that the '$' is not recognized. My end goal is to simply use the CLI to build a python database that includes some custom xml files I wrote, so I believe your method would work for python (init, index the xml files, trace-command, finalize). Does your PR need to be merged before this trace-command will work? Or is it something simple that I may be doing wrong with regards to the expression starting with '$'? And lastly, if including the xml is not at all possible, would you happen to know any other method to include custom data (maybe through something like a csv) and reference it within a python ql file? Thank you so much! |
Hi @dilanbhalla, I provided an answer on to the relevant issue instead :) |
Thanks for the update @RasmusWL, and apologies for the silly mistake. Also I just added/pushed the fixes you mentioned @intrigus-lgtm, so thank you too! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a few comments about how to rewrite this using API graphs, which will have to take place before it can be merged. Apart from that, I think the PR looks pretty solid. 👍
urlopen = Value::named(["urllib2.request.urlopen", "urllib.request.urlopen"]) and | ||
http_arg = urlopen.getArgumentForCall(call, 0) and | ||
http_arg.pointsTo(http_string) and | ||
http_string.getText().matches("https://%") and | ||
( | ||
not exists(Value verify | | ||
verify = call.getArgByName(["cafile", "capath", "context"]).pointsTo() | ||
) | ||
or | ||
empty = call.getArgByName(["cafile", "capath", "context"]).pointsTo() and | ||
empty = Value::none_() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our current approach to modelling libraries is to use API graphs.
In this case, I think it should be fairly simple to rewrite the code to use API graphs. The calls you're interested in can be found using
API::moduleImport(["urllib","urllib2"]).getMember("request").getMember("urlopen").getACall()
For the value that is being passed to the named argument, I would either see if the named argument is absent altogether (in which case the default is used), or if it is present and equal to API::builtin("None").getAUse()
. That should mostly replicate the behaviour you have above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice, API graphs are a really cool feature! I added the changes you recommended, and reformatted the code a bit (made sure to remember the autoformatting too 👍 ).
I weirdly did not catch one of my test cases when I used API::builtin("None").getAUse()
insead of comparing the value of the argument to Value::none_()
however. I kept the latter method, which worked, and commented out the former. If you have any ideas why that might be the case that would be super helpful, but otherwise the method I am currently using works anyway (I will remove the comments once I get feedback on this decision). Thanks again, and let me know if there is anything else I need to fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's possible we have a bug in API::builtin
where we fail to include things like None
(since API::builtin
is really a model of the builtins
module, but None
is kind of special). I'll see if I can make a quick fix for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#5639 should fix the problem with None
.
Incidentally, for this one it might be better to use getAnImmediateUse
rather than getAUse
. The difference here is that the latter will also take some (possibly unsound) dataflow into account, and this may lead to undesirable false negatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apoligies, I believe I made a bit of a mistake. I tried to update my repo by pulling upstream changes that have been made since I last worked on this, but instead seemed to complicate things a bit. I don't want to request review unnecessarily from codeql-java and codeql-javascript, so should I just open a fresh PR and resubmit there? |
Going to resubmit in a fresh PR. |
Query that detects use of urlopen within urllib.request and urllib2.request modules without appropriate certificate validation. A similar query already exists in python for the requests module, however there is no coverage yet for these urllib modules, which are very commonly used. By specifying a cafile, capath, or context a developer can ensure that they are verifying their certificate, however without these parameters they are vulnerable to man in the middle attacks.