Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Open URL without Certificate Validation #3878

Closed
wants to merge 9 commits into from

Conversation

dilanbhalla
Copy link
Contributor

Query that detects use of urlopen within urllib.request and urllib2.request modules without appropriate certificate validation. A similar query already exists in python for the requests module, however there is no coverage yet for these urllib modules, which are very commonly used. By specifying a cafile, capath, or context a developer can ensure that they are verifying their certificate, however without these parameters they are vulnerable to man in the middle attacks.

@dilanbhalla dilanbhalla requested a review from a team as a code owner July 2, 2020 17:14
@RasmusWL RasmusWL added the Python label Jul 3, 2020
@dilanbhalla
Copy link
Contributor Author

Hi Rasmus, apologies in advance for the long follow up, but I have a question unrelated to this query and am unsure how else to contact you (since our discussion thread from earlier is now closed). This may seem a little silly, but the trace-command you showed my for python won't work due to the simple error that the '$' is not recognized. My end goal is to simply use the CLI to build a python database that includes some custom xml files I wrote, so I believe your method would work for python (init, index the xml files, trace-command, finalize). Does your PR need to be merged before this trace-command will work? Or is it something simple that I may be doing wrong with regards to the expression starting with '$'? And lastly, if including the xml is not at all possible, would you happen to know any other method to include custom data (maybe through something like a csv) and reference it within a python ql file? Thank you so much!

@RasmusWL
Copy link
Member

Hi @dilanbhalla, I provided an answer on to the relevant issue instead :)

@dilanbhalla
Copy link
Contributor Author

Thanks for the update @RasmusWL, and apologies for the silly mistake. Also I just added/pushed the fixes you mentioned @intrigus-lgtm, so thank you too!

@adityasharad adityasharad changed the base branch from master to main August 14, 2020 18:33
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a few comments about how to rewrite this using API graphs, which will have to take place before it can be merged. Apart from that, I think the PR looks pretty solid. 👍

Comment on lines 17 to 28
urlopen = Value::named(["urllib2.request.urlopen", "urllib.request.urlopen"]) and
http_arg = urlopen.getArgumentForCall(call, 0) and
http_arg.pointsTo(http_string) and
http_string.getText().matches("https://%") and
(
not exists(Value verify |
verify = call.getArgByName(["cafile", "capath", "context"]).pointsTo()
)
or
empty = call.getArgByName(["cafile", "capath", "context"]).pointsTo() and
empty = Value::none_()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our current approach to modelling libraries is to use API graphs.

In this case, I think it should be fairly simple to rewrite the code to use API graphs. The calls you're interested in can be found using

API::moduleImport(["urllib","urllib2"]).getMember("request").getMember("urlopen").getACall()

For the value that is being passed to the named argument, I would either see if the named argument is absent altogether (in which case the default is used), or if it is present and equal to API::builtin("None").getAUse(). That should mostly replicate the behaviour you have above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice, API graphs are a really cool feature! I added the changes you recommended, and reformatted the code a bit (made sure to remember the autoformatting too 👍 ).

I weirdly did not catch one of my test cases when I used API::builtin("None").getAUse() insead of comparing the value of the argument to Value::none_() however. I kept the latter method, which worked, and commented out the former. If you have any ideas why that might be the case that would be super helpful, but otherwise the method I am currently using works anyway (I will remove the comments once I get feedback on this decision). Thanks again, and let me know if there is anything else I need to fix!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's possible we have a bug in API::builtin where we fail to include things like None (since API::builtin is really a model of the builtins module, but None is kind of special). I'll see if I can make a quick fix for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5639 should fix the problem with None.

Incidentally, for this one it might be better to use getAnImmediateUse rather than getAUse. The difference here is that the latter will also take some (possibly unsound) dataflow into account, and this may lead to undesirable false negatives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tausbn new pull request is here: #5644. I used getAnImmediateUse instead and modified the test slightly to allow for this. I also kept API::builtin("None").getAUse() since you have opened a PR to fix this. 👍

@dilanbhalla
Copy link
Contributor Author

Apoligies, I believe I made a bit of a mistake. I tried to update my repo by pulling upstream changes that have been made since I last worked on this, but instead seemed to complicate things a bit. I don't want to request review unnecessarily from codeql-java and codeql-javascript, so should I just open a fresh PR and resubmit there?

@dilanbhalla dilanbhalla closed this Apr 8, 2021
@dilanbhalla
Copy link
Contributor Author

Going to resubmit in a fresh PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants