Python: promote nosql query #14070
QHelp previews: python/ql/src/Security/CWE-943/NoSqlInjection.qhelp

NoSQL Injection

Passing user-controlled sources into NoSQL queries can result in a NoSQL injection flaw. The tainted NoSQL query can then execute a malicious query in a NoSQL database such as MongoDB. In order for the user-controlled source to taint the NoSQL query, the user-controlled source must be converted into a Python object (the example below does this with `json.loads`).

Because a user-controlled source is passed into the query, the malicious user can have complete control over the query itself. When the tainted query is executed, the malicious user can commit malicious actions such as bypassing role restrictions or accessing and modifying restricted data in the NoSQL database.

Recommendation

NoSQL injections can be prevented by escaping the special characters in user input before it is passed into the NoSQL query. Alternatively, using a sanitizer library such as MongoSanitizer will ensure that user-supplied sources cannot act as a malicious query.

Example

In the example below, the user-supplied source is passed to a MongoDB function that queries the MongoDB database.

    from flask import Flask, request
    from flask_pymongo import PyMongo
    import json

    app = Flask(__name__)
    mongo = PyMongo(app)

    @app.route("/")
    def home_page():
        unsanitized_search = request.args['search']
        # Decoding turns the string into an arbitrary Python object.
        json_search = json.loads(unsanitized_search)
        result = mongo.db.user.find({'name': json_search})

This can be fixed by using a sanitizer library such as MongoSanitizer, as shown in the annotated version below.

    from flask import Flask, request
    from flask_pymongo import PyMongo
    from mongosanitizer.sanitizer import sanitize
    import json

    app = Flask(__name__)
    mongo = PyMongo(app)

    @app.route("/")
    def home_page():
        unsafe_search = request.args['search']
        json_search = json.loads(unsafe_search)
        # Sanitize the decoded object before it reaches the query.
        safe_search = sanitize(json_search)
        result = mongo.db.collection.find_one({'data': safe_search})
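To make the flaw in the qhelp example concrete, here is a minimal sketch that runs without a database. The `matches` and `find` helpers and the `USERS` data are hypothetical stand-ins for how a filter is evaluated; they support only literal equality and the `$ne` operator, which is enough to show why a decoded dictionary is more dangerous than a plain string.

```python
import json

# Toy collection standing in for `mongo.db.user` (illustrative data only).
USERS = [{"name": "alice"}, {"name": "bob"}]


def matches(document, query):
    """Hypothetical, heavily simplified filter evaluation.

    Supports literal equality and the `$ne` operator only.
    """
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict) and "$ne" in condition:
            if value == condition["$ne"]:
                return False
        elif value != condition:
            return False
    return True


def find(query):
    """Return every toy document that satisfies the query."""
    return [doc for doc in USERS if matches(doc, query)]


# A decoded string can only ever be compared literally.
benign = json.loads('"alice"')
print(find({"name": benign}))

# After json.loads, the same parameter can carry a query operator
# and match every document whose name is not null.
malicious = json.loads('{"$ne": null}')
print(find({"name": malicious}))
```

The first call returns only the matching user, while the second returns the whole toy collection, which is the kind of restriction bypass the qhelp text describes.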
Mostly move files, preserving authorship. This will not compile.
- Move NoSQL concepts to the non-experimental concepts file
- fix references
Also use new DataFlow API
Currently we do not:
- recognize the pattern `{'author': {"$eq": author}}` as protected
- recognize arguments to `$where` (and friends) as vulnerable
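The `$eq` pattern named in this commit message can be sketched as follows. The idea, as far as the thread describes it, is that wrapping the user value as the operand of `$eq` leaves a user-controlled dictionary as a value to compare against rather than as a set of operators. The helper name `author_filter` is hypothetical, and the sketch only builds filter dictionaries without contacting a database.

```python
import json


def author_filter(author):
    """Build a filter that places `author` in the operand position of `$eq`."""
    return {"author": {"$eq": author}}


# A decoded payload that tries to smuggle in an operator.
payload = json.loads('{"$ne": null}')

# Unprotected: the payload's keys sit where operators are expected.
unprotected = {"author": payload}

# Protected: the payload is only the value that `$eq` compares against.
protected = author_filter(payload)

print(unprotected)
print(protected)
```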
This allows us to make the modelling more precise. The query tests now pass. I do wonder if there is a cleaner approach, similar to `TaintedObject` in JavaScript. I want the option to get this query into the hands of customers before such an investigation, though.
1918ccf
to
f253f97
Compare
`$where` and `$function` behave quite differently.
Query operators that interpret JavaScript are no longer considered sinks. Instead they are considered decodings and the output is the tainted dictionary. The state changes to `DictInput` to reflect that the user now controls a dangerous dictionary. This fixes the spurious result and moves the error reporting to a more logical place.
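For context, `$where` takes a JavaScript expression, so a user-controlled string that reaches it is interpreted as code even though it never passed through a decoder such as `json.loads`. The sketch below only builds filter dictionaries and does not contact a database; the function names and the `this.name` field are illustrative assumptions.

```python
def where_filter(name):
    """Vulnerable: splices user text into JavaScript evaluated by `$where`."""
    return {"$where": "this.name == '" + name + "'"}


def equality_filter(name):
    """Safer: the string is only ever compared as a literal value."""
    return {"name": name}


# A payload that closes the quoted literal and appends its own condition.
payload = "' || '1' == '1"

print(where_filter(payload))
print(equality_filter(payload))
```

In the first filter the payload becomes part of the expression itself, while in the second it stays an ordinary string value.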
I've looked superficially at tests, and will do a proper review of the QL code a bit later on. But here are a few things I noticed:
- Can you please add tests of `$group` and mapReduce? 🙏 I think these should be easy enough to model, but we should at the very least have tests to highlight how they work.
- The query tests use a pattern like our examples, `_good.py` and `_bad.py`. Since the `_good.py` version is mostly copied boilerplate setup code, I think it would be nice to merge the files into ONE file 😊
- Lastly, we need a change-note saying a new query was added 👍
@@ -378,6 +378,68 @@ module SqlExecution {
  }
}

/** Provides a class for modeling NoSql execution APIs. */
module NoSqlQuery {
Did you consider naming this `NoSqlExecution` to match with our current `SqlExecution` concept? I don't disagree with the naming per se, but I'm thinking consistency might be better here 😊
I just kept the name from the submission, but I agree that I should probably change it 👍
Claim conversions do not execute inputs in order to remove interaction with `py/unsafe-deserialization`.
Co-authored-by: Rasmus Wriedt Larsen
 WhereQueryOperator() {
-  this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
-  query = this.getParameter(0).getSubscript("$where").asSink()
+  dictionary =
+    mongoCollection().getMember(mongoCollectionMethodName()).getACall().getParameter(0) and
+  query = dictionary.getSubscript("$where").asSink() and
+  this = dictionary.asSink()
I still think we could end up with a very long jumpstep due to the way it's written right now, which I don't think is a good solution.
I'm falling into the trap of bike-shedding `InterpretedString`, since it instantly makes me think of string interpolation (f-strings). As I've understood this problem, we are looking at two scenarios: where a user controls a dictionary (bad), and where a user only controls a string (OK in something like `db.find_one({'data': user_string})`). I think the original distinction made that very clear, which was really good 👍 (If I had to choose between the two, I honestly liked the old one better; it just had some other minor problems.)
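The two scenarios in this comment can be told apart at runtime with a simple type check. The sketch below is a hypothetical application-side guard, not part of the query under review; `require_string` is an illustrative name.

```python
import json


def require_string(value):
    """Reject anything that is not a plain string (for example a decoded dict)."""
    if not isinstance(value, str):
        raise TypeError("expected a string, got " + type(value).__name__)
    return value


# Scenario 1: the user only controls a string, which is compared literally.
user_string = require_string(json.loads('"alice"'))
safe_query = {"data": user_string}

# Scenario 2: the user controls a dictionary, which could carry operators.
try:
    require_string(json.loads('{"$ne": null}'))
    rejected = False
except TypeError:
    rejected = True

print(safe_query, rejected)
```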
I forgot to mention in the written review that you should add the new sinks to
DataFlow::Node relevantTaintSink(string kind) {
Co-authored-by: Rasmus Wriedt Larsen
I think that is actually a simplification. There is the scenario where the string gets interpreted as a dictionary (and you can write something like
turn "the long jump" that would end up straight at the argument into a short jump that ends up at the dictionary being written to. Dataflow takes care of the rest of the path.
Close to being a revert of github@3043633 but with slightly shorter names and added comments.
Renamed flow states again, as discussed offline.
  NoSqlExecutionAsInterpretedStringSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }

/** A NoSQL query that is vulnerable to user controlled dictionaries. */
class NoSqlExecutionAsDictSink extends DictSink {
  NoSqlExecutionAsDictSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }
should this ensure the sink interprets dicts?
Yes, that is an oversight, thanks!
I think we're good to go for now 👍
I think we might want to look into supporting more formats than JSON (and a general approach), and might want to look into how taint should propagate through the dictionaries, but this could be done as future work 👍
We should at least investigate if other formats get interpreted. I will leave this as a future improvement.
Promotes the NoSQL injection query and updates it to modern style. Also addresses the points raised in the original review.
The approach is similar to JavaScript but, rather than having a state for just strings and a state for "control of the entire object", we have a state for just strings and a state for a tainted dictionary. This requires us to add a sanitizer for the `$eq`-pattern.