Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: promote nosql query #14070

Merged
merged 45 commits into from
Sep 29, 2023
Merged

Python: promote nosql query #14070

merged 45 commits into from
Sep 29, 2023

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Aug 28, 2023

Promotes the NoSQL injection query and updates it to modern style. Also addresses the points raised in the original review.

The approach is similar to JavaScript but, rather than having a state for just strings and a state for "control of the entire object", we have a state for just strings and a state for a tainted dictionary. This requires us to add a sanitizer for the $eq-pattern.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 28, 2023

QHelp previews:

python/ql/src/Security/CWE-943/NoSqlInjection.qhelp

NoSQL Injection

Passing user-controlled sources into NoSQL queries can result in a NoSQL injection flaw. This tainted NoSQL query containing a user-controlled source can then execute a malicious query in a NoSQL database such as MongoDB. In order for the user-controlled source to taint the NoSQL query, the user-controller source must be converted into a Python object using something like json.loads or xmltodict.parse.

Because a user-controlled source is passed into the query, the malicious user can have complete control over the query itself. When the tainted query is executed, the malicious user can commit malicious actions such as bypassing role restrictions or accessing and modifying restricted data in the NoSQL database.

Recommendation

NoSQL injections can be prevented by escaping user-input's special characters that are passed into the NoSQL query from the user-supplied source. Alternatively, using a sanitize library such as MongoSanitizer will ensure that user-supplied sources can not act as a malicious query.

Example

In the example below, the user-supplied source is passed to a MongoDB function that queries the MongoDB database.

from flask import Flask, request
from flask_pymongo import PyMongo
import json

mongo = PyMongo(app)


@app.route("/")
def home_page():
    unsanitized_search = request.args['search']
    json_search = json.loads(unsanitized_search)

    result = mongo.db.user.find({'name': json_search})

This can be fixed by using a sanitizer library like MongoSanitizer as shown in this annotated code version below.

from flask import Flask, request
from flask_pymongo import PyMongo
from mongosanitizer.sanitizer import sanitize
import json

mongo = PyMongo(app)


@app.route("/")
def home_page():
    unsafe_search = request.args['search']
    json_search = json.loads(unsafe_search)
    safe_search = sanitize(unsanitized_search)

    result = client.db.collection.find_one({'data': safe_search})

References

Mostly move files, preserving authourship.
This will not compile.
- Move NoSQL concepts to the non-experimental concepts file
- fix references
Also use new DataFlow API
currently we do not:
- recognize the pattern
   `{'author': {"$eq": author}}` as protected
- recognize arguements to `$where` (and friends)
   as vulnerable
This allows us to make more precise modelling
The query tests now pass.
I do wonder, if there is a cleaner approach, similar to
`TaintedObject` in JavaScript. I want the option to
get this query in the hands of the custumors before
such an investigation, though.
`$where` and `$function` behave quite differently.
Query operators that interpret JavaScript
are no longer considered sinks.
Instead they are considered decodings
and the output is the tainted dictionary.
The state changes to `DictInput` to reflect
that the user now controls a dangerous dictionary.

This fixes the spurious result and moves the error reporting
to a more logical place.
@yoff yoff marked this pull request as ready for review September 11, 2023 14:46
@yoff yoff requested a review from a team as a code owner September 11, 2023 14:46
@RasmusWL RasmusWL self-requested a review September 13, 2023 13:56
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked superficially at tests, and will do a proper review of the QL code a bit later on. But here are a few things I noticed:

Can you please add tests of
$group and mapReduce? 🙏 I think these should be easy enough to model, but we should at the very least have tests to highlight how they work.

the query tests use a pattern like our examples _good.py and _bad.py. Since the _good.py version is mostly copied boilerplate setup code, I think it would be nice to merge the files into ONE file 😊

Lastly, we need a change-note saying a new query was added 👍

@@ -378,6 +378,68 @@ module SqlExecution {
}
}

/** Provides a class for modeling NoSql execution APIs. */
module NoSqlQuery {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider naming this NoSqlExecution to match with our current SqlExecution concept? I don't disagree with the naming per-se, but I'm thinking consistency might be better here 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just kept the name from the submission, but I agree that I should probably change it 👍

@yoff
Copy link
Contributor Author

yoff commented Sep 15, 2023

Can you please add tests of
$group and mapReduce?

Yes, although it appears to not be $group but rather $accumulator that is interesting, since that is the one that can contain JavaScript code (and it can appear inside other things than $group).

yoff and others added 4 commits September 28, 2023 12:40
Claim conversions do not execute inputs in order to remove interaction with `py/unsafe-deserialization`.

Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
@yoff yoff requested a review from RasmusWL September 28, 2023 12:28
python/ql/lib/semmle/python/frameworks/BSon.qll Outdated Show resolved Hide resolved
Comment on lines 130 to 134
WhereQueryOperator() {
this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
query = this.getParameter(0).getSubscript("$where").asSink()
dictionary =
mongoCollection().getMember(mongoCollectionMethodName()).getACall().getParameter(0) and
query = dictionary.getSubscript("$where").asSink() and
this = dictionary.asSink()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we could end up with a very long jumpstep due to the way it's written right now, which I don't think is a good solution.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm falling into the trap of bike-shedding InterpretedString, since it instantly makes me think of string-interpolation (f-strings). As I've understood this problem, we are looking at two scenarios: Where a user controls a dictionary (bad), and where user only controls a string (ok in something like db.find_one({'data': user_string}). I think the original distinction made that very clear, which was really good 👍 (If I had to choose between the two, I honestly liked the old one better.. it just had some other minor problems)

I forgot to mention in the written review that you should add the new sinks to

DataFlow::Node relevantTaintSink(string kind) {
(I only briefly mentioned this in the meeting monday)

Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
@yoff
Copy link
Contributor Author

yoff commented Sep 29, 2023

As I've understood this problem, we are looking at two scenarios: Where a user controls a dictionary (bad), and where user only controls a string (ok in something like db.find_one({'data': user_string}).

I think that is actually a simplification. There is the scenario where the string gets interpreted as a dictionary (and you can write something like { $neq: 1 }), but there is also the scenario where the string is interpreted as JavaScript code. Perhaps, instead of InterpretedString something like Payload would be better?

turn "the long jump" that would end up
straight at the argument into a short jump
that ends up at the dictionary being written to.
Dataflow takes care of the rest of the path.
@yoff yoff requested a review from RasmusWL September 29, 2023 10:07
Close to being a revert of
github@3043633
but with slightly shorter names and added comments.
@yoff
Copy link
Contributor Author

yoff commented Sep 29, 2023

Renamed flow states again, as discussed offline.

NoSqlExecutionAsInterpretedStringSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }
/** A NoSQL query that is vulnerable to user controlled dictionaries. */
class NoSqlExecutionAsDictSink extends DictSink {
NoSqlExecutionAsDictSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this ensure the sink interprets dicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is an oversight, thanks!

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go for now 👍

I think we might want to look into supporting more formats than JSON (and a general approach), and might want to look into how taint should propagate through the dictionaries, but this could be done as future work 👍

@yoff
Copy link
Contributor Author

yoff commented Sep 29, 2023

Should modeling more conversions to dictionaries? Like YAML parsing?

We should at least investigate if other formats get interpreted. I will leave this as a future improvement.

@yoff yoff merged commit dbecb1b into github:main Sep 29, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants