Python: promote nosql query #14070

yoff · 2023-08-28T13:10:16Z

Promotes the NoSQL injection query and updates it to modern style. Also addresses the points raised in the original review.

The approach is similar to JavaScript but, rather than having a state for just strings and a state for "control of the entire object", we have a state for just strings and a state for a tainted dictionary. This requires us to add a sanitizer for the $eq-pattern.
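A minimal sketch of the two query shapes this distinction is about, using plain dictionaries only (no database connection). The helper names are hypothetical. That `$eq` makes MongoDB compare against its operand literally is the assumption behind the sanitizer; the snippet does not verify it.

```python
import json


def build_unsafe_query(raw: str) -> dict:
    # json.loads may return a dict such as {"$ne": 1}; placed directly under a
    # field name, such a dict is read as an operator expression.
    author = json.loads(raw)
    return {"author": author}


def build_eq_query(raw: str) -> dict:
    # Wrapping the value in $eq is the pattern treated as protected
    # (assumption: the operand is then compared as a literal value).
    author = json.loads(raw)
    return {"author": {"$eq": author}}


payload = '{"$ne": 1}'
print(build_unsafe_query(payload))
print(build_eq_query(payload))
```

In the first query the user-controlled dictionary sits where an operator expression is expected. In the second it is only the operand of `$eq`.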

github-actions · 2023-08-28T13:11:33Z

QHelp previews:

python/ql/src/Security/CWE-943/NoSqlInjection.qhelp

NoSQL Injection

Passing user-controlled sources into NoSQL queries can result in a NoSQL injection flaw. This tainted NoSQL query containing a user-controlled source can then execute a malicious query in a NoSQL database such as MongoDB. In order for the user-controlled source to taint the NoSQL query, the user-controlled source must be converted into a Python object using something like json.loads or xmltodict.parse.

Because a user-controlled source is passed into the query, the malicious user can have complete control over the query itself. When the tainted query is executed, the malicious user can commit malicious actions such as bypassing role restrictions or accessing and modifying restricted data in the NoSQL database.

Recommendation

NoSQL injections can be prevented by escaping the special characters in user input that is passed into the NoSQL query. Alternatively, using a sanitizer library such as MongoSanitizer will ensure that user-supplied sources cannot act as a malicious query.
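As a rough illustration of what such sanitizing involves, the sketch below rejects any decoded value that contains a dictionary key starting with `$`. The function names are hypothetical and this is not MongoSanitizer's implementation; it only assumes that MongoDB operators are spelled as `$`-prefixed keys.

```python
import json


def contains_operator(value) -> bool:
    """Return True if any dictionary key in the nested value starts with '$'."""
    if isinstance(value, dict):
        return any(
            (isinstance(key, str) and key.startswith("$")) or contains_operator(item)
            for key, item in value.items()
        )
    if isinstance(value, list):
        return any(contains_operator(item) for item in value)
    return False


def require_plain_value(raw: str):
    """Decode JSON and refuse values that could act as query operators."""
    value = json.loads(raw)
    if contains_operator(value):
        raise ValueError("query operators are not allowed in user input")
    return value


print(require_plain_value('"alice"'))
```

Rejecting the input outright, rather than silently stripping keys, is a design choice made here for simplicity.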

Example

In the example below, the user-supplied source is passed to a MongoDB function that queries the MongoDB database.

from flask import Flask, request
from flask_pymongo import PyMongo
import json

app = Flask(__name__)
mongo = PyMongo(app)


@app.route("/")
def home_page():
    unsanitized_search = request.args['search']
    json_search = json.loads(unsanitized_search)

    result = mongo.db.user.find({'name': json_search})

This can be fixed by using a sanitizer library like MongoSanitizer as shown in this annotated code version below.

from flask import Flask, request
from flask_pymongo import PyMongo
from mongosanitizer.sanitizer import sanitize
import json

app = Flask(__name__)
mongo = PyMongo(app)


@app.route("/")
def home_page():
    unsafe_search = request.args['search']
    json_search = json.loads(unsafe_search)
    safe_search = sanitize(json_search)

    result = mongo.db.collection.find_one({'data': safe_search})

References

Mongoengine: Documentation.
Flask-Mongoengine: Documentation.
PyMongo: Documentation.
Flask-PyMongo: Documentation.
OWASP: NoSQL Injection.
Security Stack Exchange Discussion: Question 83231.
Common Weakness Enumeration: CWE-943.

Mostly move files, preserving authorship. This will not compile.

- Move NoSQL concepts to the non-experimental concepts file
- Fix references

Also use new DataFlow API

Currently we do not:
- recognize the pattern `{'author': {"$eq": author}}` as protected
- recognize arguments to `$where` (and friends) as vulnerable

This allows us to make more precise modelling. The query tests now pass. I do wonder if there is a cleaner approach, similar to `TaintedObject` in JavaScript. I want the option to get this query into the hands of customers before such an investigation, though.

python/ql/lib/semmle/python/security/dataflow/NoSQLInjectionQuery.qll

`$where` and `$function` behave quite differently.

Query operators that interpret JavaScript are no longer considered sinks. Instead they are considered decodings and the output is the tainted dictionary. The state changes to `DictInput` to reflect that the user now controls a dangerous dictionary. This fixes the spurious result and moves the error reporting to a more logical place.
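A minimal sketch of why a plain string is enough to cause harm once it is embedded in such an operator. The helper name and the field `name` are hypothetical; the only assumption about MongoDB is that `$where` accepts a JavaScript expression as a string.

```python
def where_query(user_name: str) -> dict:
    # Hypothetical vulnerable helper: the user string is spliced into
    # JavaScript source, so the user controls code, not just a value.
    return {"$where": "this.name == '" + user_name + "'"}


benign = where_query("alice")
malicious = where_query("x' || '1' == '1")

print(benign["$where"])
print(malicious["$where"])
```

The second dictionary carries an expression with an attacker-supplied `||` clause, which is why the result of the embedding is treated as a dangerous dictionary.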

python/ql/lib/semmle/python/frameworks/NoSQL.qll

RasmusWL

I've looked superficially at tests, and will do a proper review of the QL code a bit later on. But here are a few things I noticed:

Can you please add tests of $group and mapReduce? 🙏 I think these should be easy enough to model, but we should at the very least have tests to highlight how they work.

The query tests use a pattern like our examples _good.py and _bad.py. Since the _good.py version is mostly copied boilerplate setup code, I think it would be nice to merge the files into ONE file 😊

Lastly, we need a change-note saying a new query was added 👍

RasmusWL · 2023-09-13T14:11:38Z

python/ql/lib/semmle/python/Concepts.qll

@@ -378,6 +378,68 @@ module SqlExecution {
  }
 }

+/** Provides a class for modeling NoSql execution APIs. */
+module NoSqlQuery {


Did you consider naming this NoSqlExecution to match our current SqlExecution concept? I don't disagree with the naming per se, but I'm thinking consistency might be better here 😊

I just kept the name from the submission, but I agree that I should probably change it 👍

yoff · 2023-09-15T09:20:48Z

Can you please add tests of $group and mapReduce?

Yes, although it appears to not be $group but rather $accumulator that is interesting, since that is the one that can contain JavaScript code (and it can appear inside other things than $group).
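To illustrate the point about nesting, here is a small sketch that walks an aggregation pipeline and reports where JavaScript-bearing operators occur. The function name, the operator set, and the sample field names (`author`, `copies`) are assumptions made for the example; this is not how the QL model locates them.

```python
# Operators assumed here to carry JavaScript code.
JS_OPERATORS = {"$where", "$function", "$accumulator"}


def find_js_operators(value, path=()):
    """Return the key paths at which a JavaScript-bearing operator appears."""
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            here = path + (key,)
            if key in JS_OPERATORS:
                found.append(here)
            found.extend(find_js_operators(item, here))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_js_operators(item, path + (index,)))
    return found


pipeline = [
    {"$group": {"_id": "$author", "total": {"$accumulator": {
        "init": "function() { return 0 }",
        "accumulate": "function(state, n) { return state + n }",
        "accumulateArgs": ["$copies"],
        "merge": "function(a, b) { return a + b }",
        "lang": "js",
    }}}}
]
print(find_js_operators(pipeline))
```

The reported path ends at `$accumulator`, not at `$group`, which matches the observation that the accumulator is the interesting part.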

python/ql/lib/semmle/python/frameworks/NoSQL.qll

Claim conversions do not execute inputs in order to remove interaction with `py/unsafe-deserialization`. Co-authored-by: Rasmus Wriedt Larsen <[email protected]>

python/ql/lib/semmle/python/frameworks/PyMongo.qll

python/ql/lib/semmle/python/frameworks/BSon.qll

RasmusWL · 2023-09-29T07:50:14Z

python/ql/lib/semmle/python/frameworks/NoSQL.qll

    WhereQueryOperator() {
-      this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
-      query = this.getParameter(0).getSubscript("$where").asSink()
+      dictionary =
+        mongoCollection().getMember(mongoCollectionMethodName()).getACall().getParameter(0) and
+      query = dictionary.getSubscript("$where").asSink() and
+      this = dictionary.asSink()


I still think we could end up with a very long jumpstep due to the way it's written right now, which I don't think is a good solution.

RasmusWL

I'm falling into the trap of bike-shedding InterpretedString, since it instantly makes me think of string interpolation (f-strings). As I've understood this problem, we are looking at two scenarios: where a user controls a dictionary (bad), and where a user only controls a string (ok in something like db.find_one({'data': user_string})). I think the original distinction made that very clear, which was really good 👍 (If I had to choose between the two, I honestly liked the old one better... it just had some other minor problems)

I forgot to mention in the written review that you should add the new sinks to

codeql/python/ql/src/meta/alerts/TaintSinks.ql

Line 35 in d7aea22

DataFlow::Node relevantTaintSink(string kind) {

(I only briefly mentioned this in the meeting on Monday)

Co-authored-by: Rasmus Wriedt Larsen <[email protected]>

yoff · 2023-09-29T09:40:17Z

As I've understood this problem, we are looking at two scenarios: where a user controls a dictionary (bad), and where a user only controls a string (ok in something like db.find_one({'data': user_string})).

I think that is actually a simplification. There is the scenario where the string gets interpreted as a dictionary (and you can write something like { $ne: 1 }), but there is also the scenario where the string is interpreted as JavaScript code. Perhaps, instead of InterpretedString, something like Payload would be better?
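A toy model of the two flow states under discussion, written in Python purely for illustration. The state names, the set of decodings, and the functions are hypothetical and do not mirror the QL implementation; the sketch only assumes that both JSON decoding and embedding in a JavaScript-interpreting operator move a user-controlled string into the dangerous state.

```python
from enum import Enum


class FlowState(Enum):
    STRING = "user controls a string"
    DICT = "user controls a dangerous dictionary"


# Operations assumed to turn a controlled string into a dangerous dictionary.
DECODINGS = {"json.loads", "$where", "$function"}


def step(state: FlowState, operation: str) -> FlowState:
    """Advance the state across one operation on the tainted value."""
    if state is FlowState.STRING and operation in DECODINGS:
        return FlowState.DICT
    return state


def reaches_query_sink(operations) -> bool:
    """A query execution is flagged only when the value arrives in the DICT state."""
    state = FlowState.STRING
    for operation in operations:
        state = step(state, operation)
    return state is FlowState.DICT


print(reaches_query_sink(["json.loads"]))
print(reaches_query_sink(["str.strip"]))
```

Under this model a string passed straight into `{'data': user_string}` never leaves the STRING state, so it is not reported.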

turn "the long jump" that would end up straight at the argument into a short jump that ends up at the dictionary being written to. Dataflow takes care of the rest of the path.

Close to being a revert of github@3043633 but with slightly shorter names and added comments.

yoff · 2023-09-29T11:24:16Z

Renamed flow states again, as discussed offline.

RasmusWL · 2023-09-29T11:34:50Z

python/ql/lib/semmle/python/security/dataflow/NoSQLInjectionCustomizations.qll

-    NoSqlExecutionAsInterpretedStringSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }
+  /** A NoSQL query that is vulnerable to user controlled dictionaries. */
+  class NoSqlExecutionAsDictSink extends DictSink {
+    NoSqlExecutionAsDictSink() { this = any(NoSqlExecution noSqlExecution).getQuery() }


should this ensure the sink interprets dicts?

Yes, that is an oversight, thanks!

and a fair bit of refactoring

RasmusWL

I think we're good to go for now 👍

I think we might want to look into supporting more formats than JSON (and a general approach), and might want to look into how taint should propagate through the dictionaries, but this could be done as future work 👍

yoff · 2023-09-29T12:10:48Z

Should we model more conversions to dictionaries, such as YAML parsing?

We should at least investigate if other formats get interpreted. I will leave this as a future improvement.

github-actions bot added the documentation and Python labels on Aug 28, 2023

yoff added 10 commits September 7, 2023 09:28

Python: prepare to promote NoSqlInjection

60dc1af

Mostly move files, preserving authorship. This will not compile.

Python: Make things compile in their new location

55707d3

- Move NoSQL concepts to the non-experimental concepts file
- Fix references

Python: rename file

db04597

Python: Refactor to allow customizations

087961d

Also use new DataFlow API

Python: more renames

19046ea

Python: Add inline query test

bf8bfd9

Python: Added tests based on security analysis

114984b

Currently we do not:
- recognize the pattern `{'author': {"$eq": author}}` as protected
- recognize arguments to `$where` (and friends) as vulnerable

Python: Add QLDocs

7edebbe

Python: update test expectations

f253f97

yoff force-pushed the python/promote-nosql-query branch from 1918ccf to f253f97 Compare September 7, 2023 08:25

yoff added 2 commits September 7, 2023 15:03

Python: Follow naming convention

970e881

Python: make test PoC a proper package

b07d085

github-advanced-security bot found potential problems Sep 7, 2023

python/ql/lib/semmle/python/security/dataflow/NoSQLInjectionQuery.qll

yoff added 4 commits September 8, 2023 13:37

Python: rename file

d91cd21

Python: Add test for function

154a369

Python: Split modelling of query operators

d9f63e1

`$where` and `$function` behave quite differently.

github-advanced-security bot found potential problems Sep 11, 2023

python/ql/lib/semmle/python/frameworks/NoSQL.qll

python/ql/lib/semmle/python/frameworks/NoSQL.qll

yoff marked this pull request as ready for review September 11, 2023 14:46

yoff requested a review from a team as a code owner September 11, 2023 14:46

RasmusWL self-requested a review September 13, 2023 13:56

RasmusWL requested changes Sep 13, 2023

yoff added 2 commits September 18, 2023 14:34

Python: add change note

4614b1a

Python: add test for $accumulator

5611bda

github-advanced-security bot found potential problems Sep 19, 2023

python/ql/lib/semmle/python/frameworks/NoSQL.qll

yoff and others added 4 commits September 28, 2023 12:40

Apply suggestions from code review

c2b6383

Claim conversions do not execute inputs in order to remove interaction with `py/unsafe-deserialization`. Co-authored-by: Rasmus Wriedt Larsen <[email protected]>

Python: rename file

9682c82

Python: rename module

2a739b3

Python: split modelling

eb1be08

github-advanced-security bot found potential problems Sep 28, 2023

yoff added 4 commits September 28, 2023 13:35

Python: Fix QL alerts

2a7b593

Python: forgot to list framework

a8e0023

Python: update test expectations

d5b64c5

Python: Some renaming of flow states

3043633

yoff requested a review from RasmusWL September 28, 2023 12:28

RasmusWL reviewed Sep 29, 2023

RasmusWL requested changes Sep 29, 2023

Apply suggestions from code review

2e028a4

Co-authored-by: Rasmus Wriedt Larsen <[email protected]>

yoff added 3 commits September 29, 2023 12:02

Python: update meta query TaintSinks

74d6f37

Python: nicer paths

2d845e3

turn "the long jump" that would end up straight at the argument into a short jump that ends up at the dictionary being written to. Dataflow takes care of the rest of the path.

Python: fix QL alert

e170805

yoff requested a review from RasmusWL September 29, 2023 10:07

Python: rename flow states

f3a0161

Close to being a revert of github@3043633 but with slightly shorter names and added comments.

RasmusWL reviewed Sep 29, 2023

yoff and others added 6 commits September 29, 2023 13:45

Python: require dict sinks be dangerous.

9769668

Python: NoSQLInjection -> NoSqlInjection

16e1a00

Python: List NoSQL injection sinks

d7ad5a0

Python: Clean trailing whitespace

3676262

Python: -> NoSQL in QLDocs

d6d13f8

Python: Add keyword argument support

9b73bbf

and a fair bit of refactoring

RasmusWL approved these changes Sep 29, 2023

yoff merged commit dbecb1b into github:main Sep 29, 2023
11 checks passed
