-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Flask & Django Constant Secret Key initialization #13561
Python: Flask & Django Constant Secret Key initialization #13561
Conversation
Hi @RasmusWL , if you think it's better to I write this query with a better structure please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like these queries, have thought about writing something like this myself 👍
Can you please submit the security lab bounty issue now as well? 🙏
However, I think the best practice is to load the SECRET_KEY from the environment (without a default value of course). This is for example the recommended practice in the Django security docs: https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/#secret-key
python/ql/test/experimental/query-tests/Security/CWE-287-ConstantSecretKey/app_unsafe.py
Show resolved
Hide resolved
@@ -0,0 +1,22 @@ | |||
from flask import Flask, session | |||
from flask_session import Session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you highlight for me how flask_session
makes this safe? (preferably in a code comment so it doesn't get lost)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I decided to not include this sanitizer because this library was suggested in a blog post I added that as sanitizer.
after putting more time on this, this is not guarantied( or I'm not sure about it) to use this lib if the SECRET_KEY is not safe.
// this can be ideal if we assume that best security practice is that | ||
// we don't get SECRET_KEY from env and we always assign a secure generated random string to it | ||
cn.getNumArgument() = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already outlined, I don't think this is best security practice. Do you have any resources to back up your claims? 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I think the best practice can be getting secret_key from the environment (without a default value) AND check whether it is none or not, I wanted to do this but my query performance is really low currently because of two Global DataFlow Configuration.
also the link from this blog post we have
We also included two other SECRET_KEYs we found, one in a deployment template, thisISaSECRET_1234, and another in the documentation YOUR_OWN_RANDOM_GENERATED_SECRET_KEY.
and in the below chart of the blog post you can see that the number of instances that didn't even change the document or configuration keys!
so the best practice in my opinion is that get secret key from config and check whether it is empty or same as default or not in case of not generating it automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/#secret-key
I think Django Doc is not really good about open source projects because if we assume we have an open source project then there is a possibility that developers and users can make a mistake about changing secret keys or documenting their configuration well, also according to doc there is a constant secure key which in open source it is constant and no matter it is random.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is also some libs that load env vars from file and then they can be accessed by os.getenv
, ... so there is a possible constant value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amammad Sorry for the late response, I completely missed this thread.
Storing secret keys in environment variables is a globally recognized good practice. Loading those variables from local files completely defeat the purpose of adding them to the OS environment at boot time. However, from a static analysis point of view, we dont have visibility on where the env vars are loaded from and it is ok to assume that loading secrets from env vars is a safe approach.
Reporting issues/vulnerabilities for each case where secrets are loaded from env vars will lead to too many FPs. Please adjust the query to not report those issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwntester no worries!
what about the default constants of these env vars methods? should I consider these default values too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default values should be reported since they are potential hardcoded credentials
sink = | ||
[ | ||
n.getReturn().getAMember().getSubscript(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(), | ||
n.getReturn().getMember(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per https://flask.palletsprojects.com/en/2.3.x/api/#flask.Flask.secret_key
n.getReturn().getMember(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(), | |
n.getReturn().getMember(["secret_key", "JWT_SECRET_KEY"]).asSink(), |
Although I'm not sure this will cover the value assigned in app.secret_key = <bad>
, I think you would be better off using DataFlow::AttrWrite attr
, and use the value that is being assigned as the sink 😊
* @name Initializing SECRET_KEY of Flask application with Constant value | ||
* @description Initializing SECRET_KEY of Flask application with Constant value | ||
* files can lead to Authentication bypass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since query also covers Django, title should be updated (or query should have own file)
// this query checks for Django SecretKey too | ||
if exists(API::moduleImport("django")) | ||
then | ||
exists(AssignStmt e | e.getTarget(0).toString() = "SECRET_KEY" | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No use of .toString()
in production code.
You probably want something like .(Name).getId() = "SECRET_KEY"
// *it seems that sanitizer have a lot of performance issues* | ||
// for case check whether SECRECT_KEY is empty or not | ||
predicate sanitizer(Expr sourceExpr) { | ||
exists(DataFlow::Node source, DataFlow::Node sink, If i | | ||
source.asExpr() = sourceExpr and | ||
DataFlow::localFlow(source, sink) | ||
| | ||
not i.getASubExpression().getAChildNode*().(Compare) = sink.asExpr() and | ||
not sink.getScope().getLocation().getFile().inStdlib() and | ||
not source.getScope().getLocation().getFile().inStdlib() and | ||
not i.getScope().getLocation().getFile().inStdlib() | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate seems to select any expression that is not locally used in an if expression. If you only want to apply it for assignments of SECRECT_KEY
, start by being able to select these in a predicate, and then ONLY look for which of these can flow to an if statement.
However, configurations also have a predicate called isSanitizer
, which you should be able to use for this (instead of restricting the sources).
As highlighted in the comment for [app_safe_2.py](https://github.com/github/codeql/pull/13561/files#diff-0427b14b91512fc56afbdbe6b40f05ef99a40f497ae599c01e9ca95fb1d2d3c6)
I don't think the fact that we have an if statement makes this safe either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using DataFlow::ConfigSig
I can't find isSanitizer
predicate, also i'm not familiar with sanitizers a lot, do these predicates sanitize a node out of between source and sink?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a need of sanitizer because one of the CVEs related to this query did a sanitize that whether the SECRET_KEY value after getting env variable or config value is same as the default constant or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are using the newest version of the API (great!) the predicate is called isBarrier
.
A barrier is a node that does not allow flow to continue (in fact, it is removed from the flow graph). The comparison with a known safe value would be a prototypical example of a barrier. In fact, we have in our libraries an implementation of "comparison with a constant" since that is a barrier in many queries.
You might be able to adapt the code by using just the first part, handling eq
and noteq
and replacing the string constant with the node for the default value.
In your case, you have a comparison with a known unsafe value, so if you try to adapt the code to you should flip the true and false branches.
| | ||
config.hasFlow(n1, _) and | ||
n1.asExpr().isConstant() and | ||
fileNamehelper = n1.asExpr().(StrConst).getS() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fileNamehelper = n1.asExpr().(StrConst).getS() and | |
fileNamehelper = n1.asExpr().(StrConst).getText() and |
// using flask_session library is safe | ||
predicate flask_sessionSanitizer(DataFlow::Node source) { | ||
not DataFlow::localFlow(source, | ||
API::moduleImport("flask_session").getMember("Session").getACall().getArg(0)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate is defined the "wrong way around" for performance.
It holds for all dataflow nodes that doesn't flow to first argument of flask_session.Session()
, so will hold for a large number of tuples. Instead define it for only the ones that do have flow, and then use it as not flask_sessionSanitizer(app)
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always thought there is no difference between not
before DataFlow and not
before any caller predicate of it ??
now I understand :)
/** | ||
* Assignments like `SECRET_KEY = ConstantValue` | ||
* which ConstantValue will be found by another DataFlow Configuration | ||
* and `SECRET_KEY` location must be a argument of `from_object` or `from_pyfile` methods | ||
* the argument/location value will be found by another Taint Tracking Configuration. | ||
*/ | ||
class SecretKeyAssignStmt extends AssignStmt { | ||
SecretKeyAssignStmt() { | ||
exists( | ||
string configFileName, string fileNamehelper, DataFlow::Node n1, FromObjectFileName config | ||
| | ||
config.hasFlow(n1, _) and | ||
n1.asExpr().isConstant() and | ||
fileNamehelper = n1.asExpr().(StrConst).getS() and | ||
// because of `from_object` we want first part of `Config.AClassName` which `Config` is a python file name | ||
configFileName = fileNamehelper.splitAt(".") and | ||
// after spliting, don't look at %py% pattern | ||
configFileName != "py" | ||
| | ||
this.getLocation().getFile().getShortName().matches("%" + configFileName + "%") and | ||
this.getTarget(0).toString() = ["SECRET_KEY", "JWT_SECRET_KEY"] | ||
) and | ||
not this.getScope().getLocation().getFile().inStdlib() | ||
} | ||
} | ||
|
||
/** | ||
* we have some file name that telling us the SECRET_KEY location | ||
* which have determined by these two methods | ||
* `app.config.from_pyfile("configFileName.py")` or `app.config.from_object("configFileName.ClassName")` | ||
* this is a helper configuration that help us skip the SECRET_KEY variables that are not related to Flask. | ||
*/ | ||
class FromObjectFileName extends TaintTracking::Configuration { | ||
FromObjectFileName() { this = "FromObjectFileName" } | ||
|
||
override predicate isSource(DataFlow::Node source) { | ||
source.asExpr().isConstant() and | ||
not source.getScope().getLocation().getFile().inStdlib() | ||
} | ||
|
||
override predicate isSink(DataFlow::Node sink) { | ||
exists(API::Node n | | ||
n = flaskInstance() and | ||
flask_sessionSanitizer(n.getReturn().asSource()) | ||
| | ||
sink = | ||
n.getReturn() | ||
.getMember("config") | ||
.getMember(["from_object", "from_pyfile"]) | ||
.getACall() | ||
.getArg(0) | ||
) and | ||
not sink.getScope().getLocation().getFile().inStdlib() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this part could essentially be replaced by something like the following:
from API::Node app, API::CallNode cn, string filenameUsed
where
app = API::moduleImport("flask").getMember("Flask").getASubclass*().getACall().getReturn() and
// your sanitizer for Session
cn = app.getMember("config")
.getMember(["from_object", "from_pyfile"])
.getACall() and
filenameUsed = cn
.getParameter(0)
.getAValueReachingSink()
.asExpr().(StrConst).getText()
select cn, filenameUsed
Hi @RasmusWL I opened another pull request because of my mistake :( |
Also I'm commenting here that I moved this pull request to here |
Is this fixed now? |
Everyone will be on vacation until end of next week, but it seems you are close :-) |
Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK to merge now, will have to look things over a bit when we promote this query, especially the sanitizer around flask_session
.
This means tests can pass on any machine now 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments on previous approving review
@RasmusWL I did variant analysis and it seems there are many false positives because of the test projects, can we do a sanitize that the path don't contain any test/example keyword? |
@amammad sure, you can use the following piece of if you want to
|
@RasmusWL thanks I tried this just now, but it seems it is not effective, I've tested following and it seems there are much fewer false positives related to example/test/demo code examples. predicate isBarrier(DataFlow::Node node) {
node.getLocation().getFile().inStdlib() or
node.getLocation().getFile().getAbsolutePath().matches(["%test%", "%demo%", "%example%"])
} |
@amammad it's your choice. In the future though, I would appreciate if you made such changes before making the PR 😊 |
Had to retrigger CI, so just merged in main 😊 |
@RasmusWL I'm so sorry I should have checked the tests. :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for future me: see comments on previous approving review
This is part of All for one, one for all query submission, I'm going to submit an issue in github/securitylab for this pull request too.
the major differences between some secret finder tools that working with regex and this query are:
I tried my best to write sanitizers, i think this query need modularization which need separate queries for Django and Flask and need to put some common classes and predicates in a library file. I'll do this in this week if you agree or in another pull request in future.