Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Flask & Django Constant Secret Key initialization #13561

Merged
merged 28 commits into from
Aug 21, 2023

Conversation

am0o0
Copy link
Contributor

@am0o0 am0o0 commented Jun 25, 2023

This is part of All for one, one for all query submission, I'm going to submit an issue in github/securitylab for this pull request too.
the major differences between some secret finder tools that working with regex and this query are:

  1. not all of secrets can be found by regex
  2. most of them assigned in multiple steps and don't have a same name like SECRET_KEY
  3. precision.

I tried my best to write sanitizers, i think this query need modularization which need separate queries for Django and Flask and need to put some common classes and predicates in a library file. I'll do this in this week if you agree or in another pull request in future.

@am0o0 am0o0 requested a review from a team as a code owner June 25, 2023 10:39
@am0o0 am0o0 changed the title Python: Flask & Django Constant Key initialization Python: Flask & Django Constant Secret Key initialization Jun 25, 2023
@am0o0
Copy link
Contributor Author

am0o0 commented Jun 26, 2023

Hi @RasmusWL , if you think it's better to I write this query with a better structure please let me know.
Also I've tested this query against apache superset and airflow and I didn't observe any significant performance issues in these repositories.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like these queries, have thought about writing something like this myself 👍

Can you please submit the security lab bounty issue now as well? 🙏

However, I think the best practice is to load the SECRET_KEY from the environment (without a default value of course). This is for example the recommended practice in the Django security docs: https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/#secret-key

@@ -0,0 +1,22 @@
from flask import Flask, session
from flask_session import Session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you highlight for me how flask_session makes this safe? (preferably in a code comment so it doesn't get lost)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I decided to not include this sanitizer because this library was suggested in a blog post I added that as sanitizer.
after putting more time on this, this is not guarantied( or I'm not sure about it) to use this lib if the SECRET_KEY is not safe.

Comment on lines 67 to 69
// this can be ideal if we assume that best security practice is that
// we don't get SECRET_KEY from env and we always assign a secure generated random string to it
cn.getNumArgument() = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already outlined, I don't think this is best security practice. Do you have any resources to back up your claims? 😊

Copy link
Contributor Author

@am0o0 am0o0 Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I think the best practice can be getting secret_key from the environment (without a default value) AND check whether it is none or not, I wanted to do this but my query performance is really low currently because of two Global DataFlow Configuration.
also the link from this blog post we have

We also included two other SECRET_KEYs we found, one in a deployment template, thisISaSECRET_1234, and another in the documentation YOUR_OWN_RANDOM_GENERATED_SECRET_KEY.

and in the below chart of the blog post you can see that the number of instances that didn't even change the document or configuration keys!

so the best practice in my opinion is that get secret key from config and check whether it is empty or same as default or not in case of not generating it automatically.

Copy link
Contributor Author

@am0o0 am0o0 Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/#secret-key

I think Django Doc is not really good about open source projects because if we assume we have an open source project then there is a possibility that developers and users can make a mistake about changing secret keys or documenting their configuration well, also according to doc there is a constant secure key which in open source it is constant and no matter it is random.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also some libs that load env vars from file and then they can be accessed by os.getenv , ... so there is a possible constant value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amammad Sorry for the late response, I completely missed this thread.

Storing secret keys in environment variables is a globally recognized good practice. Loading those variables from local files completely defeat the purpose of adding them to the OS environment at boot time. However, from a static analysis point of view, we dont have visibility on where the env vars are loaded from and it is ok to assume that loading secrets from env vars is a safe approach.

Reporting issues/vulnerabilities for each case where secrets are loaded from env vars will lead to too many FPs. Please adjust the query to not report those issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwntester no worries!
what about the default constants of these env vars methods? should I consider these default values too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default values should be reported since they are potential hardcoded credentials

sink =
[
n.getReturn().getAMember().getSubscript(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(),
n.getReturn().getMember(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per https://flask.palletsprojects.com/en/2.3.x/api/#flask.Flask.secret_key

Suggested change
n.getReturn().getMember(["SECRET_KEY", "JWT_SECRET_KEY"]).asSink(),
n.getReturn().getMember(["secret_key", "JWT_SECRET_KEY"]).asSink(),

Although I'm not sure this will cover the value assigned in app.secret_key = <bad>, I think you would be better off using DataFlow::AttrWrite attr, and use the value that is being assigned as the sink 😊

Comment on lines 2 to 4
* @name Initializing SECRET_KEY of Flask application with Constant value
* @description Initializing SECRET_KEY of Flask application with Constant value
* files can lead to Authentication bypass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since query also covers Django, title should be updated (or query should have own file)

// this query checks for Django SecretKey too
if exists(API::moduleImport("django"))
then
exists(AssignStmt e | e.getTarget(0).toString() = "SECRET_KEY" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No use of .toString() in production code.

You probably want something like .(Name).getId() = "SECRET_KEY"

Comment on lines 142 to 154
// *it seems that sanitizer have a lot of performance issues*
// for case check whether SECRECT_KEY is empty or not
predicate sanitizer(Expr sourceExpr) {
exists(DataFlow::Node source, DataFlow::Node sink, If i |
source.asExpr() = sourceExpr and
DataFlow::localFlow(source, sink)
|
not i.getASubExpression().getAChildNode*().(Compare) = sink.asExpr() and
not sink.getScope().getLocation().getFile().inStdlib() and
not source.getScope().getLocation().getFile().inStdlib() and
not i.getScope().getLocation().getFile().inStdlib()
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate seems to select any expression that is not locally used in an if expression. If you only want to apply it for assignments of SECRECT_KEY, start by being able to select these in a predicate, and then ONLY look for which of these can flow to an if statement.

However, configurations also have a predicate called isSanitizer, which you should be able to use for this (instead of restricting the sources).

As highlighted in the comment for [app_safe_2.py](https://github.com/github/codeql/pull/13561/files#diff-0427b14b91512fc56afbdbe6b40f05ef99a40f497ae599c01e9ca95fb1d2d3c6) I don't think the fact that we have an if statement makes this safe either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using DataFlow::ConfigSig I can't find isSanitizer predicate, also i'm not familiar with sanitizers a lot, do these predicates sanitize a node out of between source and sink?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a need of sanitizer because one of the CVEs related to this query did a sanitize that whether the SECRET_KEY value after getting env variable or config value is same as the default constant or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are using the newest version of the API (great!) the predicate is called isBarrier.
A barrier is a node that does not allow flow to continue (in fact, it is removed from the flow graph). The comparison with a known safe value would be a prototypical example of a barrier. In fact, we have in our libraries an implementation of "comparison with a constant" since that is a barrier in many queries.

You might be able to adapt the code by using just the first part, handling eq and noteq and replacing the string constant with the node for the default value.

In your case, you have a comparison with a known unsafe value, so if you try to adapt the code to you should flip the true and false branches.

|
config.hasFlow(n1, _) and
n1.asExpr().isConstant() and
fileNamehelper = n1.asExpr().(StrConst).getS() and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fileNamehelper = n1.asExpr().(StrConst).getS() and
fileNamehelper = n1.asExpr().(StrConst).getText() and

Comment on lines 136 to 140
// using flask_session library is safe
predicate flask_sessionSanitizer(DataFlow::Node source) {
not DataFlow::localFlow(source,
API::moduleImport("flask_session").getMember("Session").getACall().getArg(0))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate is defined the "wrong way around" for performance.

It holds for all dataflow nodes that doesn't flow to first argument of flask_session.Session(), so will hold for a large number of tuples. Instead define it for only the ones that do have flow, and then use it as not flask_sessionSanitizer(app) 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always thought there is no difference between not before DataFlow and not before any caller predicate of it ??
now I understand :)

Comment on lines 156 to 210
/**
* Assignments like `SECRET_KEY = ConstantValue`
* which ConstantValue will be found by another DataFlow Configuration
* and `SECRET_KEY` location must be a argument of `from_object` or `from_pyfile` methods
* the argument/location value will be found by another Taint Tracking Configuration.
*/
class SecretKeyAssignStmt extends AssignStmt {
SecretKeyAssignStmt() {
exists(
string configFileName, string fileNamehelper, DataFlow::Node n1, FromObjectFileName config
|
config.hasFlow(n1, _) and
n1.asExpr().isConstant() and
fileNamehelper = n1.asExpr().(StrConst).getS() and
// because of `from_object` we want first part of `Config.AClassName` which `Config` is a python file name
configFileName = fileNamehelper.splitAt(".") and
// after spliting, don't look at %py% pattern
configFileName != "py"
|
this.getLocation().getFile().getShortName().matches("%" + configFileName + "%") and
this.getTarget(0).toString() = ["SECRET_KEY", "JWT_SECRET_KEY"]
) and
not this.getScope().getLocation().getFile().inStdlib()
}
}

/**
* we have some file name that telling us the SECRET_KEY location
* which have determined by these two methods
* `app.config.from_pyfile("configFileName.py")` or `app.config.from_object("configFileName.ClassName")`
* this is a helper configuration that help us skip the SECRET_KEY variables that are not related to Flask.
*/
class FromObjectFileName extends TaintTracking::Configuration {
FromObjectFileName() { this = "FromObjectFileName" }

override predicate isSource(DataFlow::Node source) {
source.asExpr().isConstant() and
not source.getScope().getLocation().getFile().inStdlib()
}

override predicate isSink(DataFlow::Node sink) {
exists(API::Node n |
n = flaskInstance() and
flask_sessionSanitizer(n.getReturn().asSource())
|
sink =
n.getReturn()
.getMember("config")
.getMember(["from_object", "from_pyfile"])
.getACall()
.getArg(0)
) and
not sink.getScope().getLocation().getFile().inStdlib()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this part could essentially be replaced by something like the following:

from API::Node app, API::CallNode cn, string filenameUsed
where
  app = API::moduleImport("flask").getMember("Flask").getASubclass*().getACall().getReturn() and
  // your sanitizer for Session
  cn = app.getMember("config")
  .getMember(["from_object", "from_pyfile"])
  .getACall() and
  filenameUsed = cn
            .getParameter(0)
            .getAValueReachingSink()
            .asExpr().(StrConst).getText()
select cn, filenameUsed

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 29, 2023

Hi @RasmusWL I opened another pull request because of my mistake :(

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 29, 2023

Also I'm commenting here that I moved this pull request to here

@yoff
Copy link
Contributor

yoff commented Jul 26, 2023

For me to accept this, I need you to look into having consistent results for all the uses of os.environ and os.getenv in the config.py file. -- Currently some are found by the query, and some are not.

Is this fixed now?

@yoff
Copy link
Contributor

yoff commented Jul 26, 2023

Everyone will be on vacation until end of next week, but it seems you are close :-)

@am0o0
Copy link
Contributor Author

am0o0 commented Jul 26, 2023

Is this fixed now?

Yes

RasmusWL
RasmusWL previously approved these changes Aug 14, 2023
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge now, will have to look things over a bit when we promote this query, especially the sanitizer around flask_session.

@RasmusWL RasmusWL added the no-change-note-required This PR does not need a change note label Aug 14, 2023
RasmusWL
RasmusWL previously approved these changes Aug 14, 2023
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments on previous approving review

@am0o0
Copy link
Contributor Author

am0o0 commented Aug 14, 2023

see comments on previous approving review

@RasmusWL I did variant analysis and it seems there are many false positives because of the test projects, can we do a sanitize that the path don't contain any test/example keyword?

@RasmusWL
Copy link
Member

@RasmusWL I did variant analysis and it seems there are many false positives because of the test projects, can we do a sanitize that the path don't contain any test/example keyword?

@amammad sure, you can use the following piece of if you want to

not origin.getScope().getScope*() instanceof TestScope

@am0o0
Copy link
Contributor Author

am0o0 commented Aug 14, 2023

@amammad sure, you can use the following piece of if you want to

not origin.getScope().getScope*() instanceof TestScope

@RasmusWL thanks I tried this just now, but it seems it is not effective, I've tested following and it seems there are much fewer false positives related to example/test/demo code examples.

  predicate isBarrier(DataFlow::Node node) {
    node.getLocation().getFile().inStdlib() or
    node.getLocation().getFile().getAbsolutePath().matches(["%test%", "%demo%", "%example%"])
  }

@RasmusWL
Copy link
Member

@amammad it's your choice. In the future though, I would appreciate if you made such changes before making the PR 😊

@am0o0
Copy link
Contributor Author

am0o0 commented Aug 14, 2023

@amammad it's your choice. In the future though, I would appreciate if you made such changes before making the PR 😊

@RasmusWL I do apologize about my noob behaviors :)

@RasmusWL
Copy link
Member

Had to retrigger CI, so just merged in main 😊

@RasmusWL RasmusWL self-assigned this Aug 16, 2023
@am0o0
Copy link
Contributor Author

am0o0 commented Aug 17, 2023

@RasmusWL I'm so sorry I should have checked the tests. :(

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future me: see comments on previous approving review

@RasmusWL RasmusWL merged commit c8c69aa into github:main Aug 21, 2023
10 checks passed
@am0o0 am0o0 deleted the amammad-python-WebAppsConstatntSecretKeys branch September 14, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants