-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rephrase ast.literal_eval() to remove any security warranty #95588
Comments
Comparing Also, I've seen lots of folks use If you're really worried about this, maybe it should get a |
We could remove the safety warranties from our documentation, but we cannot drop the security promise for Python 3.11 and earlier. The API was documented as safe when Python 3.10 and earlier came out. We cannot just void this promise mid-air. |
I made a similar suggestion in our internal chat. We may have to restrict both input string length and amount of AST nodes. |
I am generally opposed to that kind of thing because is very difficult to predict correctly what the effect will be. For example, the parser creates AST nodes on the go as it parses incorrect constructs and those will not be ultimately used unless the end in a cache. What is worse, changing the grammar will have very visible effects so something that didn't crash before now will reach the limit. Given how tricky is to maintain the parser in general I advice strongly against more complexity than we already have, which is a lot. Every special mode of the parser can be a pain to maintain when extending error messages or doing more grammar rules |
The documentation is wrong, misleading and should be fixed. A function which is known to crash must not be used with an untrusted string. IMO making literal_eval() safer is a new and separated feature request. If you have a concrete idea how to make literal_eval() safer, please open a separated issue. Also, fixing a bunch of parser issues is always welcomed, but it's also a separated issue. |
See issue #83340 for example. |
The documentation is a promise to our users. We must do our best to solve the problem -- at least for common and simple cases. Just dropping the security promise from the documentation is blame shifting. It does not help our user base who is already using the feature. |
Writing a more minimal expression parser as Raymond suggests is likely the best way to make |
Is there no way to figure out an acceptable solution complexity wise? Or just to build more confidence in what we have? Maybe by adding a fuzzer test for it (there seems to be a test for json here, which is probably quite similar in many ways). |
We do fuzz test ast.literal_eval via Google's oss-fuzz. The entry point is https://github.com/python/cpython/blob/main/Modules/_xxtestfuzz/fuzzer.c#L396. It has revealed numerous compiler and parser issues that have been fixed. But is often stuck re-triggering known non-trivial issues such as stack overflows and computational timeouts at this point. "The" problem with writing a minimal expression parser to replace literal_eval's implementation is the complexity. Ex:
We can and must. Because we'll never fix literal_eval in anything acceptable to backport to a release branch. Doing that requires an entire new non-recursive not Python compiler based implementation. The bug we can fix is in our documentation which claims it is safe to crash the process by parsing untrusted data. It isn't. |
I doubt many will need the full power here, but things like dicts containing a list maybe, in fact a dict with a list containing tuples is the exact, untrusted, use-case, I have. If you update the docs, maybe mention |
So the problem with Concretely: suppose you have an IDE where whenever you open a |
Likely?
Under a more common default stack size or >= 2048KiB our internal checks for too much nesting being > 200 fire and lead to a We cannot control the C stack size - and introspecting it is not always plausible - so we cannot make guarantees so long as we're C stack recursive. #91079 is related to helping out here. Fundamentally this is a problem with using recursion in a language that doesn't have a dynamic stack (read: C and C++). Input based recursion is not wise for parsing untrusted inputs in such an environment. |
How infeasible is it on the most common platforms (Linux, Windows, macOS, various BSD) to introspect the stack size? We're at least partially in this mess because back in 1990 I didn't know how to do that using portable C code, but hopefully things have improved since then, and there is at least a non-portable way? At least for the parser a single pointer compare shouldn't break the bank. |
At least in Linux we can ask the OS for the stack limit and then we could compare addresses as they cannot move. I can try to implement a prototype of this. |
This seems like more of a semantics issue than anything else. I for one see no contradiction in the original documentation: the kind of "safety" being offered is "not executing arbitrary code", not "never crashing". This is sufficient for untrusted input in many situations. Making sure a function never crashes, even with pathological input, is a tall order. I can see the value in being able to make this additional guarantee, but in the meantime, the original issue can be resolved simply by wording the existing guarantee more explicitly. |
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
…ythonGH-95919) It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
…ythonGH-95919) It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
More explicit wording added. |
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
…ython#95919) It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
Fixed by PR #95919. Thank you! |
* main: (2069 commits) pythongh-96512: Move int_max_str_digits setting to PyConfig (python#96944) pythongh-94808: Coverage: Check picklablability of calliter (python#95923) pythongh-94808: Add test coverage for PyObject_HasAttrString (python#96627) pythongh-94732: Fix KeyboardInterrupt race in asyncio run_forever() (python#97765) Fix typos in `bltinmodule.c`. (pythonGH-97766) pythongh-94808: `_PyLineTable_StartsLine` was not used (pythonGH-96609) pythongh-97681: Remove Tools/demo/ directory (python#97682) Fix typo in unittest docs (python#97742) pythongh-97728: Argument Clinic: Fix uninitialized variable in the Py_UNICODE converter (pythonGH-97729) pythongh-95913: Fix PEP number in PEP 678 What's New ref label (python#97739) pythongh-95913: Copyedit/improve New Modules What's New section (python#97721) pythongh-97740: Fix bang in Sphinx C domain ref target syntax (python#97741) pythongh-96819: multiprocessing.resource_tracker: check if length of pipe write <= 512 (python#96890) pythongh-97706: multiprocessing tests: Delete unused variable `rand` (python#97707) pythonGH-85447: Clarify docs about awaiting future multiple times (python#97738) [docs] Update logging cookbook with recipe for using a logger like an output… (pythonGH-97730) pythongh-97607: Fix content parsing in the impl-detail reST directive (python#97652) pythongh-95975: Move except/*/finally ref labels to more precise locations (python#95976) pythongh-97591: In `Exception.__setstate__()` acquire strong references before calling `tp_hash` slot (python#97700) pythongh-95588: Drop the safety claim from `ast.literal_eval` docs. (python#95919) ...
Are there any hints on a workaround(s)? The only thing I can think of right now is just rejecting "large" inputs (not sure how large). Which is still a bit inconvenient in my use-case (which at some time explicitly wanted to allow them). Rejecting deep nesting, or very many nodes would seem a bit clearer. |
In my experience, the most secure way to evaluate/run untrusted code is to spawn a separated process and runs this process in a sandbox where you can limit time, memory, syscalls, etc. |
Yea, but whether or not it helps me, it seem like there should be some practical solution for very simple stuff here.
I would be surprised if I am the only one in a position where historically |
It was never safe. The only change is the documentation that has been fixed. One way to reduce the risk of crash is to reject strings longer than a limit (ex: 100 characters). This issue is closed. I suggest you opening a discussion elsewhere. There is no plan to implement a sandbox in CPython: https://lwn.net/Articles/574215/ |
…ythonGH-95919) It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
…H-95919) (GH-126729) It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter. (cherry picked from commit 8baef8a) Co-authored-by: Gregory P. Smith <[email protected]>
Currently, ast.literal_eval() documentation gives multiple security warranties:
IMO that's plain wrong if you read the following RED WARNING:
The documentation should be rephrased to only described the purpose of the function and make it very clear that it must NOT be used on untrusted sources.
We can follow the phrasing of the pickle documentation: https://docs.python.org/dev/library/pickle.html
Linked PRs
ast.literal_eval
docs. (GH-95919) #126729The text was updated successfully, but these errors were encountered: