Billion laughs attack #235
Comments
My workaround: https://github.com/guyskk/simple-yaml |
I think the loading part of PyYAML itself is not a problem. That said, ignoring aliases can still be an option. I don't see it as critical, though. |
I tested with Python 3.6.6 under WSL, PyYAML 3.13 with the following code:
Output from
at peak memory consumption and loading takes quite a while. |
@jonasw234 interesting. I get different results but I don't know how to get memory consumption at the peak. I don't see any significant runtime and memory differences if I comment out lines f-i, for example. |
I think I see the problem now, I was testing in an interactive console, which tried to output the results immediately. My bad, thanks for the help! |
@jonasw234 oh good ;-) |
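The effect described in this exchange can be reproduced without PyYAML. The following is a minimal sketch, assuming the loader resolves every alias to the same Python object as its anchor; `build_shared` and `count_leaves` are hypothetical helper names made up for illustration. The nested structure itself stays tiny, and the cost only shows up once something walks, prints, or serializes it (which is what an interactive console does when it echoes the result).

```python
import json

def build_shared(levels, width=9):
    """Build nested lists where each level reuses ONE child object,
    mimicking aliases that all point at the same anchored node."""
    node = ["lol"] * width
    for _ in range(levels - 1):
        node = [node] * width  # `width` references to the same list
    return node

def count_leaves(node):
    """Count leaf strings by full traversal (this is the expensive part)."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child in node)

# Only 4 levels here so the example finishes instantly.
data = build_shared(levels=4)

# In memory there are just 4 list objects, each holding 9 references.
same_child = data[0] is data[8]

# Walking or serializing the structure visits every expanded leaf.
leaves = count_leaves(data)
serialized = json.dumps(data)
```

With four levels the traversal already touches `9 ** 4` leaves; every additional level multiplies that by nine, while the in-memory structure grows by only one small list.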
Hey there, I think something isn't right here. I know the problem isn't PyYAML itself; it comes from Python's memory usage when storing strings and integers. But most Python code that loads this data goes on to use and interact with it, and there is little point in accepting user data the developer never uses. Setting a limit on the amount of data produced there would be great. Otherwise most Python code using PyYAML can be frozen right now, since most of it stores this data in strings and then passes them to functions. At the very least, this could be mentioned in the docs. |
Off the top of my head, customizing a If it is indeed as easy as I think it might be to disable anchor support at the Composer level, that might be a customization we could include in the box with some of the new work that's going on... I've added this to the 6.0 project as something to consider and play with (but no promises). EDIT: and TBC, what I'm thinking of here would not address the ask in #37- anchors and refs would still be syntactically significant to the parser, they'd just be errors when actually composing the final representation of the doc. |
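One way the Composer-level idea might look is sketched below. This is not something PyYAML ships: the class name `NoAliasSafeLoader` and the helper `load_without_aliases` are assumptions made up for illustration. It relies only on `yaml.SafeLoader`, `Composer.compose_node`, `check_event`, `peek_event`, `AliasEvent`, and `ComposerError`, which exist in PyYAML's pure-Python loader. Anchors and aliases still parse; an alias becomes an error only when the node graph is composed.

```python
import yaml
from yaml.composer import ComposerError
from yaml.events import AliasEvent

class NoAliasSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that rejects any alias (*name)."""

    def compose_node(self, parent, index):
        # The parser has already accepted the alias syntactically;
        # refuse it here, before it is resolved to the anchored node.
        if self.check_event(AliasEvent):
            event = self.peek_event()
            raise ComposerError(
                None, None,
                "aliases are disabled (found alias %r)" % event.anchor,
                event.start_mark,
            )
        return super().compose_node(parent, index)

def load_without_aliases(text):
    """Load YAML with the alias-rejecting loader sketched above."""
    return yaml.load(text, Loader=NoAliasSafeLoader)

# Documents without aliases load as usual.
plain = load_without_aliases("a: 1\nb: [x, y]\n")

# A document that uses an alias is rejected.
try:
    load_without_aliases("a: &A [lol]\nb: *A\n")
    rejected = False
except ComposerError:
    rejected = True
```

An anchor that is never referenced is still accepted in this sketch, since only the alias triggers expansion.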
That looks great, I guess that will get it from the root case itself. |
For what it's worth, this attack is harmless (and perhaps a valid use case) if one were to do lazy anchor resolution. Admittedly, this is perhaps a hard thing to do in many languages. Doing it that way makes YAML suitable for representing any arbitrary hierarchy of objects, including ones with cyclic references. For future versions of the spec, I would argue that this should be the default behavior. |
Is there a way to disable anchors and aliases or cap the number of characters that can be created through expansions?
Right now PyYAML seems to be susceptible to billion laughs attacks.
@guyskk created a new version in #37 that prevents that, but it also uses `OrderedDict` and `SafeLoader`, so it might be a good idea to implement just this functionality, like the `ignore_aliases=True` flag in #104 for `yaml.load`/`yaml.safe_load`.
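For readers unfamiliar with the attack, here is a rough sketch of how such a document is built and why its expansion explodes. The function names `billion_laughs_yaml` and `expanded_leaf_count` and the anchor names `a0`, `a1`, ... are made up for illustration; the shape (each anchor is a list of aliases to the previous anchor) follows the commonly circulated YAML variant of the attack. The sketch only generates the text and does the arithmetic; it does not load anything.

```python
def billion_laughs_yaml(levels=9, width=9):
    """Return YAML text in which each anchor is a list of `width`
    aliases to the anchor defined on the previous line."""
    lines = ["a0: &a0 [" + ", ".join(['"lol"'] * width) + "]"]
    for i in range(1, levels):
        refs = ", ".join(["*a%d" % (i - 1)] * width)
        lines.append("a%d: &a%d [%s]" % (i, i, refs))
    return "\n".join(lines) + "\n"

def expanded_leaf_count(levels=9, width=9):
    """Number of "lol" strings visited if every alias is fully expanded."""
    return width ** levels

payload = billion_laughs_yaml()
payload_size = len(payload)             # well under a kilobyte of text
leaf_count = expanded_leaf_count()      # 9 ** 9 leaves after expansion
```

The input grows linearly with the number of levels while the fully expanded output grows exponentially, which is why a cap on input size alone does not help and why the issue asks for a limit on expansion or a way to reject aliases.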