Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OmegaConf is vulenrable to ddos via specially crafted yaml input (billion laughs attack) #794

Open
omry opened this issue Oct 5, 2021 · 4 comments
Assignees
Labels
bug Something isn't working priority_medium

Comments

@omry
Copy link
Owner

omry commented Oct 5, 2021

OmegaConf is vulnerable to Billion laughs attack.
Specially crafted yaml input files can use yaml anchors to create a yaml input that is fast to load but very slow to traverse because it contains billions of nodes.

This is considered medium priority and will be addressed in a future major version of OmegaConf.
Most potential fixes are breaking.

Details

pyyaml is creating references to the same object when it loads an anchor, which is why it can do load such files quickly.

Example input:

lol1: &lol1 "lol"
lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]
lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]
lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]
lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]
lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]
lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]
lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]
lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]
lol10: &lol10 [*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9]

Such input creates billions of yaml nodes. OmegaConf will take an unreasonably long time to instantiate an object from it.

User-land solutions

1. Use interpolations instead of yaml anchors

Avoid yaml anchors and use interpolations instead.

The following config uses interpolations to achieve a similar behavior:

lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
lol3: ["${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}"]
lol4: ["${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}"]
lol5: ["${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}"]
lol6: ["${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}"]
lol7: ["${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}"]
lol8: ["${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}"]
lol9: ["${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}"]
lol10: ["${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}"]

Note that the config is still too big for actual traversal but since it's evaluated lazily it's possible to use it if it's done carefully.

2. Validate object created by pyyaml

If the config cannot be trusted, a check can be made before instantiating OmegaConf from a dict (instead of from a file) with a function similar to this:

def check_no_refs(obj, memo=None):
    if memo is None:
        memo = set()
    if isinstance(obj, dict):
        c = 0
        for k, v in obj.items():
            c = c + check_no_refs(v, memo)
        return c
    if isinstance(obj, list):
        c = 0
        for v in obj:
            c = c + check_no_refs(v, memo)
        return c
    else:
        idv = id(obj)
        if idv in memo:
            raise RuntimeError("duplicate item added")
        memo.add(idv)
        return 1


with io.open(os.path.abspath("1.yaml"), "r", encoding="utf-8") as f:
    obj = yaml.load(f)

check_no_refs(obj)
# Create cfg after checking dict returned by pyyaml
cfg = OmegaConf.create(obj)

Possible future OmegaConf solutions

The following are possible strategies to handle it inside OmegaConf.

  1. Deprecate yaml anchors in favor of interpolations (detect usage of anchors and issue a warning telling the user to use interpolations instead
  2. Automatically convert detected anchors to interpolation, this is potentially tricky
  3. Limit config loaded to yaml files/strings to a maximum number of nodes and issue an error (10k or even 50k nodes seems like plenty).
  4. Provide an API allowing safe usage by rejecting all anchors. default behavior is TBD.
@omry omry added the bug Something isn't working label Oct 5, 2021
@omry omry added this to the OmegaConf 2.2 milestone Oct 5, 2021
@pixelb pixelb self-assigned this Apr 7, 2022
@pixelb pixelb modified the milestones: OmegaConf 2.2, OmegaConf 2.3 May 18, 2022
@Jasha10 Jasha10 self-assigned this Dec 15, 2022
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 16, 2022

Possible future OmegaConf solutions

The following are possible strategies to handle it inside OmegaConf.

  1. Deprecate yaml anchors in favor of interpolations (detect usage of anchors and issue a warning telling the user to use interpolations instead
  2. Automatically convert detected anchors to interpolation, this is potentially tricky
  3. Limit config loaded to yaml files/strings to a maximum number of nodes and issue an error (10k or even 50k nodes seems like plenty).
  4. Provide an API allowing safe usage by rejecting all anchors. default behavior is TBD.

I like option 3; it feels safer and easier to implement than 1 and 2.

R.e. option 1:

  • As far as I can tell, pyyaml doesn't have a setting to disable anchors (see e.g. How to disable anchors and aliases? yaml/pyyaml#37), so we'd have to roll our own anchor detection logic.
  • I can recreate the attack with pure python (i.e. this is not just a yaml problem):
# lol_attack.py
lol1 = "lol"
lol2 = [lol1, lol1, lol1, lol1, lol1, lol1, lol1, lol1, lol1]
lol3 = [lol2, lol2, lol2, lol2, lol2, lol2, lol2, lol2, lol2]
lol4 = [lol3, lol3, lol3, lol3, lol3, lol3, lol3, lol3, lol3]
lol5 = [lol4, lol4, lol4, lol4, lol4, lol4, lol4, lol4, lol4]
lol6 = [lol5, lol5, lol5, lol5, lol5, lol5, lol5, lol5, lol5]
lol7 = [lol6, lol6, lol6, lol6, lol6, lol6, lol6, lol6, lol6]
lol8 = [lol7, lol7, lol7, lol7, lol7, lol7, lol7, lol7, lol7]
lol9 = [lol8, lol8, lol8, lol8, lol8, lol8, lol8, lol8, lol8]
lol10 = [lol9, lol9, lol9, lol9, lol9, lol9, lol9, lol9, lol9]

# The below lines will fill up computer memory:
# from omegaconf import OmegaConf
# cfg = OmegaConf.create(lol10)

R.e. option 2: Still vulnerable if the user calls OmegaConf.resolve(cfg). Also, as @omry said above, possibly tricky to implement.

@omry
Copy link
Owner Author

omry commented Dec 16, 2022

The problem with having a hard limit on number of nodes is that it could break for legitimate use cases suddenly based on user input.
Imagine a system in production that suddenly blows up because someone fed a slightly larger config.
Config composition can also create very large config via small ones, making this even more likely.
I am not worried about users calling resolve(). It usage is much more rare than loading yaml files.

4 is also interesting. Maybe have a static function for setting anchors policy and have it reject them by default. (detected by seeing the same object in the dictionary).
Low level can raise a DuplicateNodeError and the function loading yaml can link to to github issue (or a comment on it) explaining the issue and what the options are.

something like:
Duplicate node detected, see github.com/.../ for more information

Where the page would indicate that anchors are inferior because they don't work cross files and are vulnerable to this attack and suggest converting to interpolations, or, as an alternative - disable the check vis something like OmegaConf.enable_unsafe_anchors(true).

@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 17, 2022

4 is also interesting. Maybe have a static function for setting anchors policy and have it reject them by default.

So the idea is to introduce some global state that keeps track of the current policy, right? Or perhaps the user could set a flag throw_on_duplicate_nodes=True in their call to OmegaConf.{create,load,structured}?

@omry
Copy link
Owner Author

omry commented Jan 5, 2023

Yes.
If we are introducing the ability to specify policy as a file header, we also need the ability to completely disable the functionality for people who cannot trust their input configs.

class AnchorsPolicy(Enum):
  # Reject anchors, allow for header to change behavior. Default behavior
  HEADER  = 1

  # Always allow anchors
  UNSAFE  = 2

 # Never allow anchors. Reject anchors even in file header attempts to enable it.
  RESTRICTED = 3

OmegaConf.setAnchorsPolicy(policy: AnchorsPolicy)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority_medium
Projects
None yet
Development

No branches or pull requests

3 participants