Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

param: fix exponential memory allocation #10178

Merged
merged 2 commits into from
Dec 16, 2023
Merged

param: fix exponential memory allocation #10178

merged 2 commits into from
Dec 16, 2023

Conversation

0x2b3bfa0
Copy link
Member

@0x2b3bfa0 0x2b3bfa0 commented Dec 16, 2023

Closes #10177.

Explanation

  • By default, dpath.merge appends new elements to existing lists.
  • Parsing YAML files with reused anchors will pass mutable objects like lists by reference.
  • Changes performed to e.g. lists inside config will also be applied to e.g. lists inside ret sharing the same anchor.

Before

from dpath import merge


ret = {}
config = {"list": [1, 2]}

for _ in range(4):
    merge(ret, config)
    print(ret)
{'list': [1, 2]}
{'list': [1, 2, 1, 2]}
{'list': [1, 2, 1, 2, 1, 2, 1, 2]}
{'list': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]}

After

from copy import deepcopy

from dpath import merge


ret = {}
config = {"list": [1, 2]}

for _ in range(4):
    merge(ret, deepcopy(config))
    print(ret)
{'list': [1, 2]}
{'list': [1, 2, 1, 2]}
{'list': [1, 2, 1, 2, 1, 2]}
{'list': [1, 2, 1, 2, 1, 2, 1, 2]}

@0x2b3bfa0 0x2b3bfa0 added bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. A: params Related to dvc params labels Dec 16, 2023
@0x2b3bfa0 0x2b3bfa0 requested a review from a team December 16, 2023 07:07
@0x2b3bfa0 0x2b3bfa0 self-assigned this Dec 16, 2023
@skshetry
Copy link
Member

@0x2b3bfa0, is there a way without using deepcopy?

@skshetry
Copy link
Member

skshetry commented Dec 16, 2023

We need to check for similar problems in metrics too.

We don't support list in metrics.

Copy link

codecov bot commented Dec 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3e4e143) 90.61% compared to head (922dfe3) 90.63%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10178      +/-   ##
==========================================
+ Coverage   90.61%   90.63%   +0.01%     
==========================================
  Files         499      499              
  Lines       37880    37881       +1     
  Branches     5505     5505              
==========================================
+ Hits        34326    34332       +6     
+ Misses       2912     2909       -3     
+ Partials      642      640       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Dec 16, 2023

@0x2b3bfa0, is there a way without using deepcopy?

We can try dpath.merge(flags=2) (update set of unique values) or dpath.merge(flags=0) (replace existing), but that would change slightly the merging behavior.

I was about to suggest another alternative based on PyYAML serialization/deserialization, but it seems like anchors can't be disabled: yaml/pyyaml#103.

@skshetry
Copy link
Member

skshetry commented Dec 16, 2023

Does making keypaths unique solve this issue inside read_param_file? Or, can this still be exploited?

from funcy import ldistinct

key_paths = ldistinct(key_paths)

@skshetry
Copy link
Member

skshetry commented Dec 16, 2023

Does making keypaths unique solve this issue ...

Looks like not. As all different keypaths point to same reference.

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Dec 16, 2023

Does making keypaths unique solve this issue ...

Yes, it seems so! I need to get some sleep, but you may find the minimal examples at #10177 useful for debugging.

@skshetry
Copy link
Member

Although this would fix the issue for the particular snippet, dvc is still susceptible to "billion laughs" attack. See yaml/pyyaml#235 and kislyuk/yq#134.

@skshetry skshetry merged commit abcd70b into main Dec 16, 2023
24 checks passed
@skshetry skshetry deleted the param-memory-fix branch December 16, 2023 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: params Related to dvc params bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exponential memory allocation caused by YAML parameter files with reused anchors
2 participants