-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XML vulnerabilities in Python #61441
Comments
Experimental fix for XML vulnerabilities against default. It's NOT ready and needs lots of polishing. https://pypi.python.org/pypi/defusedxml contains explanations of all issues |
Since this has dragged on for quite a while, I'm probably just going to release 2.7.4 with a pointer to defusedxml in the release notes. (docs, though, perhaps) |
+1 |
+1 too. |
Not blocking 2.7.4 as discussed on mailing list. |
I did a rough merge with current “default” (3.5 pre-release) branch so that I can have a closer look at this issue; see xmlbomb_20150518.patch for the result. There are some bits with Argument Clinit that need perfecting:
|
I started looking at the lower Expat-level changes. Here are some thoughts, in the order that I thought them. :) But the end result is to investigate a different approach to disable entities in existing versions of Expat. Currently, it looks like max_entity_indirections = 0 is a special value meaning no limit. I think it would be better to use some other value such as None for this, and then 0 could disable all entity expansion (other than pre-defined entities like & &#xNNNN; etc). What is the benefit of having the indirection limit? I would have thought the entity expansion (character) limit on its own would already be effective at preventing nested expansion attacks like “billion laughs”. Even if the entity expanded to an empty string, all of the intermediate entity references are still included in the character count. I wonder if it would make more sense to have a total character limit instead, which would include the characters from custom entity expansions as already counted by the patch, but also count characters directly from the XML body. Why would you want to avoid 8 million characters from entity expansion, but allow 8 million characters of plain XML (or gzipped XML)? (I am not an XML expert, so I could be missing something obvious here.) Now I have discovered that it seems you can build Python to use an external Expat library, which won’t be affected by Christian’s fix (correct me if I am wrong). I think we should find a different solution that will also work with existing external Expat versions. Maybe setting EntityDeclHandler to raise an error would be good enough: >>> from xml.parsers import expat
>>> bomb = '<!DOCTYPE bomb [\n<!ENTITY a "" >\n<!ENTITY b "' + '&a;' * 1000 + '" >\n<!ENTITY c "' + '&b;' * 1000 + '" >\n]>\n<bomb a="' + '&c;' * 10 + '" />\n'
>>> p = expat.ParserCreate()
>>> p.Parse(bomb, True) # Noticeable delay (DOS) while parsing
1
>>> p = expat.ParserCreate()
>>> def handler(*so_much_argh):
... raise ValueError("Entity handling disabled")
...
>>> p.EntityDeclHandler = handler
>>> p.Parse(bomb, True) # Instant failure (no DOS)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/build/python/src/Python-3.4.3/Modules/pyexpat.c", line 494, in EntityDecl
File "<stdin>", line 2, in handler
ValueError: Entity handling disabled This solution has been suggested and implemented elsewhere:
|
I have opened bpo-24238 with a patch for Element Tree that uses my EntityDeclHandler technique, instead of patching Expat. I would be interested in other people’s thoughts on the approach. |
This issue didn't get much attention in 5 years. The XML documentation starts with a big red warning: The warning is present in 2.7 and 3.4 as well: It seems like XML is getting less popular because of JSON becoming more popular (JSON obviously comes with its own set of security issues). It seems like less core developers care about XML. I suggest to: We just have to accept that core developers have limited availability and that documenting security issues is an acceptable tradeoff. I don't see any value of keeping these 3 issues open. |
+1 from me. |
Ned - I don't think this is necessarily a release blocker, as we've been shipping it for a long time, but it would be nice if we can hold 3.7.1rc1 just long enough to get it in (provided Christian jumps in and says he'll get the last minor concerns on the PRs wrapped up very soon) |
We discussed this last week at the sprint. Christian, it would be great if you could get this merged for 3.7 and possibly 3.6 in the next 24 hours. |
The external entity patch is ready, but the billion laughs fix need more time. I'm working with an upstream developer on a proper fix. |
Any reason to not take the current patch for our vendored copy and give it some exposure at least on platforms that rely on it (maybe just Windows)? I don't see any reason to wait on another group to "release" it when we need to manually apply the update to our own repo anyway. Platforms using system libexpat that hasn't been patched have obviously decided not to patch it themselves :) |
My policy is upstream fix: first, get a change merged upstream. If we start with a downstream patch:
Python is vulnerable for years, it's not like there is an urgency to fix it. |
There's also the view that it'll be easier to justify upstreaming a patch if it's been released and tested in a separate app. We require that all the time for Python patches, so why should we expect other projects to be different? We're totally entitled to only release it for those platforms, because we are responsible for libexpat on those (we could vendor it for all of them? Or switch to platform-supported libraries for macOS and Windows?) Who normally updates the vendored libexpat? I'd rather let them make the call on how far to diverge from upstream, since it'll be up to them to roll the changes forward or revert them in favour of upstream. I doubt different defaults will be an issue, especially since they aren't configurable anyway. |
I made the 3 latest libexpat updates, and each of them was painful :-) My notes on vendored libraries: I wrote a tool to get the version of all vendored libraries, and a script to updated libexpat. |
Modules/expat can be used on all platforms. A downstream patch is only a problem for platforms that compile Python with "./configure --with-system-expat". The security fixes for entity expansion blowup and external entity loading are backwards incompatible fixes. Technically they also violate XML standards. In practice the vast majority of users will never run into the issue, because external entities are scarcely used. The expat parser is a non-validating XML parser, so DTDs aren't useful at all. I'd rather break a handful of users than to keep the majority of users vulnerable. To fix billion laughs and quadratic blowup once and for all, we also have to break backwards compatibility and require expat >= 2.3.0. For now the modules still work with old versions of expat. IMO it's fine. Vendors either have to update their libraries or use our copy of expat. Ultimately it's Benjamin's, Larry's, and Ned's decision. They are release managers. |
On Tue, Sep 18, 2018, at 06:39, STINNER Victor wrote:
Oh? I've updated it twice (4e21100 and 5033aa7), and it didn't seem so bad. I just copied the upstream files in. Did I do it wrong? |
Let me remind what I did... bpo-30694 (expat 2.2.1):
bpo-30947 (expat 2.2.3):
There are different issues:
At least for me, each update was painful. It's also painful to have to make the same change in all supported branches (2.7, 3.4, 3.5, 3.6, 3.7, master). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: