Why do we need defusedxml? #12256
Comments
Some EEG systems store data in XML format. Since users sometimes share that data with us and we load it locally for debugging, this protects our local machines from malicious XML. |
The docs specifically state (emphasis mine): "Use of this package is recommended for any server code that parses untrusted XML data." The only two vulnerabilities that exist in the standard library are DoS attacks (see this table). If someone parses malicious XML, the only harm done is 100% CPU load (and maybe filling up space). This is dangerous when running a server, but not in the case of MNE. Therefore, I'd rather not depend on defusedxml. |
What is the downside, exactly? Personally I like being protected from 100% CPU load / having my disks get filled up with unwanted files; I don't see how this is not a problem just because I'm working on a desktop or laptop machine rather than a server. MNE parses untrusted XML. |
Given our strict policy for adding new requirements, I just find it surprising that this was added without any discussion. I'm not against dependencies, but MNE-Python has always been very conservative (see e.g. #11564), and if we now want to relax this a bit, we should discuss this with a larger group of devs. If we stick with being conservative, I'm 👎 on adding defusedxml, because it isn't really necessary. The potential harm is manageable (just quit the process), and XML parsing is a tiny niche part in our code (basically, the EGI reader). Re server vs. desktop, a DoS attack on a server results in the server being unavailable until an admin steps in, which can take a while. This affects all clients which want to connect to the server. On a desktop, the user can immediately handle the problem, and no one else is affected. |
I am with @cbrnr on this one. |
Reconciliation attempt:
Apparently my suggestion can be misunderstood. So let me clarify: I'm suggesting to keep supporting Egi files through I'm further suggesting to drop the Any other places that currently use |
FYI this read may also interest you: Things seem to be quite messy in the Python XML world. |
I read https://github.com/tiran/defusedxml/tree/main#python-xml-libraries again, and to me, it looks like the standard library XML parsers are protected against billion laughs and quadratic blowup, and they don't expand entities anymore, at least for Python ≥ 3.8. So I think we should be safe using standard library parsers (we are using |
Oh that's great news, then! |
This is a legitimate complaint. However, it was added in #11937,
I don't find that convincing:
I think you're mis-reading the table. Both
Yes, I've read that too @hoechenberger. To me the take-home message was still "stdlib xml module is still not considered safe out of the box, defusedxml is safe by default and still maintained, and lxml is more actively developed and is widely used, but still has some unsafe defaults". In other words, it did not convince me that there was a better alternative to |
I did not say that you snuck it in, but whenever a PR adds a new core dependency, I think it would be helpful if this was very clearly mentioned. Even though I commented, I didn't realize that this PR added a new package. So yes, some kind of more formal approval would be nice.
"Maybe" is qualified with footnotes, and in summary, I read that to mean that stdlib XML parsers are not vulnerable with
I think the version can be obtained with:
from xml.parsers import expat
expat.version_info
I don't know how to show the Expat versions for all Python versions (on different platforms). I'd hope that this is documented somewhere, but I haven't found anything yet.
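As a rough sketch of how such a check could look in practice: the snippet below reads the Expat version that the running interpreter is linked against and compares it to a minimum. The helper names `expat_version` and `has_expat_at_least` are hypothetical, and the default threshold of 2.4.1 is an assumption taken from the note in the Python `xml` documentation about billion laughs and quadratic blowup; it is not something established in this thread.

```python
import sys
from xml.parsers import expat


def expat_version():
    """Return the Expat version used by this interpreter as a tuple of ints."""
    return tuple(expat.version_info)


def has_expat_at_least(minimum=(2, 4, 1)):
    """Check whether the linked Expat is at least `minimum`.

    The default (2, 4, 1) is an assumed threshold, see the lead-in.
    """
    return expat_version() >= tuple(minimum)


if __name__ == "__main__":
    # Report the interpreter version, the Expat version string, and the check.
    print("Python:", ".".join(map(str, sys.version_info[:3])))
    print("Expat:", expat.EXPAT_VERSION)
    print("Meets assumed minimum:", has_expat_at_least())
```

This only reports what one particular installation uses; it does not answer which Expat version ships with which Python build on which platform.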
This is probably the key point where we disagree. If you really want to safeguard against these attacks, it seems like we are stuck with |
I'm not sure how likely we / end users are to hit problems, but I'd rather be safe than sorry here. My vote is to keep |
Yes, I'm happy with this solution! Thanks @larsoner! |
OK with me to make it an optional dep. To (hopefully) clarify about our different readings of the vulnerability table @cbrnr: my understanding is that There are some assumptions in there (e.g., I'm assuming that |
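For illustration, an optional dependency of this kind could be handled roughly as follows. This is a minimal sketch and not the code MNE-Python uses: the helper names `get_xml_parser` and `parse_xml_string` are hypothetical, and silently falling back to the standard library when `defusedxml` is missing is an assumed policy (a project could just as well raise an error or emit a warning instead).

```python
import importlib
import xml.etree.ElementTree as stdlib_ET


def get_xml_parser(prefer_defused=True):
    """Return (module, source), where source is "defusedxml" or "stdlib".

    Hypothetical helper: tries defusedxml first and falls back to the
    standard library ElementTree if it is not installed.
    """
    if prefer_defused:
        try:
            module = importlib.import_module("defusedxml.ElementTree")
            return module, "defusedxml"
        except ImportError:
            pass  # optional dependency missing, use the stdlib parser
    return stdlib_ET, "stdlib"


def parse_xml_string(text, prefer_defused=True):
    """Parse an XML string and report which parser handled it."""
    parser, source = get_xml_parser(prefer_defused)
    return parser.fromstring(text), source


if __name__ == "__main__":
    root, source = parse_xml_string("<sensors><sensor>E1</sensor></sensors>")
    print(source, root.tag, [child.text for child in root])
```

Keeping the choice of parser in one function means that readers which need XML call a single entry point, so the fallback policy can be changed in one place.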
I was wondering why we decided to use defusedxml as a required dependency for parsing XML instead of the standard library. The docs state that defusedxml should be used for server code, which definitely does not apply to MNE. Since we have a very strict policy for adding required dependencies, I'm a bit surprised that this was added without much discussion.