Nested Scripts #127
Comments
Thank you for reporting this. I'm looking at it now (apologies for the delay).
I'm turning these into executable tests, and although I can easily reproduce the first example (which is a security problem), I cannot reproduce what you're seeing with the second or third. Here's my code. Can you tell me what you're doing differently?

#!/usr/bin/env ruby
require 'loofah'
require 'yaml'
Nokogiri::VERSION_INFO.to_yaml
# => "---\n" +
# "warnings: []\n" +
# "nokogiri: 1.8.1\n" +
# "ruby:\n" +
# " version: 2.4.1\n" +
# " platform: x86_64-linux\n" +
# " description: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]\n" +
# " engine: ruby\n" +
# "libxml:\n" +
# " binding: extension\n" +
# " source: packaged\n" +
# " libxml2_path: \"/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/nokogiri-1.8.1/ports/x86_64-pc-linux-gnu/libxml2/2.9.5\"\n" +
# " libxslt_path: \"/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/nokogiri-1.8.1/ports/x86_64-pc-linux-gnu/libxslt/1.1.30\"\n" +
# " libxml2_patches: []\n" +
# " libxslt_patches: []\n" +
# " compiled: 2.9.5\n" +
# " loaded: 2.9.5\n"
html = "<script><script src='malicious.js'></script>"
Loofah.fragment(html).scrub!(:strip).to_html # => "<script src='malicious.js'>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script src='malicious.js'&gt;"
html = "<<s>script src='malicious.js'>"
Loofah.fragment(html).scrub!(:strip).to_html # => "&lt;<s>script src='malicious.js'&gt;</s>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script src='malicious.js'&gt;"
Loofah.fragment(html).scrub!(:prune).to_html # => "&lt;<s>script src='malicious.js'&gt;</s>"
Loofah.fragment(html).scrub!(:prune).to_text # => "&lt;script src='malicious.js'&gt;"
html = "<<s>script>alert('a')<<s>/script>"
Loofah.fragment(html).scrub!(:strip).to_html # => "&lt;<s>script&gt;alert('a')&lt;<s>/script&gt;</s></s>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script&gt;alert('a')&lt;/script&gt;"
Loofah.fragment(html).scrub!(:prune).to_html # => "&lt;<s>script&gt;alert('a')&lt;<s>/script&gt;</s></s>"
Loofah.fragment(html).scrub!(:prune).to_text # => "&lt;script&gt;alert('a')&lt;/script&gt;"
My proposed fix for example 1 is in #132; here it is inline:

diff --git a/lib/loofah/scrubbers.rb b/lib/loofah/scrubbers.rb
index 508f6bf..982c593 100644
--- a/lib/loofah/scrubbers.rb
+++ b/lib/loofah/scrubbers.rb
@@ -99,7 +99,12 @@ module Loofah
def scrub(node)
return CONTINUE if html5lib_sanitize(node) == CONTINUE
- node.before node.children
+ if node.children.length == 1 && node.children.first.cdata?
+ sanitized_text = Loofah.fragment(node.children.first.to_html).scrub!(:strip).to_html
+ node.before Nokogiri::XML::Text.new(sanitized_text, node.document)
+ else
+ node.before node.children
+ end
node.remove
end
end
diff --git a/test/integration/test_ad_hoc.rb b/test/integration/test_ad_hoc.rb
index be4583f..1353966 100644
--- a/test/integration/test_ad_hoc.rb
+++ b/test/integration/test_ad_hoc.rb
@@ -157,6 +157,20 @@ mso-bidi-language:#0400;}
assert_equal "", Loofah.scrub_document('<script>test</script>', :prune).text
end
+ def test_nested_script_cdata_tags_should_be_scrubbed
+ html = "<script><script src='malicious.js'></script>"
+ stripped = Loofah.fragment(html).scrub!(:strip)
+ assert_empty stripped.xpath("//script")
+ refute_match("<script", stripped.to_html)
+ end
+
+ def test_nested_script_cdata_tags_should_be_scrubbed_2
+ html = "<script><script>alert('a');</script></script>"
+ stripped = Loofah.fragment(html).scrub!(:strip)
+ assert_empty stripped.xpath("//script")
+ refute_match("<script", stripped.to_html)
+ end
+
def test_removal_of_all_tags
html = <<-HTML
What's up <strong>doc</strong>?

What do you think of that? And can you please help me understand if there's a security vulnerability in either of the second or third examples you've given?
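For reference, here is a rough sketch of the behavior the patch above aims for, assuming it is applied; it mirrors the new tests rather than being a definitive specification:

require 'loofah'

html = "<script><script src='malicious.js'></script>"
stripped = Loofah.fragment(html).scrub!(:strip)

stripped.xpath("//script").empty?      # expected: true, no script elements survive
stripped.to_html.include?("<script")   # expected: false, no raw script markup in the output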
Sorry for the delay in replying. The issue I am reporting occurs when calling the .text method with encode_special_chars: false. Without fixing this behaviour for the cases in which encode_special_chars is false, the library ends up having only one layer of security (encoding characters), rendering the scrub effectively useless in these cases. This is particularly troubling as loofah_activerecord uses the
We are seeing this issue as well. The inability of Loofah to handle proper escaping of nested tags is a problem that didn't exist with the deprecated Rails sanitizer and is not addressed by this proposed solution. To reiterate what OP is pointing out in his second and third examples:
Input:
Output in Rails deprecated sanitizer:
Output using Loofah strip:
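(The inline examples above were lost in formatting. As a partial reconstruction, here is the Loofah side of the comparison for the OP's second example, with the output taken from the original report; the deprecated Rails sanitizer output is not reproduced here.)

require 'loofah'

# OP's second example, as given in the original report:
html = "<<s>script src='malicious.js'>"

# Output using Loofah strip with encoding opted out, as reported:
Loofah.fragment(html).scrub!(:strip).text(encode_special_chars: false)
# => "<script src='malicious.js'>"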
@kaspth or @rafaelfranca, can either of you help me understand?
Is this related to rails/rails-html-sanitizer#48?
Or rails/rails#28060?
Wait, are you both saying that when you ask Loofah to not escape entities, and you get them back unescaped, you think that's a bug? I'm really struggling to understand here. Hopefully someone can help me, in particular with working code.
OK, I think this is what y'all are saying:
Can I get some help understanding what you want the output to be? What's the failing test (meaning, what's the desired output)?
That is, what is Nokogiri/Loofah supposed to do when you ask for entities to be delivered unencoded?
Hi, sorry for the late response. The problem is not that encode_special_chars: false is returning the stripped output unencoded; the problem is that the SCRUBBER is not deleting this:
The ideal outcome is that, if the input is:
Hence this is my expected input / output:
If my input is:
If my input is:
However, if my input is:
(I could live without this last one, but this is what I would expect.)
The issue (in our case) is that we ARE encoding further down the line, hence we do NOT need Loofah to encode for us (hence encode_special_chars: false), but we DO need Loofah to scrub the input as it was given, instead of encoding it first, scrubbing it, and then providing an output that, if passed through Loofah again, would actually be cleaned. (We are using loofah-activerecord, but then encoding in the views.)
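Since the inline examples above did not survive, here is a rough sketch of the kind of expectation being described. The input comes from the original report; the expected outputs are assumptions for illustration, not the commenter's exact wording.

require 'loofah'

html = "<<s>script src='malicious.js'>"

# Reported behaviour today: the unencoded text still reassembles into script markup.
Loofah.fragment(html).scrub!(:strip).text(encode_special_chars: false)
# => "<script src='malicious.js'>"

# Expected behaviour (assumed): after scrubbing, the unencoded text should no longer
# be able to reassemble into a <script> tag, e.g. something closer to
# => "script src='malicious.js'" or ""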
Thanks for explaining a bit more. I'm going to preface my remarks by saying that I no longer do Rails every day, and so my understanding of current usage of Loofah and Rails's view rendering code is pretty weak. I'm hesitant to make a change to account for examples 2 and 3 in the OP, but I'm open to it, particularly if Rails Core people tell me it's what Loofah should do. I'm CCing @kaspth and @rafaelfranca, who have some of the original context, and with whom I worked when Loofah was introduced as Rails's default sanitizer. Maybe one or both of them can help us sort out the right thing to do?

The big question I still have is: if your app is encoding these strings later in the rendering pipeline, can you help me understand why this behavior is a vulnerability? By default (by encoding special characters), Loofah is doing the right thing; examples 2 and 3 are essentially asking Loofah to recursively (maybe only once? maybe more?) scrub <div>Javascript Lesson One: <tt>&lt;script&gt;alert('a')&lt;/script&gt;</tt></div>, which after one round of scrubbing will output the following not-markup, a.k.a. text (again, only when encode_special_chars is false):

and after two rounds:

Part of the reason that Rails Core upgraded to Loofah was to handle use cases such as this, where (properly escaped) HTML was being quoted in the document. I admit that Loofah's behavior is different from the previous Rails sanitizer behavior, particularly in ways related to entities, which are discussed in the two Rails issues I linked to above. The big difference is the introduction of "rendering context" -- whether you're asking Loofah (and Nokogiri) to emit HTML or text. If you ask for HTML, you'll never get text nodes that look like HTML (as far as I know now, other than example 1 mentioned in this ticket, for which there exists a fix). If you do ask for text, and opt into special characters not being encoded, then you might get text that looks like markup; but you've accepted this risk by asking Loofah to step outside its normal behavior and deliver text with encode_special_chars: false.
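The example outputs quoted above did not survive, so here is a sketch of what the two rounds would roughly look like, assuming the quoted markup inside the <tt> is escaped and that to_text(encode_special_chars: false) is used; the outputs shown are approximations, not captured results.

require 'loofah'

quoted = "<div>Javascript Lesson One: <tt>&lt;script&gt;alert('a')&lt;/script&gt;</tt></div>"

# Round one: the quoted, escaped markup comes back as plain text.
round_one = Loofah.fragment(quoted).scrub!(:strip).to_text(encode_special_chars: false)
# => "\nJavascript Lesson One: <script>alert('a')</script>\n" (approximate)

# Round two: re-parsing that text turns the quoted example into a real <script>
# element; :strip removes the tag and keeps only its text, so the quoted markup
# is destroyed.
round_two = Loofah.fragment(round_one).scrub!(:strip).to_text(encode_special_chars: false)
# => "\nJavascript Lesson One: alert('a')\n" (approximate)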
Note that I've merged #132, which addresses example 1.
This is a vulnerability for us because we are providing an open API. This API responds with the text unencoded (as most consumers will do their own encoding, with JS platforms such as Angular or React, or with Rails' ERB); however, by providing responses that may contain XSS injections, we are leaving a potential vulnerability for API consumers that don't sanitize appropriately. I do understand that asking for text and opting into special characters not being encoded SHOULD open me to some vulnerabilities. It may be that the rest of this discussion needs to go on the
However, I do feel like it kind of defeats the purpose of scrubbing at all, if an attacker can achieve a result that looks exactly the same as the one that can be achieved without scrubbing.
@myxoh I've been staring at this thread and, I apologize, I still do not understand why you think asking Loofah to return strings with unencoded entities, then getting strings with unencoded entities, is a vulnerability. By default, either HTML or text is going to contain HTML entities in the case you're bringing up. Again, I'll write the code:

#!/usr/bin/env ruby
require "loofah"
html_fragment = %q{<div>&lt;script&gt;alert('a')&lt;/script&gt;</div>}
Loofah.fragment(html_fragment).scrub!(:strip).to_html
# => "<div>&lt;script&gt;alert('a')&lt;/script&gt;</div>"
Loofah.fragment(html_fragment).scrub!(:strip).to_text
# => "\n" + "&lt;script&gt;alert('a')&lt;/script&gt;\n"
Loofah.fragment(html_fragment).scrub!(:strip).to_text(encode_special_chars: false)
# => "\n" + "<script>alert('a')</script>\n"

Only by opting in to not encoding entities do you see this output. My suggestion would be: please don't opt into the behavior that returns unencoded entities, or else use another sanitization library that provides you with greater control over the output.

I'd like to close this, as I think we've reached a point where we disagree on the desired behavior. It's possible that I'm still misunderstanding, and if that's the case, I really really do want to understand; but then I also must insist that you provide code and/or failing tests that will illuminate me. I hope you agree that's a fair request.
@flavorjones sorry for letting this sit so long. It seems I'm too late to help though. I agree with you that closing seems fair based on reading your final comment 😊
Currently we are susceptible to a (slight) vulnerability with nested scripts:
Examples (using scrub_fragment):

Input: <script><script src='malicious.js'></script>
Sanitizer: strip
Output (text): <script src='malicious.js'>
Output (unescaped_text): <script src='malicious.js'>
(Sanitizer prune is immune to this.)

Input: <<s>script src='malicious.js'>
Sanitizer: strip, prune
Output (text): <script src='malicious.js'>
Output (raw): <script src='malicious.js'>

Last example, using strip or prune:

Input: <<s>script>alert('a')<<s>/script>
Output (raw): <script>alert('a')</script>
Why is this a problem?
I'm happy to discuss this, but I do believe that we should try to strip recursively: even though these outputs are only dangerous if left unescaped, the whole purpose of scrubbing is to try to obtain the safest string back, and while <script src='malicious.js'> is not in itself unsafe, it is certainly less safe than it should be after going through a scrubber.
I've attached a PR, #128, with a potential solution using recursive scrubbing (information regarding the implementation is available on the PR).
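To illustrate the idea, here is a minimal sketch of one way to express "recursive scrubbing": re-scrub until the serialized output stops changing. This is only an illustration under that assumption, not the implementation in #128.

require 'loofah'

# Re-scrub until the output reaches a fixed point, assuming a small bounded
# number of passes is enough for realistic input.
def scrub_until_stable(html, method = :strip, max_passes = 5)
  current = html
  max_passes.times do
    scrubbed = Loofah.fragment(current).scrub!(method).to_html
    return scrubbed if scrubbed == current
    current = scrubbed
  end
  current
end

scrub_until_stable("<script><script src='malicious.js'></script>")
# The nested payload from the first example does not survive a second pass, so
# the final result no longer contains raw script markup.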