Nested Scripts #127
Comments
Thank you for reporting this. I'm looking at it now (apologies for the delay).
I'm turning these into executable tests, and although I can easily reproduce the first example (which is a security problem), I cannot reproduce what you're seeing with the second or third. Here's my code. Can you tell me what you're doing differently?

#!/usr/bin/env ruby
require 'loofah'
require 'yaml'
Nokogiri::VERSION_INFO.to_yaml
# => "---\n" +
# "warnings: []\n" +
# "nokogiri: 1.8.1\n" +
# "ruby:\n" +
# " version: 2.4.1\n" +
# " platform: x86_64-linux\n" +
# " description: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]\n" +
# " engine: ruby\n" +
# "libxml:\n" +
# " binding: extension\n" +
# " source: packaged\n" +
# " libxml2_path: \"/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/nokogiri-1.8.1/ports/x86_64-pc-linux-gnu/libxml2/2.9.5\"\n" +
# " libxslt_path: \"/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/nokogiri-1.8.1/ports/x86_64-pc-linux-gnu/libxslt/1.1.30\"\n" +
# " libxml2_patches: []\n" +
# " libxslt_patches: []\n" +
# " compiled: 2.9.5\n" +
# " loaded: 2.9.5\n"
html = "<script><script src='malicious.js'></script>"
Loofah.fragment(html).scrub!(:strip).to_html # => "<script src='malicious.js'>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script src='malicious.js'&gt;"
html = "<<s>script src='malicious.js'>"
Loofah.fragment(html).scrub!(:strip).to_html # => "&lt;<s>script src='malicious.js'&gt;</s>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script src='malicious.js'&gt;"
Loofah.fragment(html).scrub!(:prune).to_html # => "&lt;<s>script src='malicious.js'&gt;</s>"
Loofah.fragment(html).scrub!(:prune).to_text # => "&lt;script src='malicious.js'&gt;"
html = "<<s>script>alert('a')<<s>/script>"
Loofah.fragment(html).scrub!(:strip).to_html # => "&lt;<s>script&gt;alert('a')&lt;<s>/script&gt;</s></s>"
Loofah.fragment(html).scrub!(:strip).to_text # => "&lt;script&gt;alert('a')&lt;/script&gt;"
Loofah.fragment(html).scrub!(:prune).to_html # => "&lt;<s>script&gt;alert('a')&lt;<s>/script&gt;</s></s>"
Loofah.fragment(html).scrub!(:prune).to_text # => "&lt;script&gt;alert('a')&lt;/script&gt;"
My proposed fix for example 1 is in #132; here it is inline:

diff --git a/lib/loofah/scrubbers.rb b/lib/loofah/scrubbers.rb
index 508f6bf..982c593 100644
--- a/lib/loofah/scrubbers.rb
+++ b/lib/loofah/scrubbers.rb
@@ -99,7 +99,12 @@ module Loofah
def scrub(node)
return CONTINUE if html5lib_sanitize(node) == CONTINUE
- node.before node.children
+ if node.children.length == 1 && node.children.first.cdata?
+ sanitized_text = Loofah.fragment(node.children.first.to_html).scrub!(:strip).to_html
+ node.before Nokogiri::XML::Text.new(sanitized_text, node.document)
+ else
+ node.before node.children
+ end
node.remove
end
end
diff --git a/test/integration/test_ad_hoc.rb b/test/integration/test_ad_hoc.rb
index be4583f..1353966 100644
--- a/test/integration/test_ad_hoc.rb
+++ b/test/integration/test_ad_hoc.rb
@@ -157,6 +157,20 @@ mso-bidi-language:#0400;}
assert_equal "", Loofah.scrub_document('<script>test</script>', :prune).text
end
+ def test_nested_script_cdata_tags_should_be_scrubbed
+ html = "<script><script src='malicious.js'></script>"
+ stripped = Loofah.fragment(html).scrub!(:strip)
+ assert_empty stripped.xpath("//script")
+ refute_match("<script", stripped.to_html)
+ end
+
+ def test_nested_script_cdata_tags_should_be_scrubbed_2
+ html = "<script><script>alert('a');</script></script>"
+ stripped = Loofah.fragment(html).scrub!(:strip)
+ assert_empty stripped.xpath("//script")
+ refute_match("<script", stripped.to_html)
+ end
+
def test_removal_of_all_tags
html = <<-HTML
What's up <strong>doc</strong>?

What do you think of that? And can you please help me understand if there's a security vulnerability in either of the second or third examples you've given?
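For reference, here is a rough sketch of the behavior the patch above aims for, assuming it is applied; it mirrors the new tests rather than being a definitive specification:

require 'loofah'

html = "<script><script src='malicious.js'></script>"
stripped = Loofah.fragment(html).scrub!(:strip)

stripped.xpath("//script").empty?      # expected: true, no script elements survive
stripped.to_html.include?("<script")   # expected: false, no raw script markup in the output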
Sorry for the delay in replying. The issue I am reporting occurs when calling the .text method with encode_special_chars: false. Without fixing this behaviour for the cases in which encode_special_chars is false, the library ends up having only one layer of security (encoding characters), rendering the scrub effectively useless in these cases. This is particularly troubling as loofah_activerecord uses the
We are seeing this issue as well. The inability of Loofah to handle proper escaping of nested tags is a problem that didn't exist with the deprecated Rails sanitizer and is not addressed by this proposed solution. To reiterate what OP is pointing out in his second and third examples:
Input:
Output in Rails deprecated sanitizer:
Output using Loofah strip:
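(The inline examples above were lost in formatting. As a partial reconstruction, here is the Loofah side of the comparison for the OP's second example, with the output taken from the original report; the deprecated Rails sanitizer output is not reproduced here.)

require 'loofah'

# OP's second example, as given in the original report:
html = "<<s>script src='malicious.js'>"

# Output using Loofah strip with encoding opted out, as reported:
Loofah.fragment(html).scrub!(:strip).text(encode_special_chars: false)
# => "<script src='malicious.js'>"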
@kaspth or @rafaelfranca, can either of you help me understand?
Is this related to rails/rails-html-sanitizer#48?
Or rails/rails#28060?
Wait, are you both saying that when you ask Loofah to not escape entities, and you get them back unescaped, you think that's a bug? I'm really struggling to understand here. Hopefully someone can help me, in particular with working code.
OK, I think this is what y'all are saying:
Can I get some help understanding what you want the output to be? What's the failing test (meaning, what's the desired output)?
That is, what is Nokogiri/Loofah supposed to do when you ask for entities to be delivered unencoded?
Hi, sorry for the late response. The problem is not that encode_special_chars: false is returning the stripped output unencoded; the problem is that the SCRUBBER is not deleting this:
The ideal outcome is that, if the input is:
Hence this is my expected input / output:
If my input is:
If my input is:
However, if my input is:
(I could live without this last one, but this is what I would expect.)
The issue (in our case) is that we ARE encoding further down the line, hence we do NOT need Loofah to encode for us (hence encode_special_chars: false), but we DO need Loofah to scrub the input as it was given, instead of encoding it first, scrubbing it, and then providing an output that, if passed through Loofah again, would actually be cleaned. (We are using loofah-activerecord, but then encoding in the views.)
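Since the inline examples above did not survive, here is a rough sketch of the kind of expectation being described. The input comes from the original report; the expected outputs are assumptions for illustration, not the commenter's exact wording.

require 'loofah'

html = "<<s>script src='malicious.js'>"

# Reported behaviour today: the unencoded text still reassembles into script markup.
Loofah.fragment(html).scrub!(:strip).text(encode_special_chars: false)
# => "<script src='malicious.js'>"

# Expected behaviour (assumed): after scrubbing, the unencoded text should no longer
# be able to reassemble into a <script> tag, e.g. something closer to
# => "script src='malicious.js'" or ""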
Thanks for explaining a bit more. I'm going to preface my remarks by saying that I no longer do Rails every day, and so my understanding of current usage of Loofah and Rails's view rendering code is pretty weak. I'm hesitant to make a change to account for examples 2 and 3 in the OP, but I'm open to it, particularly if Rails Core people tell me it's what Loofah should do. I'm CCing @kaspth and @rafaelfranca, who have some of the original context, and with whom I worked when Loofah was introduced as Rails's default sanitizer. Maybe one or both of them can help us sort out the right thing to do?

The big question I still have is: if your app is encoding these strings later in the rendering pipeline, can you help me understand why this behavior is a vulnerability? By default (by encoding special characters), Loofah is doing the right thing; examples 2 and 3 are essentially asking Loofah to recursively (maybe only once? maybe more?) scrub <div>Javascript Lesson One: <tt>&lt;script&gt;alert('a')&lt;/script&gt;</tt></div>, which after one round of scrubbing will output the following not-markup, a.k.a. text (again, only when encode_special_chars is false):

and after two rounds:

Part of the reason that Rails Core upgraded to Loofah was to handle use cases such as this, where (properly escaped) HTML was being quoted in the document. I admit that Loofah's behavior is different from the previous Rails sanitizer behavior, particularly in ways related to entities, which are discussed in the two Rails issues I linked to above. The big difference is the introduction of "rendering context" -- whether you're asking Loofah (and Nokogiri) to emit HTML or text. If you ask for HTML, you'll never get text nodes that look like HTML (as far as I know now, other than example 1 mentioned in this ticket, for which there exists a fix). If you do ask for text, and opt into special characters not being encoded, then you might get text that looks like markup; but you've accepted this risk by asking Loofah to step outside its normal behavior and deliver text with encode_special_chars: false.
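The example outputs quoted above did not survive, so here is a sketch of what the two rounds would roughly look like, assuming the quoted markup inside the <tt> is escaped and that to_text(encode_special_chars: false) is used; the outputs shown are approximations, not captured results.

require 'loofah'

quoted = "<div>Javascript Lesson One: <tt>&lt;script&gt;alert('a')&lt;/script&gt;</tt></div>"

# Round one: the quoted, escaped markup comes back as plain text.
round_one = Loofah.fragment(quoted).scrub!(:strip).to_text(encode_special_chars: false)
# => "\nJavascript Lesson One: <script>alert('a')</script>\n" (approximate)

# Round two: re-parsing that text turns the quoted example into a real <script>
# element; :strip removes the tag and keeps only its text, so the quoted markup
# is destroyed.
round_two = Loofah.fragment(round_one).scrub!(:strip).to_text(encode_special_chars: false)
# => "\nJavascript Lesson One: alert('a')\n" (approximate)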
Note that I've merged #132, which addresses example 1.
This is a vulnerability for us because we are providing an open API. This API responds with the text unencoded (as most consumers will do their own encoding, with JS platforms such as Angular or React, or with Rails' ERB); however, by providing responses that may contain XSS injections, we are leaving a potential vulnerability for API consumers that don't sanitize appropriately. I do understand that asking for text and opting into special characters not being encoded SHOULD open me to some vulnerabilities. It may be that the rest of this discussion needs to go on the
However, I do feel like it kind of defeats the purpose of scrubbing at all, if an attacker can achieve a result that looks exactly the same as the one that can be achieved without scrubbing.
@myxoh I've been staring at this thread and, I apologize, I still do not understand why you think asking Loofah to return strings with unencoded entities, then getting strings with unencoded entities, is a vulnerability. By default, either HTML or text is going to contain HTML entities in the case you're bringing up. Again, I'll write the code:

#!/usr/bin/env ruby
require "loofah"
html_fragment = %q{<div>&lt;script&gt;alert('a')&lt;/script&gt;</div>}
Loofah.fragment(html_fragment).scrub!(:strip).to_html
# => "<div>&lt;script&gt;alert('a')&lt;/script&gt;</div>"
Loofah.fragment(html_fragment).scrub!(:strip).to_text
# => "\n" + "&lt;script&gt;alert('a')&lt;/script&gt;\n"
Loofah.fragment(html_fragment).scrub!(:strip).to_text(encode_special_chars: false)
# => "\n" + "<script>alert('a')</script>\n"

Only by opting in to not encoding entities do you see this output. My suggestion would be: please don't opt into the behavior that returns unencoded entities, or else use another sanitization library that provides you with greater control over the output.

I'd like to close this, as I think we've reached a point where we disagree on the desired behavior. It's possible that I'm still misunderstanding, and if that's the case, I really really do want to understand; but then I also must insist that you provide code and/or failing tests that will illuminate me. I hope you agree that's a fair request.
@flavorjones sorry for letting this sit so long. It seems I'm too late to help though. I agree with you that closing seems fair based on reading your final comment 😊
Currently we are susceptible to a (slight) vulnerability with nested scripts:
Examples (using scrub_fragment):

Input: <script><script src='malicious.js'></script>
Sanitizer: strip
Output (text): <script src='malicious.js'>
Output (unescaped_text): <script src='malicious.js'>
(Sanitizer prune is immune to this.)

Input: <<s>script src='malicious.js'>
Sanitizer: strip, prune
Output (text): <script src='malicious.js'>
Output (raw): <script src='malicious.js'>

Last example, using strip or prune:

Input: <<s>script>alert('a')<<s>/script>
Output (raw): <script>alert('a')</script>
Why is this a problem?
I'm happy to discuss this, but I do believe that we should try to strip recursively: even though these outputs are only dangerous if left unescaped, the whole purpose of scrubbing is to try to obtain the safest string back, and while <script src='malicious.js'> is not in itself unsafe, it is certainly less safe than it should be after going through a scrubber.
I've attached a PR, #128, with a potential solution using recursive scrubbing (information regarding the implementation is available on the PR).
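To illustrate the idea, here is a minimal sketch of one way to express "recursive scrubbing": re-scrub until the serialized output stops changing. This is only an illustration under that assumption, not the implementation in #128.

require 'loofah'

# Re-scrub until the output reaches a fixed point, assuming a small bounded
# number of passes is enough for realistic input.
def scrub_until_stable(html, method = :strip, max_passes = 5)
  current = html
  max_passes.times do
    scrubbed = Loofah.fragment(current).scrub!(method).to_html
    return scrubbed if scrubbed == current
    current = scrubbed
  end
  current
end

scrub_until_stable("<script><script src='malicious.js'></script>")
# The nested payload from the first example does not survive a second pass, so
# the final result no longer contains raw script markup.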