FlexmarkHtmlConverter: blockquotes can be used to generate unbounded memory allocation #625

Open
chibenwa opened this issue Aug 26, 2024 · 1 comment

Describe the bug

I was considering using Flexmark as an HTML => text/plain engine for Apache James.

(We currently rely on a homegrown Jsoup-based parser.)

I threw our test suite at flexmark-html2md-converter and triggered an OutOfMemoryError after 18 seconds with the following code:

    @Test
    public void boom() {
        String html = ("<blockquote>" +
            "<p>a</p>".repeat(800))
            .repeat(400) + "</blockquote>".repeat(400);

        String plainText = FlexmarkHtmlConverter.builder()
            .build().convert(html);
    }

This throws an OutOfMemoryError.

This is because:

  • The input grows as O(N) with the blockquote nesting level N
  • The output grows as O(N²) with the blockquote nesting level (each paragraph is prefixed with the markers of all N enclosing blockquotes)
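To put rough numbers on this, here is a back-of-the-envelope sketch. It is only a model, not Flexmark's actual algorithm: it assumes every paragraph at nesting depth d is emitted on one line prefixed with d copies of the standard Markdown marker "> ". The class and method names are made up for illustration.

```java
// Rough size model for the nested-blockquote test above.
// Assumption (not verified against Flexmark internals): each paragraph at
// depth d produces one output line prefixed with d copies of "> ".
final class BlockquoteAmplification {
    static final int PREFIX_LEN = "> ".length();

    // Input length: `depth` levels, each opening one blockquote followed by
    // `paragraphsPerLevel` paragraphs, then `depth` closing tags.
    static long inputChars(int depth, int paragraphsPerLevel) {
        long perLevel = "<blockquote>".length()
                + (long) "<p>a</p>".length() * paragraphsPerLevel;
        return depth * (perLevel + "</blockquote>".length());
    }

    // Characters spent on "> " prefixes alone under the model above:
    // the sum over d = 1..depth of d * PREFIX_LEN * paragraphsPerLevel.
    static long prefixChars(int depth, int paragraphsPerLevel) {
        long total = 0;
        for (int d = 1; d <= depth; d++) {
            total += (long) d * PREFIX_LEN * paragraphsPerLevel;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(inputChars(400, 800) + " input chars");
        System.out.println(prefixChars(400, 800) + " prefix chars");
    }
}
```

With the parameters from the test (400 levels, 800 paragraphs per level) the input is about 2.57 million characters, while the prefixes alone come to about 128 million characters under this model. Doubling the depth doubles the input but roughly quadruples the prefix count.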

Same code with different parameters:

        String html = ("<blockquote>".repeat(420) +
            "a<br/>".repeat(400 * 420))
             + "</blockquote>".repeat(420);

This generates 1 MB of input and 142 MB of output.

Those sizes are well within the range I encounter in emails.

Is there a way to cap the memory allocated by the converter (i.e. the size of the output) and simply throw when the cap is exceeded, as a defense mechanism?

This would protect me from DoS attacks through unbounded memory allocation, and it would be a condition for adoption.
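In the meantime, a workaround I could apply on my side is a pre-flight check that rejects deeply nested input before it reaches the converter. The sketch below is hypothetical and is not a Flexmark feature: `NestingGuard`, `maxNestingDepth` and `requireShallow` are names I made up, and the regex scan is a crude approximation of real HTML parsing (it ignores comments, scripts and the implicit tag closing a real parser would perform).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight guard: estimates how deeply blockquotes and lists
// are nested by scanning tags, without building a DOM.
final class NestingGuard {
    // Matches opening and closing <blockquote>, <ul> and <ol> tags.
    private static final Pattern TAG = Pattern.compile(
            "<(/?)(blockquote|ul|ol)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the deepest nesting level seen while scanning left to right.
    static int maxNestingDepth(String html) {
        int depth = 0;
        int max = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {      // opening tag
                depth++;
                max = Math.max(max, depth);
            } else if (depth > 0) {          // closing tag, ignore stray ones
                depth--;
            }
        }
        return max;
    }

    // Throws if the input nests deeper than the caller is willing to convert.
    static void requireShallow(String html, int maxDepth) {
        int depth = maxNestingDepth(html);
        if (depth > maxDepth) {
            throw new IllegalArgumentException(
                    "nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }
}
```

Calling `NestingGuard.requireShallow(html, 50)` before `convert(html)` would reject both test inputs above; the limit of 50 is an arbitrary value chosen for illustration. This only bounds the nesting depth, so a cap on the output size inside the converter would still be the more robust defense.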

chibenwa commented Aug 26, 2024

A similar amplification also exists with lists.

For example:

    @Test
    public void boom() {
        String html = ("<ul>" + "<li>a</li>".repeat(400)).repeat(420)
             + "</ul>".repeat(420);

        System.out.println(html.length() + " bytes");

        String plainText = FlexmarkHtmlConverter.builder()
            .build().convert(html);

        System.out.println(plainText.length() + " bytes");
    }

=>

1683780 bytes
71064000 bytes
