Parse HTML List #290

sixdouglas · 2019-10-29T20:00:26Z

Fixes #288.
Trying to reproduce the bug.

sixdouglas · 2019-10-29T20:01:15Z

Here is the result in the generated PDF

sixdouglas · 2019-10-29T20:02:22Z

Not reproducing the problem so far

rammetzger · 2019-11-04T07:51:57Z

In the following example, HTMLWorker.parseToList() should return a list of elements in which some items contain a nested list of items.
We cannot use HtmlParser.parse() because we need to edit the resulting list before writing it to the PDF document.

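import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.html.simpleparser.StyleSheet;

final String htmlText =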
    "<html>"
  + "<body>"
  + "<p>What should you say?</p>"
  + "<ul>"
  + "  <li>Hello</li>"
  + "  <li>World</li>"
  + "</ul>"
  + "<ol>"
  + "  <li>Element-1"
  + "    <ol>"
  + "      <li>Element-1-1</li>"
  + "      <li>Element-1-2</li>"
  + "    </ol>"
  + "  </li>"
  + "  <li>Element-2"
  + "    <ol>"
  + "      <li>Element-2-1</li>"
  + "      <li>Element-2-2</li>"
  + "    </ol>"
  + "  </li>"
  + "</ol>"
  + "</body>"
  + "</html>";
final StringReader reader = new StringReader(htmlText);

final StyleSheet styleSheet = new StyleSheet();
// styleSheet.loadTagStyle(...)

final Map<String, Object> interfaceProps = new HashMap<>();
// interfaceProps.put(...);

final List<com.lowagie.text.Element> elements = HTMLWorker.parseToList(reader, styleSheet, interfaceProps);
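Because the point of calling parseToList() here is to edit the elements before they are written, any edit has to recurse into nested lists; otherwise items such as Element-1-1 are silently skipped. The sketch below is an assumption-laden illustration, not OpenPDF code: it models elements with a hypothetical `Node` record whose `Kind` mirrors `Element.type()` values, so it runs without the library. With the real API the same walk would presumably go through `List#getItems()` and check `type()` on each element.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class EditParsedElements {

    // Hypothetical stand-in for Element.type() values such as Element.LIST / Element.LISTITEM.
    enum Kind { LIST, LISTITEM }

    // Hypothetical stand-in for an OpenPDF element: a kind, its text, and nested children.
    record Node(Kind kind, String text, List<Node> children) {
        Node(Kind kind, String text, Node... children) {
            this(kind, text, List.of(children));
        }
    }

    // Returns a copy of the tree with `edit` applied to the text of every node, at every depth.
    static Node mapText(Node node, UnaryOperator<String> edit) {
        List<Node> editedChildren = node.children().stream()
                .map(child -> mapText(child, edit))
                .toList();
        return new Node(node.kind(), edit.apply(node.text()), editedChildren);
    }

    // Counts LISTITEM nodes, including those inside nested lists.
    static int countItems(Node node) {
        int own = node.kind() == Kind.LISTITEM ? 1 : 0;
        return own + node.children().stream().mapToInt(EditParsedElements::countItems).sum();
    }

    public static void main(String[] args) {
        // The <ol> from the example above, modeled with the nesting the HTML describes.
        Node ol = new Node(Kind.LIST, "",
                new Node(Kind.LISTITEM, "Element-1",
                        new Node(Kind.LIST, "",
                                new Node(Kind.LISTITEM, "Element-1-1"),
                                new Node(Kind.LISTITEM, "Element-1-2"))),
                new Node(Kind.LISTITEM, "Element-2",
                        new Node(Kind.LIST, "",
                                new Node(Kind.LISTITEM, "Element-2-1"),
                                new Node(Kind.LISTITEM, "Element-2-2"))));

        Node edited = mapText(ol, String::toUpperCase);
        System.out.println(countItems(edited)); // 6
        // First item -> its nested list -> second item of that list
        System.out.println(edited.children().get(0).children().get(0).children().get(1).text()); // ELEMENT-1-2
    }
}
```

The copy-on-edit design keeps the original tree untouched; an in-place edit works just as well as long as it recurses the same way.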

The expected result is:

+---------------------------+----------------------------+
| What should you say       | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Hello                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| World                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-1                 | type()==Element.LIST       |
|     +---------------------+--------------------------+ +
|     | Element-1-1         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
|     | Element-1-2         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
+---------------------------+----------------------------+
| Element-2                 | type()==Element.LIST       |
|     +---------------------+--------------------------+ +
|     | Element-2-1         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
|     | Element-2-2         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
+---------------------------+----------------------------+

But the current result is:

+---------------------------+----------------------------+
| What should you say       | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Hello                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| World                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-1                 | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-2                 | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
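The difference between the two results comes down to where a closed inner list ends up. As a simplified model (not OpenPDF's actual HTMLWorker code), a stack-based builder keeps the nesting by attaching a closed `<ol>`/`<ul>` to the `<li>` that is still open beneath it, and only makes a list top-level when nothing is open. The token format and the `Node` class below are hypothetical, and mismatched tags are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NestedListBuilder {

    // Hypothetical element model: kind is "PARAGRAPH", "LIST" or "LISTITEM".
    static final class Node {
        final String kind;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();

        Node(String kind) {
            this.kind = kind;
        }
    }

    // Tokens are "<ul>", "</ul>", "<ol>", "</ol>", "<li>", "</li>" or plain text.
    static List<Node> build(List<String> tokens) {
        List<Node> topLevel = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "<ul>", "<ol>" -> open.push(new Node("LIST"));
                case "<li>" -> open.push(new Node("LISTITEM"));
                case "</ul>", "</ol>", "</li>" -> {
                    Node closed = open.pop();
                    if (open.isEmpty()) {
                        topLevel.add(closed);             // an outermost list
                    } else {
                        open.peek().children.add(closed); // keep nesting: attach to the open parent
                    }
                }
                default -> {
                    if (open.isEmpty()) {
                        Node paragraph = new Node("PARAGRAPH"); // text outside any list
                        paragraph.text.append(token);
                        topLevel.add(paragraph);
                    } else {
                        open.peek().text.append(token);         // text of the innermost open item
                    }
                }
            }
        }
        return topLevel;
    }

    // One line per node, indented by nesting depth.
    static void describe(Node node, int depth, List<String> out) {
        out.add("  ".repeat(depth) + node.kind + (node.text.length() > 0 ? " " + node.text : ""));
        for (Node child : node.children) {
            describe(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "<ol>",
                "<li>", "Element-1",
                "<ol>", "<li>", "Element-1-1", "</li>", "<li>", "Element-1-2", "</li>", "</ol>",
                "</li>",
                "<li>", "Element-2", "</li>",
                "</ol>");
        List<String> lines = new ArrayList<>();
        for (Node root : build(tokens)) {
            describe(root, 0, lines);
        }
        lines.forEach(System.out::println);
    }
}
```

Running `main` prints Element-1-1 and Element-1-2 indented under a LIST that is itself a child of the Element-1 item, rather than as siblings of Element-1, which is the shape the expected result describes.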

sixdouglas · 2019-11-07T06:23:13Z

You're right. With your piece of code, the nested list is correctly identified.
I've committed it in this PR, though I think you should submit it under your own name 😉. If so, please close this PR and submit a new one with all the samples; it helps improve the project.
Otherwise, I think you can approve the PR.

rammetzger · 2019-11-07T06:41:53Z

Thank you for accepting my suggestion for improvement. I have no idea how the process of contributing changes to projects on GitHub works; I would have to learn all of it first. That's why I think it's better if you make those changes for me. I hope we do not find any more bugs. OpenPDF is an excellent tool.

noavarice · 2019-11-10T22:28:49Z

pdf-toolbox/src/test/java/com/lowagie/examples/html/ParseHelloHtml.java

+            PdfWriter.getInstance(document, new FileOutputStream("parseHelloWorld.pdf"));
+            // step 3: we open the document
+            document.open();
+            // step 2:


Step 1: "description"
Step 3: "description"
Step 2: "no description"
...
(⊙_☉)

noavarice · 2019-11-10T22:30:45Z

pdf-toolbox/src/test/java/com/lowagie/examples/html/ParseNestedHtmlList.java

+    public static void main(String[] args) {
+        System.out.println("Parse Nested HTML List");
+        try {
+            final String htmlText =


What's the difference between the test above and this one?

This one uses HTMLWorker#parseToList() with a StringReader and produces a list of items.
The previous one uses HtmlParser#parse() with a Document to produce a PDF.

The first one produced a correct PDF document with the nested HTML list. But this issue is really about HTMLWorker#parseToList() not producing the nested List as expected.

Even if the first test, which produces the PDF, does not show the bug, I chose to keep it as a non-regression test.

But why does the second example contain the HTML markup as a String rather than as a file?

noavarice · 2019-11-10T22:31:39Z

README.md

@@ -101,9 +101,9 @@ Significant [Contributors to OpenPDF](https://github.com/LibrePDF/OpenPDF/graphs
  [@lapo-luchini](https://github.com/lapo-luchini)  
  [@jeffrey-easyesi](https://github.com/jeffrey-easyesi)  
  [@V-F](https://github.com/V-F)     
-  [@sixdouglas](https://github.com/sixdouglas) - Douglas Six
+  [@sixdouglas](https://github.com/sixdouglas) - Douglas Six     


The trailing spaces here are meant to make Markdown render a line break. The other lines end with trailing spaces for the same reason.
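For reference, CommonMark treats two or more spaces before a line ending as a hard line break (except at the end of a block, where trailing spaces are ignored); a single trailing space is only a soft break, so the contributor lines would otherwise run together. The helper below is a hypothetical illustration of that rule, not part of the OpenPDF build.

```java
public class MarkdownHardBreak {

    // CommonMark: two or more spaces before the line ending produce a hard line break.
    static boolean endsWithHardBreak(String line) {
        return line.endsWith("  ");
    }

    // Normalizes trailing whitespace to exactly two spaces so the line ends in a hard break.
    static String withHardBreak(String line) {
        return line.stripTrailing() + "  ";
    }

    public static void main(String[] args) {
        String readmeLine = "  [@sixdouglas](https://github.com/sixdouglas) - Douglas Six";
        System.out.println(endsWithHardBreak(readmeLine));                // false
        System.out.println(endsWithHardBreak(withHardBreak(readmeLine))); // true
    }
}
```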

rammetzger · 2020-01-21T12:48:25Z

I checked it with 1.3.13-SNAPSHOT and it works fine. Nested lists are displayed as expected.

…

From: Andreas Rosdal ***@***.***> Is this ready now?

sixdouglas mentioned this pull request Oct 29, 2019

Parsing nested lists with HTMLWorker #288

Closed

sixdouglas force-pushed the nestedHtmlList branch from ad7db30 to 9f39e4e on November 1, 2019 07:05

sixdouglas force-pushed the nestedHtmlList branch 2 times, most recently from 2140c24 to 4c781e1 on November 7, 2019 06:22

noavarice reviewed Nov 10, 2019

Parse HTML List

93f9417

sixdouglas force-pushed the nestedHtmlList branch from 4c781e1 to 93f9417 on November 11, 2019 13:21

andreasrosdal added the Needs work label Jan 16, 2020

Merge branch 'master' into nestedHtmlList

853f85c

andreasrosdal merged commit 2d9b5c3 into LibrePDF:master Jan 17, 2020

andreasrosdal removed the Needs work label Jan 17, 2020

rammetzger commented Jan 21, 2020 via email • edited by asturio