
Parse HTML List #290

Merged
merged 2 commits into from
Jan 17, 2020

Conversation

sixdouglas
Contributor

Fixing #288.
Trying to reproduce the bug.

@sixdouglas
Contributor Author

Screenshot 2019-10-29 at 20 58 13

Here is the result in the generated PDF

@sixdouglas
Contributor Author

I have not been able to reproduce the problem so far.

@rammetzger

In the following example, HTMLWorker.parseToList() should return a list of items, with some items containing a nested list of items.
We cannot use HtmlParser.parse() because we need to edit the resulting list before writing it to the PDF document.

final String htmlText = 
    "<html>"
  + "<body>"
  + "<p>What should you say?</p>"
  + "<ul>"
  + "  <li>Hello</li>"
  + "  <li>World</li>"
  + "</ul>"
  + "<ol>"
  + "  <li>Element-1"
  + "    <ol>"
  + "      <li>Element-1-1</li>"
  + "      <li>Element-1-2</li>"
  + "    </ol>"
  + "  </li>"
  + "  <li>Element-2"
  + "    <ol>"
  + "      <li>Element-2-1</li>"
  + "      <li>Element-2-2</li>"
  + "    </ol>"
  + "  </li>"
  + "</ol>"
  + "</body>"
  + "</html>";
		
final StringReader reader = new StringReader(htmlText);

final StyleSheet styleSheet = new StyleSheet();
// styleSheet.loadTagStyle(...)

final Map<String, Object> interfaceProps = new HashMap<>();
// interfaceProps.put(...);

final List<com.lowagie.text.Element> elements = HTMLWorker.parseToList(reader, styleSheet, interfaceProps);

The expected result is:

+---------------------------+----------------------------+
| What should you say       | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Hello                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| World                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-1                 | type()==Element.LIST       |
|     +---------------------+--------------------------+ +
|     | Element-1-1         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
|     | Element-1-2         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
+---------------------------+----------------------------+
| Element-2                 | type()==Element.LIST       |
|     +---------------------+--------------------------+ +
|     | Element-2-1         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
|     | Element-2-2         | type()==Element.LISTITEM | |
|     +---------------------+--------------------------+ +
+---------------------------+----------------------------+

But the current result is:

+---------------------------+----------------------------+
| What should you say       | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Hello                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| World                     | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-1                 | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
| Element-2                 | type()==Element.LISTITEM   |
+---------------------------+----------------------------+
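
The difference between the two tables above comes down to whether any top-level element still carries children after parsing. A minimal sketch of that check follows. It deliberately does not use OpenPDF: `Node`, `expectedShape()`, `currentShape()`, `dump()` and `hasNesting()` are hypothetical stand-ins invented for illustration, so the example runs with the JDK alone. With the real library one would presumably branch on `Element.type()` and descend into nested lists instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedListDump {

    // Hypothetical stand-in for a parsed element; NOT an OpenPDF class.
    public static final class Node {
        final String kind;          // "LIST" or "LISTITEM", mirroring the tables above
        final String text;
        final List<Node> children;

        public Node(String kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // The shape the reporter expects: each outer item keeps its nested items.
    public static List<Node> expectedShape() {
        return Arrays.asList(
            new Node("LIST", "Element-1",
                new Node("LISTITEM", "Element-1-1"),
                new Node("LISTITEM", "Element-1-2")),
            new Node("LIST", "Element-2",
                new Node("LISTITEM", "Element-2-1"),
                new Node("LISTITEM", "Element-2-2")));
    }

    // The shape reported as the current result: nested items are lost.
    public static List<Node> currentShape() {
        return Arrays.asList(
            new Node("LISTITEM", "Element-1"),
            new Node("LISTITEM", "Element-2"));
    }

    // Collects one line per node, indenting children by two spaces per level.
    public static List<String> dump(List<Node> nodes, int depth) {
        List<String> lines = new ArrayList<>();
        for (Node node : nodes) {
            StringBuilder indent = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                indent.append("  ");
            }
            lines.add(indent + node.text + " [" + node.kind + "]");
            lines.addAll(dump(node.children, depth + 1));
        }
        return lines;
    }

    // True if at least one top-level node has children, i.e. nesting survived.
    public static boolean hasNesting(List<Node> nodes) {
        for (Node node : nodes) {
            if (!node.children.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Expected shape, nested: " + hasNesting(expectedShape()));
        dump(expectedShape(), 0).forEach(System.out::println);
        System.out.println("Current shape, nested: " + hasNesting(currentShape()));
        dump(currentShape(), 0).forEach(System.out::println);
    }
}
```

A check along these lines could serve as the assertion in a regression test for parseToList(), since a flattened result is easy to miss when only the rendered PDF is inspected.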

@sixdouglas sixdouglas force-pushed the nestedHtmlList branch 2 times, most recently from 2140c24 to 4c781e1 Compare November 7, 2019 06:22
@sixdouglas
Contributor Author

You're right. With your piece of code the nested list is correctly identified.
I've committed it in this PR though I think you should do it under your own name 😉. If so please close this PR and submit a new one with all the samples. It helps improve the project.
Otherwise I think you can approve the PR.

@rammetzger

Thank you for accepting my suggestion for improvement. I have no idea what the process is for making changes to projects on GitHub. I would have to learn all this first. That's why I think it's better if you make those changes for me. I hope we do not find any more bugs. OpenPDF is an excellent tool.

PdfWriter.getInstance(document, new FileOutputStream("parseHelloWorld.pdf"));
// step 3: we open the document
document.open();
// step 2:
Contributor

Step 1: "description"
Step 3: "description"
Step 2: "no description"
...
(⊙_☉)

public static void main(String[] args) {
System.out.println("Parse Nested HTML List");
try {
final String htmlText =
Contributor

What's the difference between the test above and this one?

Contributor Author

This one uses HTMLWorker#parseToList() with a StringReader and produces a list of items.
The previous one uses HtmlParser#parse() with a Document to produce a PDF.

The first one produced a good PDF document with the nested HTML list. But this issue is really about HTMLWorker#parseToList() not producing the nested list as expected.

Even if the first test, which produces the PDF, does not show the bug, I chose to keep it as a non-regression test.

Contributor

But why does the second example contain the HTML markup as a String rather than as a file?

README.md Outdated
@@ -101,9 +101,9 @@ Significant [Contributors to OpenPDF](https://github.com/LibrePDF/OpenPDF/graphs
[@lapo-luchini](https://github.com/lapo-luchini)
[@jeffrey-easyesi](https://github.com/jeffrey-easyesi)
[@V-F](https://github.com/V-F)
[@sixdouglas](https://github.com/sixdouglas) - Douglas Six
[@sixdouglas](https://github.com/sixdouglas) - Douglas Six
Contributor

Spaces?

Contributor Author

The trailing spaces here are meant to be rendered as a line break in the Markdown file. There are some at the end of every other line for this purpose.

@andreasrosdal andreasrosdal merged commit 2d9b5c3 into LibrePDF:master Jan 17, 2020
@rammetzger

rammetzger commented Jan 21, 2020 via email
