
Command-line Convertor

The command-line PDF to HTML convertor is contained in the PDFToHTML.jar package, which can be downloaded and executed directly on any Java-enabled platform.

To convert a PDF file to an HTML web page, just type:
java -jar PDFToHTML.jar <input_file> [<output_file>]
where

<input_file> is the path to the source PDF file to be converted.
<output_file> is the optional name of the output HTML file. If it is not specified, the output name is the same as the input name with the html suffix.
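
The default output name described above can be pictured with a small Java sketch. The class and method names (OutputNames, defaultOutputName) are hypothetical and not part of Pdf2Dom, and the exact rule is an assumption: here a trailing .pdf (in any letter case) is replaced by .html, and any other name simply gets .html appended. The tool itself may differ in detail.

```java
import java.util.Locale;

// Hypothetical helper, not part of Pdf2Dom: derives an output name from the input name.
final class OutputNames {
    private OutputNames() {}

    // Assumption: a trailing ".pdf" is replaced by ".html"; otherwise ".html" is appended.
    static String defaultOutputName(String inputPath) {
        String base = inputPath;
        if (base.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            // drop the four characters of the ".pdf" extension
            base = base.substring(0, base.length() - 4);
        }
        return base + ".html";
    }
}
```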

Options:

-fm=[mode] Font conversion mode, where [mode] is one of EMBED_BASE64, SAVE_TO_DIR, IGNORE_FONTS.
-fdir=[path] Directory to extract fonts to, where [path] is the font directory, e.g. dir/my-font-dir.
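
When the convertor is launched from another Java program, the command line can be assembled as a list of arguments. The following is a minimal sketch: the ConvertCommand class and its build method are hypothetical, and placing the options after the file arguments is an assumption that the text above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pdf2Dom: assembles the command line described above.
final class ConvertCommand {
    private ConvertCommand() {}

    // outputFile, fontMode and fontDir may be null, in which case they are left out.
    // Assumption: options are placed after the file arguments.
    static List<String> build(String inputFile, String outputFile,
                              String fontMode, String fontDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("PDFToHTML.jar");
        cmd.add(inputFile);
        if (outputFile != null) {
            cmd.add(outputFile);
        }
        if (fontMode != null) {
            cmd.add("-fm=" + fontMode);      // e.g. SAVE_TO_DIR
        }
        if (fontDir != null) {
            cmd.add("-fdir=" + fontDir);     // e.g. dir/my-font-dir
        }
        return cmd;
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd).inheritIO().start(), provided PDFToHTML.jar is in the working directory.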

Library

Basic Usage

Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:
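import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;

// load the PDF file using PDFBox (2.x API; PDFBox 3.x uses Loader.loadPDF instead)
PDDocument pdf = PDDocument.load(new java.io.File("file.pdf"));
// create the DOM parser
PDFDomTree parser = new PDFDomTree();
// parse the file and get the DOM Document
Document dom = parser.createDOM(pdf);
pdf.close(); // release the PDF once the DOM has been built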
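
Once the DOM is available, it can be processed with the standard Java XML APIs. The sketch below serializes a document to an HTML string using the JDK's built-in transformer. It assumes that the Document returned by createDOM is a standard org.w3c.dom.Document. The DomToString class is hypothetical, and sampleDocument only builds a small stand-in document so that the sketch runs without PDFBox or Pdf2Dom on the classpath.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Pdf2Dom.
final class DomToString {
    private DomToString() {}

    // Serializes any W3C DOM Document to a string using the HTML output method.
    static String serialize(Document dom) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("DOM serialization failed", e);
        }
    }

    // Builds a tiny stand-in document: <html><body><p>Hello</p></body></html>
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element paragraph = doc.createElement("p");
            paragraph.setTextContent("Hello");
            body.appendChild(paragraph);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM builder", e);
        }
    }
}
```

In real use, the dom object obtained from the parser would be passed to serialize in place of the sample document.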

Config Options

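The parser can be configured by passing a PDFDomTreeConfig object to the PDFDomTree constructor. The following example saves the extracted fonts to a directory:
// start from the default configuration
PDFDomTreeConfig config = PDFDomTreeConfig.createDefaultConfig();
// fontDir is the directory to extract fonts to, defined elsewhere by the caller
config.setFontExtractDirectory(fontDir);
// SAVE_TO_DIR is one of the font modes listed under Options
config.setFontMode(SAVE_TO_DIR);

// create the parser with the custom configuration
PDFDomTree parser = new PDFDomTree(config);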

API Documentation

See the PDFDomTree API documentation for more information.

The Pdf2Dom API documentation is generated from the latest snapshot.
