Home
Maddie Abboud edited this page Oct 26, 2021 · 2 revisions
The command-line PDF to HTML converter is contained in the PDFToHTML.jar package, which can be downloaded and executed directly on any Java-enabled platform.
To convert a PDF file to an HTML web page, type:
java -jar PDFToHTML.jar <input_file> [<output_file>]
where
- <input_file> is the path to the source PDF file to be converted.
- <output_file> is an optional name of the output HTML file. If not specified, the output name is the same as the input name with the .html suffix.
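The default naming rule above can be sketched in a few lines of Java. This is only an illustration: `OutputNames` and `defaultOutputName` are hypothetical names, not part of the tool, and the handling of the `.pdf` extension (replaced rather than kept) is an assumption about how "the same as the input name with the html suffix" is applied.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFToHTML.jar: sketches one way to derive
// the default output name from the input path.
final class OutputNames {
    private OutputNames() {}

    // Assumption: a trailing ".pdf" (in any letter case) is replaced by ".html";
    // any other input name simply gets ".html" appended.
    static String defaultOutputName(String inputPath) {
        String base = inputPath;
        if (base.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            base = base.substring(0, base.length() - ".pdf".length());
        }
        return base + ".html";
    }

    public static void main(String[] args) {
        System.out.println(defaultOutputName("report.pdf"));
    }
}
```

If the exact output location matters, passing <output_file> explicitly avoids relying on this rule.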
Options:
- -fm=[mode] Font conversion mode, where [mode] is one of EMBED_BASE64, SAVE_TO_DIR, IGNORE_FONTS.
- -fdir=[path] Directory to extract fonts to, where [path] is the font extraction directory, e.g. dir/my-font-dir.
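When the converter is launched from another Java program, the command line with these options can be assembled as a list of arguments. The sketch below is a minimal illustration under stated assumptions: `ConverterCommand` and `build` are hypothetical names, PDFToHTML.jar is assumed to sit in the working directory, and the options are assumed to be accepted after the file arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles the converter command line described above.
final class ConverterCommand {
    private ConverterCommand() {}

    // fontMode and fontDir may be null, in which case the option is omitted.
    // Assumption: options are placed after the input and output file names.
    static List<String> build(String input, String output, String fontMode, String fontDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("PDFToHTML.jar"); // assumed to be in the working directory
        cmd.add(input);
        cmd.add(output);
        if (fontMode != null) {
            cmd.add("-fm=" + fontMode);
        }
        if (fontDir != null) {
            cmd.add("-fdir=" + fontDir);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("file.pdf", "file.html", "SAVE_TO_DIR", "dir/my-font-dir");
        // Print the command instead of running it; to execute it, one could use
        // new ProcessBuilder(cmd).inheritIO().start().waitFor()
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the arguments in a list rather than a single string means paths containing spaces need no manual quoting when the list is handed to `ProcessBuilder`.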
Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:
// load the PDF file using PDFBox
PDDocument pdf = PDDocument.load(new java.io.File("file.pdf"));
// create the DOM parser
PDFDomTree parser = new PDFDomTree();
// parse the file and get the DOM Document
Document dom = parser.createDOM(pdf);
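// alternatively, create the parser with a custom configuration,
// e.g. to save the fonts to a directory (fontDir is the target directory, defined elsewhere)
PDFDomTreeConfig config = PDFDomTreeConfig.createDefaultConfig();
config.setFontExtractDirectory(fontDir);
config.setFontMode(SAVE_TO_DIR);
PDFDomTree parser = new PDFDomTree(config);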
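Since createDOM returns a standard org.w3c.dom.Document, the result can be processed with the XML APIs that ship with the JDK. The following sketch serializes such a document to an HTML string. `DomSerializer`, `toHtml` and `standInDocument` are hypothetical names, and the small hand-built document only stands in for the parser output so that the example runs without PDFBox or Pdf2Dom on the classpath.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes a DOM Document using the JDK's JAXP transformer.
final class DomSerializer {
    private DomSerializer() {}

    static String toHtml(Document dom) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the DOM", e);
        }
    }

    // Stand-in for the Document that parser.createDOM(pdf) would return.
    static Document standInDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element div = doc.createElement("div");
            div.setTextContent("Hello PDF");
            body.appendChild(div);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the stand-in document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(standInDocument()));
    }
}
```

In real use, the `dom` variable from the parsing example would be passed to `toHtml` in place of the stand-in document.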
See the PDFDomTree API documentation for more information.
The Pdf2Dom API documentation is generated from the latest snapshot.