MFBpdf is a simple project for easily converting PNM images
to (MASK+FG+BG) PDF.
It uses libtiff and libjpeg for all technical work and compression.
The image is broken down into components using DjVuL; see the DjVuL wiki.
See MFBpdf demo.
To build, type:
$ make
Usage:
mfbpdf [options] input.ppm output.tif output.pdf
where options are:
-a # anisotropic regulator DjVuL {0.0}
-b # BG and FG downsample {3}
-c # contrast regulator DjVuL {0.0}
-d # DPI pdf and tiff {300}
-f # FG more downsample {2}
-l # level DjVuL (0 - auto) {0}
-o # overlay blocks DjVuL {0.5}
-q # jpeg quality {75}
-r rewrite tiff
-s # sensitivity for sauvola and blur {0.2}
-t # threshold: djvul, bimod, sauvola, blur, edgeplus {djvul}
-x # linear regulator DjVuL {*1.0}
-y # linear regulator DjVuL {+0.0}
-z black mode
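As a rough illustration of what a local threshold such as the `sauvola` mode of -t computes, here is a minimal Python sketch of the textbook Sauvola formula, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation in a window around the pixel. It is not the MFBpdf code: the function name `sauvola_threshold`, the window `radius`, the dynamic range R = 128, and the idea that `k` plays the role of the -s sensitivity are all assumptions made for the example.

```python
import math


def sauvola_threshold(gray, radius=1, k=0.2, dynamic_range=128.0):
    """Return a binary mask (1 = dark pixel) from a grayscale image.

    gray is a list of rows of intensities in 0..255.
    All parameter names and defaults here are illustrative assumptions.
    """
    height, width = len(gray), len(gray[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the local window, clipped at the image borders.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            variance = sum((v - mean) ** 2 for v in window) / len(window)
            std = math.sqrt(variance)
            # Textbook Sauvola threshold: lower in flat regions, closer to
            # the local mean where local contrast is high.
            threshold = mean * (1.0 + k * (std / dynamic_range - 1.0))
            mask[y][x] = 1 if gray[y][x] < threshold else 0
    return mask
```

In this sketch a larger `k` lowers the threshold in flat regions, so fewer pixels are marked as dark; whether -s behaves the same way in MFBpdf should be checked against the program itself.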
You can use Netpbm or any similar tool to obtain PNM
from other formats. The TIFF image is intended for text recognition, for example with tesseract.
The basis of the algorithm was obtained in 2016 by studying the works of monday2000 and adapting them to Linux. The starting point was the BookScanLib project and its DjVu Thresholding Binarization algorithm. That algorithm embodied good ideas, but it had a recursive structure, was a "function with discontinuities", and had a hard color limit. Because of these shortcomings and the absence of regulators, its results were doubtful. After careful study, all the foundations of that algorithm were rejected. The new algorithm is based on levels instead of recursion, uses a smooth weight function instead of a "discontinuous" one, has no color restriction, and provides BG/FG selection controls. It not only gives a much more adequate result, but also yields derived functionality: dividing an image into BG/FG according to an existing mask.
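The idea of dividing an image into BG/FG according to an existing mask, combined with a downsample factor like the one set by -b, can be sketched in Python as follows. This is only a simplified illustration under stated assumptions, not the DjVuL algorithm: the function name `split_layers`, the plain block averaging, and the fallback colors (black for an empty FG block, white for an empty BG block) are hypothetical choices made for the example.

```python
def split_layers(image, mask, block=3):
    """Split an RGB image into downsampled FG and BG layers.

    image: rows of (r, g, b) tuples; mask: rows of 0/1 (1 = foreground).
    Each output pixel summarizes one block x block tile of the input.
    """
    height, width = len(image), len(image[0])
    out_h = (height + block - 1) // block
    out_w = (width + block - 1) // block

    def average(pixels, fallback):
        # Integer per-channel mean, or the fallback color if the tile
        # has no pixels belonging to this layer.
        if not pixels:
            return fallback
        n = len(pixels)
        return tuple(sum(p[c] for p in pixels) // n for c in range(3))

    fg, bg = [], []
    for by in range(out_h):
        fg_row, bg_row = [], []
        for bx in range(out_w):
            fg_px, bg_px = [], []
            for y in range(by * block, min(height, (by + 1) * block)):
                for x in range(bx * block, min(width, (bx + 1) * block)):
                    # The mask decides which layer each pixel feeds.
                    (fg_px if mask[y][x] else bg_px).append(image[y][x])
            fg_row.append(average(fg_px, (0, 0, 0)))        # assumed fallback
            bg_row.append(average(bg_px, (255, 255, 255)))  # assumed fallback
        fg.append(fg_row)
        bg.append(bg_row)
    return fg, bg
```

Averaging each layer only over its own pixels keeps text color out of the background and paper color out of the foreground, which is what lets both layers be downsampled and JPEG-compressed while the mask stays at full resolution.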