OCR-D wrapper for detectron2 based segmentation models
This offers OCR-D compliant workspace processors for document layout analysis with models trained on Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.
To cover a broad range of third-party models, a few sacrifices have to be made: deploying a model may be difficult and needs configuration, and class labels (really PAGE-XML region types) must be provided. The code itself tries to cope with both panoptic and instance segmentation models (with or without masks).
This is only meant for (coarse) page segmentation into regions: no text lines, no reading order, no orientation.
Create and activate a virtual environment as usual.
To install Python dependencies:
make deps
Which is the equivalent of one of the following (depending on whether CUDA is available):
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only
Then, to install this module:
make install
Which is the equivalent of:
pip install .
OCR-D processor interface ocrd-detectron2-segment
To be used with PAGE-XML documents in an OCR-D annotation workflow.
Usage: ocrd-detectron2-segment [OPTIONS]
Detect regions with Detectron2
> Use detectron2 to segment each page into regions.
> Open and deserialize PAGE input files and their respective images.
> Fetch a raw and a binarized image for the page frame (possibly
> cropped and deskewed).
> Feed the raw image into the detectron2 predictor that has been used
> to load the given model. Then, depending on the model capabilities
> (whether it can do panoptic segmentation or only instance
> segmentation, whether the latter can do masks or only bounding
> boxes), post-process the predictions:
> - panoptic segmentation: take the provided segment label map, and
> apply the segment to class label map
> - instance segmentation: find an optimal non-overlapping set (flat
> map) of instances via non-maximum suppression; then extend / shrink
> the surviving masks to fully include / exclude connected components
> in the foreground that are on the boundary
> Finally, find the convex hull polygon for each region, and map its
> class id to a new PAGE region type (and subtype).
> Produce a new output file by serialising the resulting hierarchy.
Options:
-I, --input-file-grp USE File group(s) used as input
-O, --output-file-grp USE File group(s) used as output
-g, --page-id ID Physical page ID(s) to process
--overwrite Remove existing output pages/images
(with --page-id, remove only those)
-p, --parameter JSON-PATH Parameters, either verbatim JSON string
or JSON file path
-P, --param-override KEY VAL Override a single JSON object key-value pair,
taking precedence over --parameter
-m, --mets URL-PATH URL or file path of METS to process
-w, --working-dir PATH Working directory of local workspace
-l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
Log level
-C, --show-resource RESNAME Dump the content of processor resource RESNAME
-L, --list-resources List names of processor resources
-J, --dump-json Dump tool description as JSON and exit
-h, --help This help message
-V, --version Show version
Parameters:
"categories" [array - REQUIRED]
maps each region category (position) of the model to a PAGE region
type (and subtype if separated by colon), e.g.
['TextRegion:paragraph', 'TextRegion:heading',
'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet
"min_confidence" [number - 0.5]
confidence threshold for detections
"model_config" [string - REQUIRED]
path name of model config
"model_weights" [string - REQUIRED]
path name of model weights
"device" [string - "cuda"]
select computing device for Torch (e.g. cpu or cuda:0); will fall
back to CPU if no GPU is available
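The instance segmentation branch described in the help text above relies on non-maximum suppression. As a rough illustration of that idea only (not the processor's actual code), the following sketch applies greedy, IoU-based suppression to bounding boxes. The function names and the `max_iou` threshold are assumptions made for this example; only `min_confidence` and its default of 0.5 come from the parameter list.

```python
from typing import List, Sequence, Tuple

# Axis-aligned bounding box as (x0, y0, x1, y1); a simplification, since
# the processor may also work on pixel masks.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    min_confidence: float = 0.5,  # same default as the processor parameter
    max_iou: float = 0.5,         # hypothetical overlap threshold
) -> List[int]:
    """Return indices of the detections that survive suppression.

    Detections below min_confidence are dropped first. The rest are
    visited from highest to lowest score, and a detection is kept only
    if it does not overlap any already kept detection by more than max_iou.
    """
    candidates = [i for i, s in enumerate(scores) if s >= min_confidence]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        if all(box_iou(boxes[i], boxes[j]) <= max_iou for j in keep):
            keep.append(i)
    return keep
```

Each surviving index would then correspond to one output region; the mask refinement and convex hull steps mentioned in the help text are not covered by this sketch.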
Example:
ocrd resmgr download -n ocrd-detectron2-segment https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/All_X152.yaml
ocrd resmgr download -n ocrd-detectron2-segment https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/model_final.pth
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config All_X152.yaml -P model_weights model_final.pth -P min_confidence 0.1
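Since `-p` accepts a JSON file path, the same parameters can also be kept in a file instead of being spelled out as `-P` overrides. A minimal sketch, assuming a file name `tablebank.json` (hypothetical) and reusing the values from the command above:

```python
import json

# Same values as the -P overrides in the example command.
params = {
    "categories": ["TableRegion"],
    "model_config": "All_X152.yaml",
    "model_weights": "model_final.pth",
    "min_confidence": 0.1,
}

# Serialized form, as it would appear in the parameter file.
params_json = json.dumps(params, indent=2)

if __name__ == "__main__":
    # "tablebank.json" is a hypothetical file name chosen for this example.
    with open("tablebank.json", "w", encoding="utf-8") as f:
        f.write(params_json)
```

The resulting file could then be passed as `-p tablebank.json` in place of the individual `-P` options.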
Note: These are just examples; no exhaustive search has been done yet!
Note: Make sure you unpack first if the download link is an archive. Also, the filename suffix (.pth vs .pkl) of the weight file does matter!
R152-FPN config|weights|["TableRegion"]
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
X101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
provides different model variants of various depths for multiple datasets:
- PubLayNet (Medical Research Papers)
- TableBank (Tables Computer Typesetting)
- PRImALayout (Various Computer Typesetting)
- HJDataset (Historical Japanese Magazines)
- NewspaperNavigator (Historical Newspapers)
- Math Formula Detection
See here for an overview. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
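To illustrate what such a label map amounts to, here is a small sketch of how a `categories` list can be read: the position in the list is the model's class index, and each entry names a PAGE region type with an optional subtype after a colon. The helper names are hypothetical, and whether the processor parses the entries exactly this way is an assumption; the PubLayNet list is the one given in the parameter description.

```python
from typing import Optional, Sequence, Tuple

# Label map for PubLayNet as given in the parameter description.
PUBLAYNET_CATEGORIES = [
    "TextRegion:paragraph",
    "TextRegion:heading",
    "TextRegion:floating",
    "TableRegion",
    "ImageRegion",
]


def split_category(category: str) -> Tuple[str, Optional[str]]:
    """Split 'Type:subtype' into (type, subtype); subtype is None if absent."""
    region_type, _, subtype = category.partition(":")
    return region_type, subtype or None


def region_for_class(
    categories: Sequence[str], class_id: int
) -> Tuple[str, Optional[str]]:
    """Look up the PAGE region type (and subtype) for a model class index."""
    if not 0 <= class_id < len(categories):
        raise ValueError(f"class id {class_id} has no entry in the label map")
    return split_category(categories[class_id])
```

A label map for another model would follow the same pattern, with one entry per class in the order the model was trained on.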
X101-FPN archive
Proposed mappings:
- ["TextRegion:heading", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"] (using only predefined @type)
- ["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"] (using @custom as well)
none yet