Run ocrd network sample #449

Draft
wants to merge 3 commits into
base: master

@joschrew joschrew commented Aug 20, 2024

This PR showcases the usage of ocrd network.

Step by step guide to run the example

  • get a copy of this PR / repo and cd into it
  • the docker images ocrd/all:maximum and ocrd/segment are currently used. Later, more processor-specific containers like ocrd/tesserocr should be used, but many of those images are not ready yet, so ocrd/all is mainly used for now
  • python or python3 with python-click is needed (e.g.: sudo apt install python3-click)
  • Create a directory for the workspaces, which will be mounted to the containers: mkdir /tmp/mydata. This can be configured later. The folder must be owned by the current user
  • create a directory for some assets that are needed for now: mkdir /tmp/path-to-my-assets
  • To run this example, a custom logging configuration is needed due to current shortcomings of the ocrd/all docker image: the log files in the image already exist and are owned by root, which would cause errors without the custom configuration. Therefore execute cp run-network/my_ocrd_logging.conf /tmp/path-to-my-assets/
  • tesserocr-recognize needs the Fraktur model, which must be available in the assets directory just like the logging configuration: curl -L "https://github.com/tesseract-ocr/tessdata_best/raw/main/script/Fraktur.traineddata" --output /tmp/path-to-my-assets/Fraktur.traineddata
  • run make run-network to start the setup
  • copy a workspace to process to /tmp/mydata. The workspace I use ends up at /tmp/mydata/vd18test/mets.xml, together with its images in the DEFAULT file group. If this name differs, the workflow script in the next step has to be adjusted
  • create the workflow script in file workflow.txt:
cis-ocropy-binarize      -I DEFAULT -O O-1
anybaseocr-crop          -I O-1     -O O-2
cis-ocropy-denoise       -I O-2     -O O-3  -P dpi 300 -P level-of-operation page -P noise_maxsize 3.0
cis-ocropy-deskew        -I O-3     -O O-4  -P level-of-operation page
tesserocr-segment-region -I O-4     -O O-5  -P padding 5 -P find_tables false -P dpi 300
segment-repair           -I O-5     -O O-6  -P plausibilize true -P plausibilize_merge_min_overlap 0.7
cis-ocropy-clip          -I O-6     -O O-7
cis-ocropy-segment       -I O-7     -O O-8  -P spread 2.4 -P dpi 300
cis-ocropy-dewarp        -I O-8     -O O-9
tesserocr-recognize      -I O-9     -O PAGE -P textequiv_level word -P model Fraktur
fileformat-transform     -I PAGE    -O FULLTEXT
  • run the workflow. For this example I use curl (ocrd network client can be used once the PR is merged): curl -v -X POST "localhost:8000/workflow/run?mets_path=/data/vd18test/mets.xml&page_wise=True" -H "Content-type: multipart/form-data" -F "workflow=@workflow.txt"
  • optionally query the status: curl localhost:8000/workflow/job/{your-job-id-from-prev-step} | jq
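
Because the workflow script has to be adjusted whenever the input file group is not called DEFAULT, it can help to check the script before submitting it. The following is a minimal sketch and not part of this PR: parse_workflow and check_chain are hypothetical helper names, and the parser only understands the -I, -O and -P options that appear in the workflow script above.

```python
import shlex
from typing import Dict, List, NamedTuple

# The workflow script from the step-by-step guide, copied verbatim.
WORKFLOW = """\
cis-ocropy-binarize      -I DEFAULT -O O-1
anybaseocr-crop          -I O-1     -O O-2
cis-ocropy-denoise       -I O-2     -O O-3  -P dpi 300 -P level-of-operation page -P noise_maxsize 3.0
cis-ocropy-deskew        -I O-3     -O O-4  -P level-of-operation page
tesserocr-segment-region -I O-4     -O O-5  -P padding 5 -P find_tables false -P dpi 300
segment-repair           -I O-5     -O O-6  -P plausibilize true -P plausibilize_merge_min_overlap 0.7
cis-ocropy-clip          -I O-6     -O O-7
cis-ocropy-segment       -I O-7     -O O-8  -P spread 2.4 -P dpi 300
cis-ocropy-dewarp        -I O-8     -O O-9
tesserocr-recognize      -I O-9     -O PAGE -P textequiv_level word -P model Fraktur
fileformat-transform     -I PAGE    -O FULLTEXT
"""


class Step(NamedTuple):
    processor: str
    input_grp: str
    output_grp: str
    params: Dict[str, str]


def parse_workflow(text: str) -> List[Step]:
    """Parse lines of the form '<processor> -I <in> -O <out> [-P <key> <value>]...'."""
    steps = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # skip blank lines and comment-only lines
        processor, args = tokens[0], tokens[1:]
        groups: Dict[str, str] = {}
        params: Dict[str, str] = {}
        i = 0
        while i < len(args):
            flag = args[i]
            # Number of tokens each option consumes, including the flag itself.
            width = {"-I": 2, "-O": 2, "-P": 3}.get(flag)
            if width is None or i + width > len(args):
                raise ValueError(f"line {lineno}: cannot parse option {flag!r}")
            if flag == "-P":
                params[args[i + 1]] = args[i + 2]
            else:
                groups[flag] = args[i + 1]
            i += width
        if "-I" not in groups or "-O" not in groups:
            raise ValueError(f"line {lineno}: both -I and -O are required")
        steps.append(Step(processor, groups["-I"], groups["-O"], params))
    return steps


def check_chain(steps: List[Step], initial_grps=("DEFAULT",)) -> List[str]:
    """Return a list of problems; an empty list means every input group exists."""
    available = set(initial_grps)
    problems = []
    for n, step in enumerate(steps, start=1):
        for grp in step.input_grp.split(","):
            if grp not in available:
                problems.append(
                    f"step {n} ({step.processor}): input group {grp!r} is not available"
                )
        if step.output_grp in available:
            problems.append(
                f"step {n} ({step.processor}): output group {step.output_grp!r} already exists"
            )
        available.add(step.output_grp)
    return problems
```

Calling check_chain(parse_workflow(WORKFLOW)) with the default initial group DEFAULT reports no problems. If the workspace keeps its images in a differently named file group, pass that name as initial_grps and the first step is flagged until its -I argument is adjusted.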
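
The last two steps (submit the workflow, then query the job) can also be scripted. This is a rough sketch under a few assumptions: the helper names are hypothetical, the form field name workflow is assumed because the form argument of the curl command is garbled on this page, and the shape of the JSON responses is not documented here, so the code returns the parsed JSON without relying on any particular key. It only uses the two endpoints from the curl examples above.

```python
import json
import subprocess
from urllib.parse import urlencode
from urllib.request import urlopen

# Processing Server address as used in the curl examples.
BASE_URL = "http://localhost:8000"


def build_run_command(mets_path, workflow_file="workflow.txt", page_wise=True,
                      base_url=BASE_URL):
    """Build the curl argument list for POST /workflow/run.

    mets_path is the path as seen inside the containers (e.g. below /data),
    not the path on the host.
    """
    query = urlencode({"mets_path": mets_path, "page_wise": page_wise}, safe="/")
    return [
        "curl", "-s", "-X", "POST",
        f"{base_url}/workflow/run?{query}",
        # -F makes curl send a multipart/form-data body with the file attached.
        # The field name "workflow" is an assumption.
        "-F", f"workflow=@{workflow_file}",
    ]


def run_workflow(mets_path, workflow_file="workflow.txt", page_wise=True,
                 base_url=BASE_URL):
    """Submit the workflow via curl and return the parsed JSON response."""
    cmd = build_run_command(mets_path, workflow_file, page_wise, base_url)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def job_status_url(job_id, base_url=BASE_URL):
    """URL of GET /workflow/job/{job_id}."""
    return f"{base_url}/workflow/job/{job_id}"


def fetch_job_status(job_id, base_url=BASE_URL):
    """Query the job status and return the parsed JSON response."""
    with urlopen(job_status_url(job_id, base_url)) as response:
        return json.load(response)
```

With the setup from make run-network running, run_workflow("/data/vd18test/mets.xml") would submit workflow.txt from the current directory, and the job id found in its response can then be passed to fetch_job_status. Delegating the upload to curl keeps the sketch free of third-party dependencies.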

stweil (Collaborator) commented Aug 20, 2024

Are there already other showcases and documents for ocrd network or even complete installations? I recently started my own first experiments with it (based on a native OCR-D installation, no Docker), found only the documentation which is included in the code and in the specification and therefore appreciate this new sample.

joschrew (Author) commented

Thank you for your help.

I'm afraid I don't know of any other example deployment for ocrd network so far. I have focused on the docker deployment until now, because it requires no native installation of processors.

@kba kba self-assigned this Aug 21, 2024

MehmedGIT commented Aug 21, 2024

> Are there already other showcases and documents for ocrd network or even complete installations? I recently started my own first experiments with it (based on a native OCR-D installation, no Docker), found only the documentation which is included in the code and in the specification and therefore appreciate this new sample.

@stweil, there is also this pad with quick instructions for a native environment, which we have not been able to put under the ocrd_network docs yet: https://pad.gwdg.de/Ty6IXzhIRa6AvDdC4kTy_g#. Let me know if you need further assistance. Glad to help.
