Comment "cropped" is used by deskewing (for non-cropped pages) #55

Closed
wrznr opened this issue Jul 9, 2019 · 4 comments
@wrznr
Contributor

wrznr commented Jul 9, 2019

Running

ocrd-tesserocr-deskew -m mets.xml -I ORIGINAL -O DESKEW -p <(echo '{"operation_level": "page"}')

results in

<?xml version="1.0" encoding="UTF-8"?>
<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15">
    <pc:Metadata>
        <pc:Creator>OCR-D/core 1.0.0b10</pc:Creator>
        <pc:Created>2019-07-09T15:37:12.528892</pc:Created>
        <pc:LastChange>2019-07-09T15:37:12.528892</pc:LastChange>
        <pc:MetadataItem type="processingStep" name="preprocessing/optimization/deskewing" value="ocrd-tesserocr-deskew">
            <pc:Labels>
                <pc:Label value="page" type="operation_level"/>
            </pc:Labels>
        </pc:MetadataItem>
    </pc:Metadata>
    <pc:Page imageFilename="https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/GottDie_453779263_tif/jpegs/00000033.tif.original.jpg" imageWidth="1187" imageHeight="1687" readingDirection="left-to-right" textLineOrder="top-to-bottom">
        <pc:AlternativeImage filename="OCR-D-IMG-DESKEW/FILE_0033_OCR-D-IMG-DESKEW.png" comments="cropped"/>
    </pc:Page>
</pc:PcGts>

The comment should be "deskewed", right?

@wrznr wrznr added the bug Something isn't working label Jul 9, 2019
@kba
Member

kba commented Jul 9, 2019

ag comments

ocrd_tesserocr/deskew.py:121:9:        comments = 'cropped'

@bertsky
Collaborator

bertsky commented Jul 9, 2019

No, it should actually be empty. In this case deskewing found angle zero, so the comment does not contain "deskewed". The "cropped" is the default, because the processor always crops from the next-higher level (which happens to be nothing here, because we are on the page and no Border is present).

But I probably just misinterpreted "cropped" in the spec anyway, right?

@kba
Member

kba commented Jul 9, 2019

While technically correct, it won't help the user much if the image is marked as cropped but actually is not. I would not set the default value to cropped. It makes concatenating a bit more cumbersome, because one has to check whether to append with a comma or not.
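One way to avoid the comma check is to collect the applicable labels in a list and join them at the end. This is only a minimal sketch: the function name `build_comments` and its parameters `was_cropped` and `angle` are hypothetical and not taken from ocrd_tesserocr.

```python
def build_comments(was_cropped: bool, angle: float) -> str:
    """Return a comma-separated list of the operations that actually happened.

    Hypothetical helper for illustration only; not part of ocrd_tesserocr.
    """
    features = []
    if was_cropped:
        # Only label the image as cropped if something was actually cut away.
        features.append("cropped")
    if angle != 0.0:
        # A zero angle means the image was not rotated.
        features.append("deskewed")
    # Joining a list yields no stray commas, and an empty list gives "".
    return ",".join(features)
```

With this approach an uncropped page with angle zero ends up with an empty comment, and the other combinations need no special casing.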

@bertsky
Collaborator

bertsky commented Jul 9, 2019

Thanks @kba – and welcome back!

So my interpretation of cropped is not in fact wrong? (I considered it to be true after some PIL.Image.crop. But one might consider only page-level cropping to constitute the case for cropped.)

@bertsky bertsky mentioned this issue Jul 9, 2019