workspace download: also traverse dependent file groups? #412

Closed
bertsky opened this issue Jan 17, 2020 · 4 comments

bertsky commented Jan 17, 2020

When I want to download a PAGE-XML from remote, it would be very helpful if core would also download all the files referenced in /PcGts/Page/@imageFilename and */AlternativeImage/@filename. Is this feasible?
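For illustration, the two kinds of references named above could be collected roughly as follows. This is only a sketch using the Python standard library, not code from core; the function name `image_references` is hypothetical, and the namespace wildcard `{*}` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def image_references(page_xml):
    """Return Page/@imageFilename followed by every AlternativeImage/@filename."""
    root = ET.fromstring(page_xml)
    refs = []
    # "{*}" matches any namespace, so the PAGE schema version does not matter.
    for page in root.findall("{*}Page"):
        name = page.get("imageFilename")
        if name:
            refs.append(name)
    # AlternativeImage can occur at any depth (page, region, line, ...).
    for alt in root.findall(".//{*}AlternativeImage"):
        name = alt.get("filename")
        if name:
            refs.append(name)
    return refs
```

The result is a flat list of strings that may mix relative paths and URLs; deciding what to do with each is a separate step.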


kba commented Jan 17, 2020

It's doable. Related to #378, #323 and #176.


kba commented Jun 7, 2020

To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?


bertsky commented Jun 8, 2020

> To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?

No, not quite (I think). After downloading a PAGE-XML, its (original and derived) image references could be either relative paths or URLs already. For relative paths, instead of replacing them with a URL by prepending http://remote, it would be better to ensure these paths exist locally by downloading the files and adapting their mets:file entries accordingly. For URLs, the references should be replaced by a relative path and the files downloaded in the same way.
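The two cases can be sketched as a small planning function that returns, for each reference, where to fetch the file from and which relative path it should have locally. Everything here is an assumption for illustration: the name `plan_download`, the `local_dir` default, resolving relative paths against the PAGE URL, and reducing a URL to its base name are not taken from core. Updating the mets:file entry is left out.

```python
from posixpath import basename
from urllib.parse import urljoin, urlsplit


def plan_download(page_url, ref, local_dir="images"):
    """Return (download_url, local_relative_path) for one image reference."""
    parts = urlsplit(ref)
    if parts.scheme in ("http", "https"):
        # Reference is already a URL: fetch it as-is and store it under a
        # relative path (here simply local_dir plus the file's base name).
        return ref, f"{local_dir}/{basename(parts.path)}"
    # Reference is a relative path: keep it unchanged in the PAGE-XML and
    # fetch the file from the matching location next to the PAGE file.
    return urljoin(page_url, ref), ref


print(plan_download("https://remote/page.xml", "OCR-D-IMG/0001.tif"))
print(plan_download("https://remote/page.xml", "https://other/img/0001.tif"))
```

In the first case the PAGE-XML needs no change; in the second the reference would be rewritten to the returned local path after the download succeeds.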


kba commented Nov 20, 2023

>> To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?

> No, not quite (I think). After downloading a PAGE-XML, its (original and derived) image references could be relative paths (and then instead of replacing them with a URL by prepending http://remote it would be better to ensure these relative paths do exist locally by downloading them and adapting their mets:file entry accordingly) or URLs already (in which case they should be replaced by a relative path and downloaded etc).

The difficult part is how to download those references if they are relative file URLs (i.e. were produced by OCR-D before). We now have support for both local and remote URLs (#1079), but that is not widely used yet, and even if it were, it's unlikely that OCR-D users would expose their intermediate results via URL.

The only way around this restriction is if the remote workspace is available as OCRD-ZIP, in which case we can assume that all the referenced images are in the workspace.
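That assumption could be checked before relying on it. The sketch below lists references that are absent from an archive; the helper name `missing_references` is hypothetical, and the `data/` payload prefix is an assumption about the archive layout, passed as a parameter so it can be changed.

```python
import io
import zipfile


def missing_references(archive, refs, payload_prefix="data/"):
    """Return the references that have no matching member in the ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        members = set(zf.namelist())
    return [ref for ref in refs if payload_prefix + ref not in members]


# Small in-memory archive standing in for a downloaded workspace.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/mets.xml", "<mets/>")
    zf.writestr("data/OCR-D-IMG/0001.tif", b"")
buf.seek(0)

missing = missing_references(buf, ["OCR-D-IMG/0001.tif", "OCR-D-BIN/0001.png"])
print(missing)
```

Only relative references can be checked this way; references that are URLs would have to be handled separately.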

AFAIK nobody except us is using @imageFilename etc. with URLs, so supporting that is probably not sensible either.

So unless I'm mistaken, there is no good way to solve this.

kba closed this as completed on Nov 20, 2023