Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update, add workflow-configuration, deps-ubuntu and docker #6

Merged
merged 21 commits into from
Nov 29, 2019

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Nov 18, 2019

@stweil, this is my attempt at a "fat" docker setup. Having difficulties getting versioning right for my slim docker proposal, I think this is at least a good stopgap. The result can be obtained under dockerhub as bertsky/ocrd_all for now (but I hope someone will publish it under ocrd/all soon).

@mikegerber @kba, I also included an updated version of bertsky/workflow-configuration. (For those afraid of calling make I included a system-wide installation as ocrd-make, simply wrapping the make calls.)

@stweil
Copy link
Collaborator

stweil commented Nov 18, 2019

@kba, should we move the repository to OCR-D? I suggest to keep the name then because I think that just all is too generic.

README.md Outdated Show resolved Hide resolved
* `==5.4.1` (required by ocrd_typegroups_classifier)
* `>=6.2.0` (required by all others)
- Tensorflow:
* `tensorflow-gpu==1.14.0` (required by ocrd_calamari and OCR-D-LAYoutERkennung)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjenckel, would tensorflow-gpu 2.0.0 also work or is a code update to support that possible for OCR-D-LAYoutERkennung?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance, it will be easy for me to hold ocrd_keraslm and cor-asv-ann below 2.0. But the problem will likely re-surface when the first module project wants to go 2.0. (Calamari is already, so ocrd_calamari might follow soon.) The problem then is not only changes in the API (that are rather easy to adapt to) but also that models must be retrained.

BTW, as I explained in #3, there is no tensorflow-gpu for 2.0 (or even 1.15) anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance, it will be easy for me to hold ocrd_keraslm and cor-asv-ann below 2.0.

Done.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 18, 2019

@stweil, regarding the pip inconsistencies, as you can see at the bottom of the Dockerfile, I alreaddy added a workaround there (that can be updated as long as no actual incompatibilities are involved).

Do you want me to move that to the Makefile so it is available for the local installation as well?

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 19, 2019

Do you want me to move that to the Makefile so it is available for the local installation as well?

Done.

@stweil Can you merge?

bertsky referenced this pull request in OCR-D/core Nov 19, 2019
This does not work in containers, and is not good practise otherwise. (All other modules don't use it.)

Plus: we delegate here now in stweil/ocrd_all#6
README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find the updated revision for ocrd_im6convert upstream.

It would be much easier for me to review and merge changes presented in small pull requests which address a single feature and only contain one or two commits.

Makefile Outdated Show resolved Hide resolved
@stweil
Copy link
Collaborator

stweil commented Nov 19, 2019

@bertsky, I started with cherry-picking changes from this pull request, so you'll have to rebase your branch.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 19, 2019

I could not find the updated revision for ocrd_im6convert upstream.

Because it is not merged yet.

It would be much easier for me to review and merge changes presented in small pull requests which address a single feature and only contain one or two commits.

I see. But then those PRs would heavily depend on each other. And some things just belong in one PR. It's been quite difficult making this docker thing work. And now the automatic dependencies for local installation.

@bertsky, I started with cherry-picking changes from this pull request, so you'll have to rebase your branch.

Please don't do this to me.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 19, 2019

@stweil When you take this PR apart, are you sure you want to test every single change here with the huge long docker build, which for performance reasons is non-incremental in the commited version? It was already extremely time-consuming having this repeated layer by layer when I made changes. And testing is a must here, you cannot rely on anything...

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@kba
Copy link
Member

kba commented Nov 20, 2019

I could not find the updated revision for ocrd_im6convert upstream.

It would be much easier for me to review and merge changes presented in small pull requests which address a single feature and only contain one or two commits.

Yeah, maybe keep the submodule updates out of this PR.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 20, 2019

Yeah, maybe keep the submodule updates out of this PR.

I don't understand you guys. Is no-one seeing the connection? Almost all those updates were necessary to get the new functionality (automatic system dependencies, Docker integration, module filtering, workarounds for conflicts) working. In many cases, I conjured up the module updates just for this PR!

With this attitude (keep it small, split it up and rebase), you are making it unnecessarily hard (and frustrating) to bring in solutions. It was already costly to fight through this (waiting for docker build to fail over and over again, wasting tons of network bandwidth, making all this work automatically and maintainably).

I don't want go over all those steps and test them again for no good reason.

You can easily make all / make docker on the PR as a whole – if that fails, we should talk.

@stweil
Copy link
Collaborator

stweil commented Nov 21, 2019

@bertsky, I now get an ocrd/all docker of 5.02 GB when I run make docker with the latest code which includes most of your modifications. Which tests would you suggest to verify that everything is fine?

Makefile Show resolved Hide resolved
@bertsky
Copy link
Collaborator Author

bertsky commented Nov 21, 2019

@bertsky, I now get an ocrd/all docker of 5.02 GB when I run make docker with the latest code which includes most of your modifications. Which tests would you suggest to verify that everything is fine?

Well that's already a (partial) success. (It should be around 7 GB though. Maybe intermediate steps have failed? I recently added set -e to docker.sh to avoid overlooking problems. I'll add that to the PR as soon as the new problem in core is fixed.)

Next steps would be to run some workflow and validate. I am afraid there are no smoke tests for that in core or assets. But you can use workflow-configuration – this is what I do. For example, run

docker run -it -u $(id -u) -v /path/to/workspaces:/data ocrd/all ocrd-make -f gt-binarize-page-olena-sauvola-deskew-page-ocropy-clip-deskew-region-tesseract-resegment-dewarp-ocr-ocropy-tesseract.mk

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 21, 2019

@stweil Note, there seems to be connectivity problems with GH these days – you might have to repeat steps.

@bertsky bertsky requested a review from stweil November 22, 2019 13:38
- encapsulate `deps-ubuntu` from modules recursively
  (including those missing)
- add a dockerfile which can include (up to) all
  executables from all modules without conflicts
- move fix-pip from Dockerfile to Makefile
- allow filtering executables by modules they
  depend on (to easily unselect modules in Docker
  or pip)
- use PIP_INSTALL variable instead of PIP_OPTIONS
  (and everywhere)
- Docker build: increase pip network timeout
- install wget along with deps-ubuntu
- don't depend on wget where not needed
- document fix-pip and full installation dependencies
@stweil
Copy link
Collaborator

stweil commented Nov 25, 2019

You still say non-existing revisions, but as I answered you already, these were perfectly valid revisions, just not on the module's master (yet).

A new revision from a pull request only exists in the repository from which that pull request originated. Such a revision is unavailable for most people.

The rule for a valid submodule revision is simple: the revision must exist in the repository of the submodule (not necessarily in the master branch). If it only exists in your fork, it is not valid.

@stweil
Copy link
Collaborator

stweil commented Nov 25, 2019

@stweil, I cannot compare this, due to the rebase (that's why I opted for a merge commit instead).

These commands might work for your environment:

git fetch
git diff origin/fat-docker

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 25, 2019

A new revision from a pull request only exists in the repository from which that pull request originated. Such a revision is unavailable for most people.

That's simply not true: A PR's commits are automatically added to a new branch pull/NUM/head on the receiving end. Did you even try to just git submodule update?

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 25, 2019

These commands might work for your environment:

git fetch
git diff origin/fat-docker

The updates seem to be ok (but I have not rebuilt and tested the docker images yet). Except for ocrd_im6convert, which definitely needs OCR-D/ocrd_im6convert@1b082f0 (because deps-ubuntu must be run non-interactively), as I said.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 29, 2019

The updates seem to be ok (but I have not rebuilt and tested the docker images yet). Except for ocrd_im6convert, which definitely needs OCR-D/ocrd_im6convert@1b082f0 (because deps-ubuntu must be run non-interactively), as I said.

I reverted to the needed version, which is now also on ocrd_im6convert:master.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 29, 2019

A new revision from a pull request only exists in the repository from which that pull request originated. Such a revision is unavailable for most people.

That's simply not true: A PR's commits are automatically added to a new branch pull/NUM/head on the receiving end. Did you even try to just git submodule update?

With even that out of the way, can you finally merge?

@bertsky bertsky requested a review from stweil November 29, 2019 10:25
Makefile Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
bertsky and others added 4 commits November 29, 2019 13:43
When creating a new venv (because `VIRTUAL_ENV` was _not_ already set in the runtime context of `make`), don't just export `VIRTUAL_ENV`, but force usage of the `activate` script (even for cases of submodules which can hook into existing venvs at install time).

Co-Authored-By: Stefan Weil <[email protected]>
@bertsky
Copy link
Collaborator Author

bertsky commented Nov 29, 2019

@stweil do you want me to update core again here (significantly, we now have processor parameter validation), or are you going to apply that yourself after the PR?

@stweil
Copy link
Collaborator

stweil commented Nov 29, 2019

I have updated core now.

Copy link
Collaborator

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are nearly finished. One last typo is in the way. Thank you for your patience.

README.md Outdated Show resolved Hide resolved
Co-Authored-By: Stefan Weil <[email protected]>
@stweil stweil merged commit 380894d into OCR-D:master Nov 29, 2019
stweil added a commit that referenced this pull request Nov 29, 2019
update, add workflow-configuration, deps-ubuntu and docker

Signed-off-by: Stefan Weil <[email protected]>
@bertsky bertsky deleted the fat-docker branch August 19, 2020 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants