multiple Tensorflow / CUDA versions again #263
Thanks for bringing this up. So the fix you're proposing is to add a CUDA configuration to ld.so.conf? So, yes, we should document this in the README (and possibly provide a deps-ubuntu rule). And to solve this for the Docker images, we'd base the ocrd/core-cuda image on the newest CUDA image and install the older toolkits into it. If I got this right, then let's implement this.
Yes, it would seem so.
Embarrassingly, to my knowledge we do not provide any documentation on CUDA/GPU setup yet – neither here nor in the setup guide. Besides the information above, we would first need to explain which CUDA version is needed for which processors. According to the official compatibility matrix and the experiments conducted by @mikegerber, we'll need up to CUDA 11.2 for the newest TF 2.5 (which gets dragged in by ocrd_pc_segmentation, and currently also ocrd_calamari) and down to CUDA 10.0 for TF 1.15 (still needed by most TF processors). Not all native installations will already have the driver required for the newest CUDA toolkit (the official requirement for 11.3.1 is correspondingly recent). But in my view, since we are targeting Ubuntu, I think we should go further than just providing documentation.
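The version pairs discussed above can be condensed into a small lookup. A sketch – the cuda_for_tf helper is hypothetical, and the TF 2.1 → CUDA 10.1 pair comes from TF's official compatibility table rather than from this thread:

```shell
#!/bin/sh
# Hypothetical helper: map a TensorFlow version to the CUDA toolkit it was
# built against (pairs per this thread and TF's compatibility table).
cuda_for_tf() {
    case "$1" in
        1.15) echo 10.0 ;;
        2.1)  echo 10.1 ;;
        2.5)  echo 11.2 ;;
        *)    echo unknown; return 1 ;;
    esac
}

cuda_for_tf 1.15   # prints 10.0
cuda_for_tf 2.5    # prints 11.2
```

Any fat installation therefore has to provide at least the 10.0 and 11.2 runtimes side by side.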
(Also, I am not sure about the paths; CUDA toolkit packages for other systems might use different paths for the libraries. So it's probably all Ubuntu-specific anyway.)
Yes – also with the above knob. (So for the native ocrd_all installation we need …)
It turns out that there's still a problem with this: images like …
If I force the installation of the deb with …
Update: Further investigation reveals that this is a specific problem of … But the new core-cuda image is huge: 12 GB instead of 1 GB.
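Independent of image size, it helps to verify which host driver a container actually inherits. A hedged sketch – it assumes the NVIDIA container toolkit is set up on the host, and the image tag is only an example:

```shell
# Show the host driver / CUDA version as seen from inside a CUDA base image.
docker run --rm --gpus all nvidia/cuda:11.2.2-base-ubuntu18.04 nvidia-smi

# Compare the footprint of the locally built core-cuda variants.
docker image ls ocrd/core-cuda
```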
Here's a list of compatible PyPI TensorFlow versions and CUDA toolkit versions: https://github.com/mikegerber/test-nvidia#results I'm not testing TF 2.1 there – is this version needed for ocropy and anybaseocr?
As #289 shows, we can make ocrd_anybaseocr work with newer TF, so this problem will soon be gone. The above ideas have been merged in #270 already; the only problem currently is that the most recent Docker prebuild of the maximum-cuda variant did not complete – which will be solved by #287. So when everything falls into place, I think we can close #279. But this one we can already close.
To my knowledge, despite our efforts to work around the Tensorflow dependency hell (each TF version being tied closely to a narrow range of CUDA / Python / Numpy versions, and CUDA in turn depending on certain libcudnn / nvidia-driver versions), we have not yet tackled the problem of providing GPU access to multiple OCR-D processors relying on different TF versions at the same time.
However, for native installations, the solution is not far away: since Nvidia puts the version numbers into all the package names, it is in principle possible to install multiple versions of the CUDA runtime and cuDNN at the same time – as long as they can all agree on a suitable nvidia-driver (which is usually the newest; luckily, this one appears to be largely backwards compatible). The problem is that TF loads libcudart dynamically and, to that end, needs the right version in the dynamic linker/loader's search path. But the CUDA packages seem to only activate the last installed CUDA toolkit in ld.so.conf. This is easily fixed, however – this does work: …
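The fix elided above presumably amounts to registering every installed toolkit's library directory with the dynamic loader. A hedged sketch, assuming the Ubuntu layout of one /usr/local/cuda-X.Y prefix per toolkit (the drop-in file name is made up):

```shell
#!/bin/sh
# Collect the lib64 directory of every installed CUDA toolkit into the
# contents of a single ld.so.conf drop-in (preview only, no root needed).
conf=""
for d in /usr/local/cuda-*/lib64; do
    [ -d "$d" ] && conf="${conf}${d}
"
done
printf '%s' "$conf"

# To activate (requires root):
#   printf '%s' "$conf" | sudo tee /etc/ld.so.conf.d/zz-cuda-all.conf >/dev/null
#   sudo ldconfig
```

After ldconfig rebuilds the loader cache, each TF version should be able to dlopen the libcudart it was built against.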
(Not entirely sure whether we need all of cuda-XY, or just individual parts like cuda-cudart-XY cuda-curand-XY cuda-cusolver-XY cuda-cusparse-XY cuda-cublas-XY cuda-cufft-XY, though.) Thus, all we have to do is document this in the README (and maybe add rules to deps-ubuntu).

For the Docker option, it's the same story: as long as we need to build a fat image accommodating all modules, we have to do the same as above within Docker. Until now, we chose the oldest base image, nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04, for ocrd/core-cuda, because we usually needed the TF1 processors to have GPU access more than the TF2 processors. However, with the knowledge from above, we can work our way backwards from an image with the newest nvidia-driver and install the older CUDA versions in there – via the same extended makefile rules.
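Working backwards from a recent image could look roughly like this. A sketch under assumptions: the base image tag and the versioned cuda-toolkit meta-package name follow NVIDIA's Ubuntu repository conventions, but are untested here:

```shell
# Build a core-cuda image from a recent CUDA base image and install an older
# toolkit next to it, so TF1 and TF2 processors each find their runtime.
docker build -t ocrd/core-cuda - <<'EOF'
FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends cuda-toolkit-10-0 && \
    rm -rf /var/lib/apt/lists/*
EOF
```

The same ld.so.conf treatment as for native installations would then be needed inside the image.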