Semi-automatic Annotation - Documentation outdated, Nvidia, NO_PUBKEY A4B469963BF863CC #4707

Open
JRGit4UE opened this issue Jun 21, 2022 · 5 comments

@JRGit4UE

JRGit4UE commented Jun 21, 2022

My actions before raising this issue

Trying to enable semi-automatic annotation from the latest stable version as documented at
https://openvinotoolkit.github.io/cvat/docs/administration/advanced/installation_automatic_annotation/ for GPU SUPPORT fails, as Nvidia has changed a key.

Expected Behaviour

Following the documentation should result in successful installation of
serverless/tensorflow/matterport/mask_rcnn/nuclio

Update documentation to either:

  • tell that there is currently no fix for it
  • or add a correction either to code or documentation

Current Behaviour

Calling
nuctl deploy --project-name cvat \
  --path serverless/tensorflow/matterport/mask_rcnn/nuclio \
  --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
  --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
  --image cvat/tf.matterport.mask_rcnn_gpu \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1
ends with
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.

Possible Solution

According to https://forums.developer.nvidia.com/t/gpg-error-http-developer-download-nvidia-com-compute-cuda-repos-ubuntu1804-x86-64/212904/3, the way to resolve the problem on Debian-based systems is to remove the outdated key and install the current one:

sudo apt-key del 7fa2af80
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
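
The key file name in the URL above appears to match the last eight hex digits of the key ID that apt reports as missing. The following is a minimal sketch of that mapping, not an official procedure. `missing_keys` and `short_key_id` are hypothetical helper names, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: print each key ID that apt reports as missing
# (the token after NO_PUBKEY) from log text on stdin, without duplicates.
missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-Fa-f]*' | awk '{print $2}' | sort -u
}

# Hypothetical helper: reduce a long key ID to its last 8 hex digits, lowercased.
short_key_id() {
    printf '%s\n' "$1" | tail -c 9 | tr 'A-F' 'a-f'
}

# Example using the warning line from the error output above.
log="W: GPG error: the public key is not available: NO_PUBKEY A4B469963BF863CC"
key=$(printf '%s\n' "$log" | missing_keys)
echo "$key -> $(short_key_id "$key").pub"
# prints: A4B469963BF863CC -> 3bf863cc.pub
```

This only explains why `3bf863cc.pub` is the file being fetched. It does not check that the downloaded key is authentic.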

Your Environment

git log -1
commit d7560bb (HEAD -> develop, origin/develop, origin/HEAD)
Merge: ba4175b b7dba6a
Author: Nico Galoppo [email protected]
Date: Tue May 17 11:25:58 2022 -0500

    Merge pull request #4639 from openvinotoolkit/ncgalopp/fix-build

  • Docker version: Docker version 20.10.16, build aa7e414
  • Operating System and version: Ubuntu 20.04
  • GPU: P5000
  • nvidia-smi: NVIDIA-SMI 510.73.05 Driver Version: 510.73.05 CUDA Version: 11.6

Next steps

You may join our Gitter channel for community support.

@karolbadowski

karolbadowski commented Aug 2, 2022

Hello.
I am struggling with this problem and it is very urgent, but I do not know how to resolve it.
Maybe I am handling the dockers in a wrong way or modifying a wrong file.

When I try to build .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml with

nuctl deploy --project-name cvat --path serverless/tensorflow/matterport/mask_rcnn/nuclio --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." --image cvat/tf.matterport.mask_rcnn_gpu --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' --resource-limit nvidia.com/gpu=1

the log indicates that this problem happens during the execution of line:

RUN apt update && apt install --no-install-recommends -y git curl

Here is the log:

22.08.02 17:47:07.249 nuctl (I) Deploying function {"name": ""}

22.08.02 17:47:07.249 nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.9.1, Git commit: 5fb902dd1fafabed267f79b3267e19804ee93bda, OS: linux, Arch: amd64, Go version: go1.17.10", "name": ""}
22.08.02 17:47:07.436 nuctl (I) Staging files and preparing base images
22.08.02 17:47:07.436 nuctl (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (python:3.7) or higher
22.08.02 17:47:07.436 nuctl (I) Building processor image {"registryURL": "", "taggedImageName": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:07.436 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.9.1-amd64"}
22.08.02 17:47:10.356 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.08.02 17:47:14.246 nuctl.platform (I) Building docker image {"image": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:18.169 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-568811864/staging", "cmd": "docker build --network host --force-rm -t cvat/tf.matterport.mask_rcnn_gpu:latest -f /tmp/nuclio-build-568811864/staging/Dockerfile.processor --build-arg NUCLIO_LABEL=1.9.1 --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler .", "stderr": "The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n"}
22.08.02 17:47:18.175 nuctl (W) Failed to create a function; setting the function status {"err": "Failed to build processor image", "errVerbose": "\nError - exit status 100\n /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon 51.16MB\r\r\nStep 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3\n ---> 73be11373498\nStep 2/17 : ARG NUCLIO_LABEL\n ---> Using cache\n ---> ce09667e4588\nStep 3/17 : ARG NUCLIO_ARCH\n ---> Using cache\n ---> ee4549ac7db8\nStep 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR\n ---> Using cache\n ---> 688565186b35\nStep 5/17 : COPY artifacts/processor /usr/local/bin/processor\n ---> Using cache\n ---> 48a3b91efbc1\nStep 6/17 : COPY artifacts/py /opt/nuclio/\n ---> Using cache\n ---> 39ba78f106bd\nStep 7/17 : COPY artifacts/py-whl /opt/nuclio/whl\n ---> Using cache\n ---> 221a56010c52\nStep 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc\n ---> Using cache\n ---> 09519af89f11\nStep 9/17 : COPY handler /opt/nuclio\n ---> Using cache\n ---> f849808a29d6\nStep 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1\n ---> Using cache\n ---> b0500d1c8d03\nStep 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl\n ---> Using cache\n ---> a965b5f4b9aa\nStep 12/17 : WORKDIR /opt/nuclio\n ---> Using cache\n ---> 24c47938ae64\nStep 13/17 : RUN apt update && apt install --no-install-recommends -y git curl\n ---> Running in 3dff9034dfcc\n\u001b[91m\nWARNING: apt does not have a stable CLI interface. 
Use with caution in scripts.\n\n\u001b[0mHit:1 http://archive.ubuntu.com/ubuntu bionic InRelease\nGet:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]\nGet:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\nGet:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\nGet:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\nIgn:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease\nGet:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]\nGet:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]\nErr:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease\n The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nGet:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]\nGet:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]\nGet:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]\nGet:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]\nGet:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]\nGet:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]\nGet:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]\nGet:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]\nGet:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]\nGet:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]\nGet:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]\nReading 
package lists...\n\u001b[91mW: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nE: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.\n\u001b[0mRemoving intermediate container 3dff9034dfcc\n\nstderr:\nThe command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n\n /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n /nuclio/pkg/dockerclient/shell.go:117\nFailed to build docker image\n .../pkg/containerimagebuilderpusher/docker.go:54\nFailed to build processor image\n /nuclio/pkg/processor/build/builder.go:263\nFailed to build processor image"}

Error - exit status 100
/nuclio/pkg/cmdrunner/shellrunner.go:96

Call stack:
stdout:
Sending build context to Docker daemon 51.16MB
Step 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3
---> 73be11373498
Step 2/17 : ARG NUCLIO_LABEL
---> Using cache
---> ce09667e4588
Step 3/17 : ARG NUCLIO_ARCH
---> Using cache
---> ee4549ac7db8
Step 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR
---> Using cache
---> 688565186b35
Step 5/17 : COPY artifacts/processor /usr/local/bin/processor
---> Using cache
---> 48a3b91efbc1
Step 6/17 : COPY artifacts/py /opt/nuclio/
---> Using cache
---> 39ba78f106bd
Step 7/17 : COPY artifacts/py-whl /opt/nuclio/whl
---> Using cache
---> 221a56010c52
Step 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc
---> Using cache
---> 09519af89f11
Step 9/17 : COPY handler /opt/nuclio
---> Using cache
---> f849808a29d6
Step 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1
---> Using cache
---> b0500d1c8d03
Step 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl
---> Using cache
---> a965b5f4b9aa
Step 12/17 : WORKDIR /opt/nuclio
---> Using cache
---> 24c47938ae64
Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl
---> Running in 3dff9034dfcc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]
Err:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]
Get:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
Get:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]
Get:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]
Get:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.
Removing intermediate container 3dff9034dfcc

stderr:
The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100

/nuclio/pkg/cmdrunner/shellrunner.go:96

Failed to build
/nuclio/pkg/dockerclient/shell.go:117
Failed to build docker image
.../pkg/containerimagebuilderpusher/docker.go:54
Failed to build processor image
/nuclio/pkg/processor/build/builder.go:263
Failed to deploy function
...//nuclio/pkg/platform/abstract/platform.go:198

I have tried modifying the file .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml and running the command again, but the log stays the same: no additional steps were executed between

Step 12/17 : WORKDIR /opt/nuclio
---> Using cache
---> 24c47938ae64
and
Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl
---> Running in 3dff9034dfcc

The additional steps I wanted to add are the commands from NVIDIA/nvidia-container-toolkit#257.

I edited this fragment of the function file:

build:
  image: cvat/tf.matterport.mask_rcnn
  baseImage: tensorflow/tensorflow:1.15.5-gpu-py3
  directives:
    postCopy:
      - kind: WORKDIR
        value: /opt/nuclio
      - kind: RUN
        value: rm /etc/apt/sources.list.d/cuda.list
      - kind: RUN
        value: rm /etc/apt/sources.list.d/nvidia-ml.list
      - kind: RUN
        value: apt update && apt install --no-install-recommends -y git curl
      - kind: RUN
        value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
      - kind: RUN
        value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
      - kind: RUN
        value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow

Unfortunately, the new steps did not appear in the log.
I am wondering whether a copy of this file is cached somewhere in Docker, which would explain why the new commands are not picked up, or whether a different file is being used, or whether my commands are wrong and therefore not executed.
Whichever scenario it is, I have decided to ask for help here.
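
One thing that may be worth ruling out for the "different file" scenario is whether the edited file sits in the directory passed to `--path`. The sketch below compares the two paths quoted in this comment. `edited_file_in_deploy_path` is a hypothetical helper, and the relative paths are assumed to share the same repository root (the elided `.../cvat/` prefix is left out).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the edited file lives directly in the
# directory that was given to `nuctl deploy --path`.
edited_file_in_deploy_path() {
    edited_file=$1
    deploy_path=$2
    # Strip one trailing slash from the deploy path before comparing.
    [ "$(dirname "$edited_file")" = "${deploy_path%/}" ]
}

# Paths as quoted in the comment above (repository-relative, an assumption).
edited="serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml"
deploy="serverless/tensorflow/matterport/mask_rcnn/nuclio"

if edited_file_in_deploy_path "$edited" "$deploy"; then
    echo "edited file is inside the --path directory"
else
    echo "edited file is outside the --path directory"
fi
```

With these two paths the check reports that the file is outside the deployed directory (`mask_rcnn_fixed` versus `mask_rcnn`). That would be consistent with the new steps never showing up in the log, though it does not prove this is the cause.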

Solving this would also resolve this issue.

The matter is very important and urgent. I have many people simultaneously doing heavy computations in that docker on CPU instead of GPU just because of this failure.

@oxyhexagen

Have you solved it yet?
I searched everywhere, and this is the only issue I found that matches my problem.

@JRGit4UE
Author

@belkahorry Actually, I decided not to build a Docker image on my own and preferred to wait for an update from NVIDIA.

@brucefay1115

brucefay1115 commented Oct 19, 2022

I modified serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml (not function-gpu.yaml) and added
apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
in front of the apt update command.

With that change, the build succeeds:

postCopy:
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub && apt update && apt install --no-install-recommends -y git curl
        - kind: RUN
          value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
        - kind: RUN
          value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
        - kind: RUN
          value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image 'imageio<=2.9.0' Pillow
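
To confirm that a given function file really contains the key refresh ahead of `apt update`, a quick check along these lines may help. It is only a sketch. `has_key_fix` is a hypothetical helper, and the pattern assumes the fix is written on a single line, as in the fragment above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file fetches the 3bf863cc.pub key
# on the same line as, and before, an "apt update" call.
has_key_fix() {
    grep -q 'apt-key adv --fetch-keys.*3bf863cc\.pub.*apt update' "$1"
}

# Usage (path taken from the comment above; adjust to your checkout):
# if has_key_fix serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml; then
#     echo "key fix present"
# else
#     echo "key fix missing"
# fi
```

Running it against both function.yaml and function-gpu.yaml shows which of the two files actually carries the change.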

@VincentChong123

VincentChong123 commented Nov 7, 2022

Thanks @brucefay1115! It works!
My system information:
python 3.7 ubuntu18.04
NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4

BTW, my first attempt did not work because another container was still active. I had to run the two commands below and then retry.
docker-compose down
docker ps -aq | xargs docker stop | xargs docker rm
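
Note that the pipeline above stops and removes every container on the host, and it can error out when no containers exist, because `docker stop` is then called without arguments. A slightly more defensive variant could look like the sketch below. `remove_all_containers` is a hypothetical helper name, and it still removes all containers, so it should only be used where that is acceptable.

```shell
#!/bin/sh
# Hypothetical helper: stop and remove every container, doing nothing if none exist.
remove_all_containers() {
    ids=$(docker ps -aq)
    if [ -z "$ids" ]; then
        echo "no containers to remove"
        return 0
    fi
    # $ids is intentionally unquoted so that each ID becomes its own argument.
    docker stop $ids && docker rm $ids
}

# Usage, after `docker-compose down`:
# remove_all_containers
```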
