I have gitlab-runner running in a container on a Jetson Xavier NX with JetPack 4.6 and "default-runtime": "nvidia" set for Docker. It works fine.
After setting up the same thing on JetPack 5.1.1, the error below appears when the runner task starts:
Running with gitlab-runner 16.1.0 (b72e108d)
  on jetson EyMX9YGH, system ID: r_xod8YRKdUkhy
Preparing the "docker" executor 00:35
Using Docker executor with image docker:latest ...
Starting service docker:dind ...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:ebaee8bbc7d86875a5443867ac04940d91f854113dc13c2a17ad43d265fe632c for docker:dind with digest docker@sha256:28c6ddb5d7bfdc019fb39cc2797351a6e3e81458ad621808e5e9dd3e41538c77 ...
ERROR: Preparation failed: Error response from daemon: failed to create shim task: OCI runtime create failed: nvidia-container-runtime did not terminate successfully: exit status 1: unknown (docker.go:423:1s)
How can I tackle this error? Please let me know if I should provide more information.
Thanks. I'm not familiar with the gitlab-runner infrastructure above, and I don't see your Dockerfile.jetson here or which base image your container uses. It doesn't look like nvidia-container-runtime should be in use, yet the error you got suggests that it somehow is.
You might want to go back to a more basic example/setup of dind to further debug it or trace back the issue you are having.
I found that the variable CUDA_VERSION in gitlab-ci.yml caused the error. Other build stages for x86 use it, and I didn't realise it would make a difference here. I renamed it and all pipelines now work fine.
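For anyone hitting the same error, here is a rough sketch of what the rename might look like. A plausible explanation, which is an assumption and was not confirmed in this thread, is that GitLab exports CI variables into the job and service containers as environment variables, and with "default-runtime": "nvidia" the NVIDIA runtime reads CUDA_VERSION from the docker:dind container's environment. The job name, image tag, variable value, and the new name X86_CUDA_VERSION are all hypothetical; only CUDA_VERSION comes from the post above.

```yaml
# Hypothetical .gitlab-ci.yml fragment; names and values are illustrative only.
variables:
  # Before the fix (assumed form): CUDA_VERSION: "11.8"
  # A global variable like that is exported into every job and service container.
  X86_CUDA_VERSION: "11.8"   # renamed so containers no longer see CUDA_VERSION

build-x86:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # The Dockerfile can still receive the value under its usual build-arg name.
    - docker build --build-arg CUDA_VERSION="$X86_CUDA_VERSION" -t my-x86-image .
```

Any name the NVIDIA runtime does not interpret should do. Scoping the variable to the x86 jobs only, instead of declaring it globally, may be another way to keep it out of the Jetson pipeline.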