Issue running gitlab-runner with JetPack 5.1.1

Hello,

I have gitlab-runner set up in a container on a Jetson Xavier NX with JetPack 4.6 and "default-runtime": "nvidia" in the Docker daemon configuration. It works fine.

But after setting it up with JetPack 5.1.1, the error below appears when starting the runner task:

Running with gitlab-runner 16.1.0 (b72e108d)
  on jetson EyMX9YGH, system ID: r_xod8YRKdUkhy
Preparing the "docker" executor 00:35
Using Docker executor with image docker:latest ...
Starting service docker:dind ...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:ebaee8bbc7d86875a5443867ac04940d91f854113dc13c2a17ad43d265fe632c for docker:dind with digest docker@sha256:28c6ddb5d7bfdc019fb39cc2797351a6e3e81458ad621808e5e9dd3e41538c77 ...
ERROR: Preparation failed: Error response from daemon: failed to create shim task: OCI runtime create failed: nvidia-container-runtime did not terminate successfully: exit status 1: unknown (docker.go:423:1s)

How can I resolve this error? Please let me know if I should provide more information.

Thank you.

Ambrose

@ambrose.maker, did you rebuild your container for JetPack 5 against l4t-base:r35.3.1 or a similar base container?

I haven’t tried docker-in-docker (dind) on Jetson, but I would be curious to see any references you can share, or your Dockerfile.

@dusty_nv

This is the command I use to start gitlab-runner on the Jetson:

docker run \
-d \
-p 8093:8093 \
--hostname gitlab-runner \
--name gitlab-runner \
--restart always \
-v /var/opt/gitlab-runner/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
gitlab/gitlab-runner:ubuntu-v16.1.0

Then I register the runner with GitLab:

export REGISTRATION_TOKEN=<YOUR_TOKEN>
export TAG=jetson
export GITLAB_URL=<YOUR_GITLAB_URL>
docker exec -it gitlab-runner gitlab-runner register \
    --non-interactive \
    --executor="docker" \
    --docker-image="docker:latest" \
    --url=$GITLAB_URL \
    --registration-token=$REGISTRATION_TOKEN \
    --description=$TAG \
    --tag-list=$TAG \
    --run-untagged="false" \
    --locked="false" \
    --maximum-timeout="43200" \
    --docker-privileged="true" \
    --docker-volumes="/cache" \
    --docker-volumes="/etc/docker/daemon.json:/etc/docker/daemon.json:ro" \
    --docker-volumes="/var/run/docker.sock:/var/run/docker.sock" \
    --docker-pull-policy="if-not-present"

And here is the .gitlab-ci.yml file:

stages:
  - build

.ci_reuse:
  dev_jetson_rules:
    - if: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME
      when: never
    - changes:
        - Dockerfile.jetson

variables:
  L4T_RELEASE: "35.3.1"

build_jetson:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  script:
    - docker build
      --pull
      --network host
      --build-arg L4T_RELEASE=$L4T_RELEASE
      -f ./Dockerfile.jetson
      -t $DOCKER_REG/jetson/$CI_PROJECT_NAME:$CI_COMMIT_REF_NAME
      .
  rules:
    - !reference [.ci_reuse, dev_jetson_rules]
  timeout: 2h
  tags:
    - jetson

Thank you.

Ambrose

Thanks, although I’m not familiar with the gitlab-runner infrastructure above, and I don’t see your Dockerfile.jetson here or which base image your container uses. Nothing in this setup appears to invoke nvidia-container-runtime explicitly, yet the error suggests it is being used somehow.

You might want to go back to a more basic dind setup to debug further and trace where the issue comes from.
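As a starting point for that kind of debugging, a stripped-down job along these lines could show whether the docker:dind service starts at all on the Jetson runner. This is only a sketch, not something tested in this thread: the job name dind_smoke_test is hypothetical, and it assumes the jetson tag and privileged runner registration shown above. Keeping it in a project whose .gitlab-ci.yml has no global variables block makes the comparison with the failing pipeline cleaner.

```yaml
# Minimal docker-in-docker smoke test (hypothetical job name).
# No global `variables:` block, so nothing extra is injected into
# the job or service containers.
dind_smoke_test:
  image: docker:latest
  services:
    - docker:dind
  script:
    # Print daemon details. Because the runner also mounts
    # /var/run/docker.sock, this may report the host daemon
    # rather than the dind service.
    - docker info
  tags:
    - jetson
```

If this job starts cleanly while the real pipeline fails during "Preparing the docker executor", the difference is likely in the pipeline configuration rather than in the runner setup.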

@dusty_nv

variables:
  CUDA_VERSION: "11.8.0"
  L4T_RELEASE: "35.3.1"

I found that the CUDA_VERSION variable in .gitlab-ci.yml caused the error. My other build stages for x86 use it, and I didn’t realise it would make a difference here. I just renamed it, and all pipelines now work fine.
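A sketch of what the rename might look like: the global variable gets a name other than CUDA_VERSION, and an x86 job passes the value to its own Docker build explicitly. X86_CUDA_VERSION, build_x86, Dockerfile.x86 and the x86 tag are hypothetical names, since the x86 jobs weren’t posted. One plausible reason the original name broke the Jetson job (not confirmed in this thread): GitLab passes variables defined in .gitlab-ci.yml into job and service containers as environment variables, and nvidia-container-runtime can treat a CUDA_VERSION environment variable as a minimum CUDA requirement, which JetPack 5.1.1 (CUDA 11.4) would not satisfy for 11.8.0.

```yaml
# Global variable renamed so it no longer matches CUDA_VERSION.
# X86_CUDA_VERSION is a hypothetical name.
variables:
  X86_CUDA_VERSION: "11.8.0"
  L4T_RELEASE: "35.3.1"

# Hypothetical x86 job: the CUDA version reaches the image only as a
# Docker build argument, not as an environment variable of the job.
build_x86:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  script:
    - docker build
      --pull
      --build-arg CUDA_VERSION=$X86_CUDA_VERSION
      -f ./Dockerfile.x86
      -t $DOCKER_REG/x86/$CI_PROJECT_NAME:$CI_COMMIT_REF_NAME
      .
  tags:
    - x86
```

Alternatively, moving CUDA_VERSION into the x86 job’s own variables block would keep the original name while ensuring the Jetson job never sees it, since job-level variables apply only to that job.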

Thanks a lot.

Ambrose
