I’m attempting to get this Jetson playing nicely with Kubernetes, using CRI-O.
I’ve installed the latest stable NVIDIA Container Toolkit on the Jetson; nvidia-container-runtime --version reports:
NVIDIA Container Runtime version 1.13.5
commit: 6b8589dcb4dead72ab64f14a5912886e6165c079
spec: 1.1.0-rc.2
runc version 1.1.7-0ubuntu1~20.04.1
spec: 1.0.2-dev
go: go1.18.1
libseccomp: 2.5.1
CRI-O is configured with the following:
[crio.runtime]
default_runtime = "nvidia"
[crio.runtime.runtimes.nvidia]
runtime_path = "/usr/bin/nvidia-container-runtime"
runtime_type = "oci"
runtime_root = "/run/nvidia-container-runtime"
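In case the config isn’t being picked up at all, the one sanity check I know of (assuming the usual systemd unit) is to restart CRI-O and watch its logs:
sudo systemctl restart crio
sudo journalctl -u crio -f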
nvidia-device-plugin is installed, and the node now advertises nvidia.com/gpu: 1 as an allocatable resource.
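That shows up under the node’s Capacity/Allocatable:
kubectl describe node jetson1 | grep -i 'nvidia.com/gpu'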
The node itself looks like this (kubectl get nodes -o wide):
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
jetson1 Ready control-plane 7d16h v1.27.4+k0s 192.168.4.53 <none> Ubuntu 20.04.6 LTS 5.10.104-tegra cri-o://1.27.1
I’ve applied a RuntimeClass (though I thought I could do without it, since CRI-O’s default_runtime is already nvidia):
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gpu-enabled-class
handler: nvidia
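As I understand it, the handler value has to match the runtime name CRI-O knows about (the nvidia in [crio.runtime.runtimes.nvidia] above), and the class itself shows as registered:
kubectl get runtimeclass gpu-enabled-class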
And this is the Pod that I’m testing:
---
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-query
spec:
  runtimeClassName: gpu-enabled-class
  restartPolicy: OnFailure
  containers:
    - name: nvidia-query
      image: dudo/test_cuda
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
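For completeness, I apply and watch it with the usual (nvidia-query.yaml being whatever I saved the manifest as):
kubectl apply -f nvidia-query.yaml
kubectl logs -f nvidia-query
kubectl describe pod nvidia-query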
This pod executes as intended, running this script, but it doesn’t utilize any GPU according to jtop. If I run the same script directly on the Jetson, jtop shows the GPU being utilized as expected, but from the container, nada.
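The only extra check I’ve thought of so far: my understanding is that on Jetson the runtime operates in csv mode (the mode setting in /etc/nvidia-container-runtime/config.toml, with the mount lists under /etc/nvidia-container-runtime/host-files-for-container.d/), so the Tegra device nodes and L4T libraries should be getting injected into the container. While the container is running, something like
kubectl exec nvidia-query -- ls /dev | grep -E 'nvhost|nvmap'
kubectl exec nvidia-query -- ls /usr/lib/aarch64-linux-gnu/tegra
should show whether they made it in (those paths are my assumption, taken from the host’s layout).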
Any ideas on what might be misconfigured? Any recommendations on how to debug this further?