Hi Experts,
I’ve built a Docker image with the CUDA driver on an AGX Orin. When I run the container with `docker run -it --rm --runtime nvidia <image name>`, everything works fine. However, when I deploy my CI pipeline on the GitLab server using the docker-compose.yml file below, the test cases do not run correctly. I believe the NVIDIA runtime is not being invoked properly.
version: '3'
services:
  gitlab-runner:
    image: gitlab/gitlab-runner:latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    container_name: gitlab-runner
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml
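Before debugging the runner itself, it is worth confirming that the nvidia runtime is actually registered with the Docker daemon on the host. On JetPack this is typically declared in /etc/docker/daemon.json; a hedged sketch of the usual Jetson setup (the optional `default-runtime` line makes every container use it, and dockerd must be restarted after editing):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```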
This is my gitlab-runner config:
[[runners]]
  name = "ddjetson AGX Orin"
  url = "xxxxx"
  id = 404
  token = "xxxxx"
  token_obtained_at = 2024-08-16T03:20:53Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "jetsondev"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0
    pull_policy = "if-not-present"
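Note that `runtime: nvidia` in the compose file only affects the gitlab-runner container itself. Job containers are created by the docker executor through the mounted docker.sock, so they follow the `[runners.docker]` settings instead. GitLab Runner's docker executor supports a `runtime` option there; a hedged fragment (assuming the nvidia runtime is registered with the host Docker daemon):

```toml
  [runners.docker]
    # Assumption: "nvidia" is a registered runtime in the host's daemon.json
    runtime = "nvidia"
```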
Hi,
Do you use the iGPU driver to build the CUDA image?
Could you share the Dockerfile with us?
Thanks.
Hi @AastaLLL,
Yes, I built the image on AGX Orin with the command docker build -t jetsondev .
Dockerfile
FROM nvcr.io/nvidia/l4t-cuda:12.2.12-runtime
# Install nvidia-l4t-core
RUN \
echo "deb https://repo.download.nvidia.com/jetson/common r36.3 main" >> /etc/apt/sources.list && \
echo "deb https://repo.download.nvidia.com/jetson/t234 r36.3 main" >> /etc/apt/sources.list && \
apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
mkdir -p /opt/nvidia/l4t-packages/ && \
touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall
RUN apt-get update \
    && apt-get install -y --no-install-recommends nvidia-l4t-core
ENV UDEV=1
# Install CUDA driver 12.5
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb \
&& dpkg -i cuda-keyring_1.1-1_all.deb \
&& apt-get update \
&& apt-get -y install cuda-toolkit-12-5
# Install CUDA Compat 12.5
RUN apt-get update \
&& apt-get -y install cuda-compat-12-5
# Install necessary dependencies including gcc
RUN apt-get update \
&& apt-get install -y wget gdb build-essential git cmake libzmq3-dev pkg-config curl vim python3 python3-pip docker-compose ninja-build \
&& rm -rf /var/lib/apt/lists/*
# Install jtop
RUN pip3 install jetson-stats
WORKDIR /
# Install GCC 12 and G++ 12
RUN apt-get update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:ubuntu-toolchain-r/test \
&& apt-get update \
&& apt-get install -y gcc-12 g++-12 \
&& update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100 \
&& update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100
##### Install necessary packages
COPY ./requirements.txt /
RUN pip3 install -r requirements.txt && rm -rf requirements.txt
# Install Google Test
RUN git clone https://github.com/google/googletest.git \
&& cd googletest \
&& mkdir build \
&& cd build \
&& cmake .. \
&& make -j12 \
&& make -j12 install \
&& cd ../.. \
&& rm -rf googletest
# Add lines to ~/.bashrc
RUN echo 'export PATH=/usr/local/cuda-12.5/bin:$PATH' >> ~/.bashrc \
&& echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.5/compat:$LD_LIBRARY_PATH' >> ~/.bashrc
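One caveat with the two `echo ... >> ~/.bashrc` lines above: CI jobs usually run in non-interactive shells, which do not source ~/.bashrc, so those exports may never take effect inside pipeline jobs. A hedged alternative is to set the same paths with ENV in the Dockerfile, which applies to every process in the container:

```dockerfile
# Make the CUDA 12.5 paths visible to non-interactive shells (e.g. CI jobs),
# not only to interactive shells that source ~/.bashrc
ENV PATH=/usr/local/cuda-12.5/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.5/compat:$LD_LIBRARY_PATH
```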
I also updated my gitlab-runner config file, but only some of my test cases pass.
[[runners]]
  name = "ddjetson AGX Orin"
  url = "xxxxx"
  id = 404
  token = "xxxxx"
  token_obtained_at = 2024-08-16T03:20:53Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "jetsondev"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = "1g"
    network_mtu = 0
    pull_policy = "if-not-present"
    runtime = "nvidia"
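To narrow down whether the job containers actually receive the NVIDIA runtime, a minimal diagnostic job in .gitlab-ci.yml can help. This is an illustrative sketch, not from the original thread; the job name and commands are assumptions:

```yaml
gpu-check:
  script:
    - nvcc --version                   # CUDA toolkit present in the job image?
    - ls /usr/local/cuda-12.5/compat   # compat libraries installed?
    - ldconfig -p | grep -i libcuda    # driver library visible to the linker?
```

If `libcuda` is missing here but present under `docker run --runtime nvidia`, the runner is not passing the runtime to job containers.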
Hi,
Is there any error message or failure log you can share with us?
It looks like you manually upgraded the CUDA version from 12.2 to 12.5.
Have you tried if docker compose works using the default 12.2 CUDA?
Thanks.
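For the suggested test with the default CUDA 12.2, a hedged minimal Dockerfile would drop the 12.5 toolkit/compat steps and rely on what the base image ships:

```dockerfile
# Hedged sketch: use only the CUDA 12.2 runtime shipped with the base image,
# skipping the cuda-toolkit-12-5 / cuda-compat-12-5 installs
FROM nvcr.io/nvidia/l4t-cuda:12.2.12-runtime
# ...remaining project dependencies as in the original Dockerfile...
```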
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.