cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version in docker container

tnwilly · May 30, 2024, 6:00am

Hi, I am building a container for AGX Orin but see the error below.
You can reproduce it using the test case: Github

Thanks!

1: unknown file: Failure
1: C++ exception with description "std::bad_alloc: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version" thrown in the test body.
1:
1: [ FAILED ] CUDAfunction.test_cuMath_vec (0 ms)
1: [----------] 1 test from CUDAfunction (0 ms total)
1:
1: [----------] Global test environment tear-down
1: [==========] 1 test from 1 test suite ran. (0 ms total)
1: [ PASSED ] 0 tests.
1: [ FAILED ] 1 test, listed below:
1: [ FAILED ] CUDAfunction.test_cuMath_vec

My environment and Dockerfile:

NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 540.3.0 Release Build (buildbrain@mobile-u64-6367-d8000) Mon May 6 10:21:04 PDT 2024
GCC version: collect2: error: ld returned 1 exit status

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:34:47_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

FROM nvcr.io/nvidia/l4t-cuda:12.2.12-runtime

# Install nvidia-l4t-core
RUN \
    echo "deb https://repo.download.nvidia.com/jetson/common r36.3 main" >> /etc/apt/sources.list && \
    echo "deb https://repo.download.nvidia.com/jetson/t234 r36.3 main" >> /etc/apt/sources.list && \
    apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
    mkdir -p /opt/nvidia/l4t-packages/ && \
    touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall

RUN apt-get update \
    && echo "Y" | apt-get install -y --no-install-recommends nvidia-l4t-core

ENV UDEV=1

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb \
  && dpkg -i cuda-keyring_1.1-1_all.deb \
  && apt-get update \
  && apt-get -y install cuda-toolkit-12-5

# Install necessary dependencies including gcc
RUN apt-get update \
    && apt-get install -y wget gdb build-essential git cmake libzmq3-dev pkg-config curl vim python3 python3-pip docker-compose ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Install jtop 
RUN pip3 install jetson-stats

WORKDIR /

# Install GCC 12 and G++ 12
RUN apt-get update \
    && apt-get install -y software-properties-common \
    && add-apt-repository ppa:ubuntu-toolchain-r/test \
    && apt-get update \
    && apt-get install -y gcc-12 g++-12 \
    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100 \
    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100

##### Install necessary packages
COPY ./requirements.txt /
RUN pip3 install -r requirements.txt && rm -rf requirements.txt

# Install Google Test
RUN git clone https://github.com/google/googletest.git \
  && cd googletest \
  && mkdir build \
  && cd build \
  && cmake .. \
  && make -j12 \
  && make -j12 install \
  && cd ../.. \
  && rm -rf googletest

# Add lines to ~/.bashrc
RUN echo 'export PATH=/usr/local/cuda-12.5/bin:$PATH' >> ~/.bashrc \
  && echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.5/compat:$LD_LIBRARY_PATH' >> ~/.bashrc

AastaLLL · May 30, 2024, 7:43am

Hi,

Could you share the below file so we can check it in our environment?

...
 => ERROR [ 8/12] COPY ./requirements.txt /                                                                                                                                             0.0s
------
 > [ 8/12] COPY ./requirements.txt /:
------
Dockerfile:41
--------------------
  39 |
  40 |     ##### Install necessary packages
  41 | >>> COPY ./requirements.txt /
  42 |     RUN pip3 install -r requirements.txt && rm -rf requirements.txt
  43 |
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref 9c8f014e-a210-490c-957f-51b73f9a9929::n6r6totovdwk863t0o154y9ud: "/requirements.txt": not found

Thanks.

tnwilly · May 30, 2024, 7:47am

Hi @AastaLLL,

This is the file content. Thanks!

attrs==21.4.0
bitarray==2.4.1
crcmod==1.7
cycler==0.11.0
fonttools==4.38.0
kiwisolver==1.4.4
matplotlib==3.5.1
numpy==1.22.3
packaging==23.0
Pillow==9.4.0
plotly==5.18.0
pyparsing==3.0.9
python-dateutil==2.8.2
scipy==1.8.0
six==1.16.0
tenacity==8.2.3

tnwilly · June 3, 2024, 3:34am

Hi @AastaLLL, is there any update on this?

AastaLLL · June 3, 2024, 5:56am

Hi,

We are still testing this.
We will share more information with you later.

Thanks.

AastaLLL · June 6, 2024, 5:24am

Hi,

Sorry for the late update.

Please also add the cuda-compat-12-5 installation command to your Dockerfile.
Running a newer CUDA library on an older driver requires the compat library.

The compat library is also listed in the installation command on our CUDA website:

$ …
$ sudo apt-get -y install cuda-toolkit-12-5 cuda-compat-12-5
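
As a rough illustration of the rule above (a newer CUDA runtime on an older driver needs the compat library), the sketch below extracts the release number from `nvcc --version` output and compares it with a driver-side CUDA version. Both function names are hypothetical, the comparison is a simplification of the rule as stated in this reply, and the driver-side value 12.2 is an assumption taken from the base image tag rather than something queried from the driver.

```shell
# nvcc_release: read `nvcc --version` output on stdin and print the
# MAJOR.MINOR release number (hypothetical helper, not an NVIDIA tool).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# needs_compat DRIVER_CUDA RUNTIME_CUDA
# Succeeds (exit 0) when RUNTIME_CUDA is newer than DRIVER_CUDA.
# Both arguments must be in MAJOR.MINOR form, e.g. 12.2.
needs_compat() {
    local drv_major drv_minor rt_major rt_minor
    drv_major=${1%%.*}
    drv_minor=${1#*.}
    drv_minor=${drv_minor%%.*}
    rt_major=${2%%.*}
    rt_minor=${2#*.}
    rt_minor=${rt_minor%%.*}
    if [ "$rt_major" -gt "$drv_major" ]; then
        return 0
    fi
    if [ "$rt_major" -eq "$drv_major" ] && [ "$rt_minor" -gt "$drv_minor" ]; then
        return 0
    fi
    return 1
}

# Usage with the nvcc line posted earlier in this thread.
# The driver-side version 12.2 is assumed, not measured.
runtime=$(printf '%s\n' 'Cuda compilation tools, release 12.5, V12.5.40' | nvcc_release)
if needs_compat 12.2 "$runtime"; then
    echo "CUDA runtime $runtime is newer than driver-side 12.2: compat package needed"
fi
```

Inside a container, the same check could be fed from the real compiler with `nvcc --version | nvcc_release`.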

Thanks.

tnwilly · June 11, 2024, 6:09am

Hi @AastaLLL,

I added cuda-compat-12-5 in the section marked # Install CUDA Compat 12.5, but I still see an error.

You can reproduce it with the same test code on GitHub.
Thanks!

FROM nvcr.io/nvidia/l4t-cuda:12.2.12-runtime

# Install nvidia-l4t-core
RUN \
    echo "deb https://repo.download.nvidia.com/jetson/common r36.3 main" >> /etc/apt/sources.list && \
    echo "deb https://repo.download.nvidia.com/jetson/t234 r36.3 main" >> /etc/apt/sources.list && \
    apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
    mkdir -p /opt/nvidia/l4t-packages/ && \
    touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall

RUN apt-get update \
    && echo "Y" | apt-get install -y --no-install-recommends nvidia-l4t-core

ENV UDEV=1

# Install CUDA Toolkit 12.5
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb \
  && dpkg -i cuda-keyring_1.1-1_all.deb \
  && apt-get update \
  && apt-get -y install cuda-toolkit-12-5

# Install CUDA Compat 12.5
RUN apt-get update \
  && apt-get -y install cuda-compat-12-5

# Install necessary dependencies including gcc
RUN apt-get update \
    && apt-get install -y wget gdb build-essential git cmake libzmq3-dev pkg-config curl vim python3 python3-pip docker-compose ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Install jtop 
RUN pip3 install jetson-stats

WORKDIR /

# Install GCC 12 and G++ 12
RUN apt-get update \
    && apt-get install -y software-properties-common \
    && add-apt-repository ppa:ubuntu-toolchain-r/test \
    && apt-get update \
    && apt-get install -y gcc-12 g++-12 \
    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100 \
    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100

##### Install necessary packages
COPY ./requirements.txt /
RUN pip3 install -r requirements.txt && rm -rf requirements.txt

# Pull FFTW tarball
RUN wget http://www.fftw.org/fftw-3.3.10.tar.gz \
  && tar -xvzf fftw-3.3.10.tar.gz \
  && rm -rf fftw-3.3.10.tar.gz \
  && cd fftw-3.3.10 \
  && ./configure \
  && make -j12 \
  && make install

# Pull cppzmq repo
RUN git clone https://github.com/zeromq/cppzmq.git \
  && cd cppzmq \
  && mkdir build \
  && cd build \
  && cmake -DENABLE_DRAFTS=off .. \
  && make -j12 install

# Install Google Test
RUN git clone https://github.com/google/googletest.git \
  && cd googletest \
  && mkdir build \
  && cd build \
  && cmake .. \
  && make -j12 \
  && make -j12 install \
  && cd ../.. \
  && rm -rf googletest

# Add lines to ~/.bashrc
RUN echo 'export PATH=/usr/local/cuda-12.5/bin:$PATH' >> ~/.bashrc \
  && echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.5/compat:$LD_LIBRARY_PATH' >> ~/.bashrc
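
The last two lines of the Dockerfile above write the PATH and LD_LIBRARY_PATH exports into ~/.bashrc, so they only take effect in shells that read that file (typically interactive bash sessions). Whether that plays any role in the error reported here is not established in this thread. As a minimal sketch, the hypothetical helper below checks whether a directory is present in a colon-separated list such as LD_LIBRARY_PATH, which makes it easy to confirm from inside the container that the compat directory is visible to the current shell. The function name is an assumption and not part of any NVIDIA tooling.

```shell
# path_list_contains LIST DIR
# Hypothetical helper: succeeds (exit 0) if DIR is exactly one of the
# colon-separated entries of LIST, e.g. the value of LD_LIBRARY_PATH.
# Entries are compared literally, so a trailing slash counts as a difference.
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: report whether the compat directory from the Dockerfile is
# on the library search path of the shell running this snippet.
compat_dir=/usr/local/cuda-12.5/compat
if path_list_contains "${LD_LIBRARY_PATH:-}" "$compat_dir"; then
    echo "$compat_dir is on LD_LIBRARY_PATH"
else
    echo "$compat_dir is NOT on LD_LIBRARY_PATH in this shell"
fi
```

Running this once in an interactive shell and once through a non-interactive command shows whether the two environments see the same search path.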

AastaLLL · June 25, 2024, 7:20am

Hi,

Thanks for the update.

We will test this issue in our environment.
Have you checked whether the sample works correctly with CUDA 12.2?

Thanks.

tnwilly · June 25, 2024, 7:28am

Yes, the test case works well in the official image, which uses CUDA 12.2.

nvcr.io/nvidia/l4t-cuda:12.2.12-runtime

So I was hoping I could also download an official image that already has 12.5 installed.

Thanks!

AastaLLL · June 26, 2024, 8:55am

Hi,

We tested the Dockerfile you shared on June 11.
It works without any issues.

Please double-check it.

$ sudo docker build . -t tmp
$ sudo docker run -it --rm --runtime nvidia --network host tmp

# git clone https://github.com/weimin023/testcuda.git
# cd testcuda/
# cmake . && make
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- The CUDA compiler identification is NVIDIA 12.5.40
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-12.5 (found suitable version "12.5", minimum required is "12.3")
-- Found GTest: /usr/local/lib/cmake/GTest/GTestConfig.cmake (found version "1.14.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /testcuda
[ 50%] Building CUDA object CMakeFiles/unittest.dir/unittest.cu.o
[100%] Linking CUDA executable unittest
[100%] Built target unittest

# ./unittest
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CUDAfunction
[ RUN      ] CUDAfunction.test_cuMath_vec
[       OK ] CUDAfunction.test_cuMath_vec (278 ms)
[----------] 1 test from CUDAfunction (278 ms total)

[----------] 1 test from opencv
[ RUN      ] opencv.open
[       OK ] opencv.open (0 ms)
[----------] 1 test from opencv (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test suites ran. (278 ms total)
[  PASSED  ] 2 tests.

Thanks.

system · July 17, 2024, 7:13am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.