AssertionError: CUDA unavailable, invalid device 0 requested

Hi,

I have the A203 Mini PC from Seeed Studio, with a Jetson Xavier NX 8GB module, 128GB SSD, 2x USB 3, RS232, WiFi/BLE, an aluminum case, and JetPack 5.0.2 pre-installed.

I am trying to run the YOLOv7 pose model from GitHub (kunzhi-yu/yolov7-pose) in Docker. I built my own Docker image with the following Dockerfile:

```dockerfile
# Use an NVIDIA Jetson base image
FROM (I used the nvidia l4t base)

# Set the working directory in the container to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Set the DEBIAN_FRONTEND environment variable to noninteractive
ENV DEBIAN_FRONTEND=noninteractive

# Install gcc, python3-dev, and necessary libraries for OpenCV
RUN apt-get update && apt-get install -y \
    gcc \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev

# Install dependencies for building Python and required packages
RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev

# Install additional dependencies for Python standard library modules
RUN apt-get install -y liblzma-dev libsqlite3-dev libssl-dev libbz2-dev libgdbm-dev tk-dev libdb-dev

# Update SSL certificates
RUN apt-get install -y ca-certificates && update-ca-certificates

# Download and install pip for Python 3.9
RUN curl https://bootstrap.pypa.io/get-pip.py | python3.9

# Install any needed packages specified in requirements.txt
RUN python3.9 -m pip install --no-cache-dir -r requirements.txt

# Set the entrypoint command
ENTRYPOINT ["python3.9"]

# Set the default command to pose-estimate.py
CMD ["pose-estimate.py"]
```
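For reference, I build and run it roughly like this (the `yolov7_image` tag and file paths are just what I use locally):

```shell
# Build the image from the directory containing the Dockerfile
docker build -t yolov7_image .

# Run pose estimation on CPU (this works)
docker run --rm yolov7_image pose-estimate.py --source /app/data/IMG_7857.MOV --device cpu

# Run on GPU (this fails with the assertion below)
docker run --rm yolov7_image pose-estimate.py --source /app/data/IMG_7857.MOV --device 0
```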


The model runs fine in a container when using the CPU. However, it does not run if I specify the GPU; I get the following error: AssertionError: CUDA unavailable, invalid device 0 requested.
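For context, my understanding is that this assertion comes from the repo's device-selection helper in `utils/torch_utils.py`, which effectively does the following (a simplified sketch in my own words; `cuda_available` stands in for the `torch.cuda.is_available()` call, so the error means that check returns False inside the container):

```python
def select_device(device: str, cuda_available: bool) -> str:
    """Simplified sketch of YOLOv7's device selection.

    device: '' or '0' for GPU, 'cpu' to force CPU.
    cuda_available: stand-in for torch.cuda.is_available().
    """
    if device != 'cpu':
        # This is the assertion that fires in my container
        assert cuda_available, f'CUDA unavailable, invalid device {device} requested'
        return 'cuda:0'
    return 'cpu'
```

So the Python side is just reporting that it cannot see the GPU at all.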

It seems like Docker is not able to access my GPU. If you could tell me how to solve this problem, that would be great.

I also get a different error message if I use --gpus all:

```
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.: unknown.
ERRO[0000] error waiting for container:
```

Using --gpus all --runtime=nvidia gives me the following:

```
docker run --gpus all --runtime=nvidia yolov7_image pose-estimate.py --source /app/data/IMG_7857.MOV --device 0
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.
```
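From what I have read, on Jetson the nvidia runtime comes from the container packages JetPack installs, and it has to be registered with the Docker daemon in /etc/docker/daemon.json — something like this (my guess at what mine should contain, based on the NVIDIA Container Toolkit docs), followed by a sudo systemctl restart docker:

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```

Is that the right direction? My daemon apparently does not know about any runtime named nvidia.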

I do think I have the NVIDIA Container Toolkit installed, and CUDA itself works on the host — running deviceQuery passes:

```
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 6855 MBytes (7187841024 bytes)
  (006) Multiprocessors, (064) CUDA Cores/MP:    384 CUDA Cores
  GPU Max Clock rate:                            1109 MHz (1.11 GHz)
  Memory Clock rate:                             1109 MHz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
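In case it helps, I assume I can confirm which container packages are actually installed with something like the following (my best guess at the relevant package names on JetPack):

```shell
# List installed NVIDIA container packages on the host
dpkg -l | grep -i nvidia-container

# Show which runtimes the Docker daemon currently knows about
docker info | grep -i runtime
```

I can post that output if it is useful.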

Thank you!