nvidia-docker seems unable to use GPU as non-root user

I have come across a potential rough edge with the NVIDIA Docker runtime provided with JetPack 4.2.1.

All of the following is run on a TX2 module mounted on a Colorado Engineering XCarrier carrier board.

I am working with a deviceQuery binary built locally from the CUDA samples provided in JetPack, and I can run it successfully from any user account on the device itself.

When I try to run it in a container under the root user e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

COPY deviceQuery .

CMD ./deviceQuery

… it also runs correctly.
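
For reference, I'm building and running it along these lines (the tag is just my local name for the image):

$ docker build -t geoff/cudatest:latest .
$ docker run --runtime nvidia -it geoff/cudatest:latest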

BUT if I try to run it as a non-root user inside the container e.g. using:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY deviceQuery .

CMD ./deviceQuery

… it fails with:

$ docker run --runtime nvidia -it geoff/cudatest:latest
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I can make it work by forcing deviceQuery to be run as root e.g. using:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY deviceQuery .

USER root
CMD ./deviceQuery

… but that obviously isn’t ideal!

Is this a bug or am I missing something?

Thanks!

Geoff

Hi,

Here is our document for nvidia-docker on Jetson:

You can execute a CUDA sample with commands like these:

$ mkdir /tmp/docker-build && cd /tmp/docker-build
$ cp -r /usr/local/cuda/samples/ ./
$ tee ./Dockerfile <<EOF
FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
EOF

$ sudo docker build -t devicequery .
$ sudo docker run -it --runtime nvidia devicequery

Thanks.

Yes - that works for me too - and is equivalent to the first Dockerfile I gave above, apart from building the deviceQuery binary during the container build.

It is still running the deviceQuery command as root within the container, which is obviously bad practice in any system intended for production.

If I update your Dockerfile to run using a non-root user e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY --chown=user:user ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]

… it fails in the same way:

$ sudo docker run -it --runtime nvidia devicequery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

So the question remains: why can't I run deviceQuery as a non-root user within the container, given that it works fine on the host machine?

Regards,

Geoff

PS. The “sudo” on the docker build and run is unnecessary if your user is in group docker.
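
If your user is not yet in that group, the usual way is something along these lines, followed by logging out and back in:

$ sudo usermod -aG docker $USER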

Hi,

Thanks for your feedback.
I will check this with our internal team and update you later.

Thanks.

When you are running on a system where non-root access fails, run the "groups" command. Is that user a member of "video"? If not, try adding the user to "video": "sudo usermod -a -G video <user_name>". Note that "-a" is for append and is important: append adds to a group, whereas without it the user's entire set of groups would instead be replaced by only "video".
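
The reason the group matters is that on L4T the GPU-related device nodes are (if I remember correctly) owned by group "video", so you can verify on the host with something like:

$ ls -l /dev/nvhost-* /dev/nvmap

If those nodes show group "video", then any account which needs the GPU also needs to be a member of that group.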

Thanks @linuxdev, that works. If I add the user to the video group it works as expected, e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd && usermod -a -G video user

USER user
WORKDIR /home/user

COPY --chown=user:user ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
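
Built and run the same way as before (the tag is just what I happen to use locally):

$ docker build -t devicequery .
$ docker run -it --runtime nvidia devicequery

… and deviceQuery now passes while running as the non-root "user" account.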

Geoff

Hello everyone,

I have the same problem, but the solution is not working for me.
I used the nvidia-docker container file and modified it:

FROM nvidia/cuda:10.2-devel-ubuntu18.04
LABEL maintainer "My name"

ENV CUDNN_VERSION 7.6.5.32
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
    libcudnn7=$CUDNN_VERSION-1+cuda10.2 \
    libcudnn7-dev=$CUDNN_VERSION-1+cuda10.2 && \
    apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd && usermod -a -G video user

USER user
WORKDIR /home/user

Then I cloned the cuda-samples git repo and tried to execute the deviceQuery sample.
With root everything works fine, but without it I get the following error:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL


I set up a clean TX2 locally (based on JetPack 4.4), and deviceQuery runs successfully at build time. Steps as follows:

  1. Install necessary libs
$ sudo apt update; sudo apt install nvidia-container-runtime cuda-samples-10-2; sudo apt update -qq; sudo apt install -qq -y software-properties-common uidmap; sudo add-apt-repository -y ppa:projectatomic/ppa; sudo apt update -qq; sudo apt -qq -y install podman
  2. Pull l4t-base and build
$ git clone https://gitlab.com/nvidia/container-images/l4t-base.git; cd l4t-base; sudo make image
  3. Modify /etc/docker/daemon.json as below:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
  4. Create Dockerfile as below:
$ cat Dockerfile 
FROM nvcr.io/nvidia/l4t-base:r32.4.3

RUN apt update && apt install -y --no-install-recommends make g++
COPY ./deviceQuery /tmp
WORKDIR /tmp
RUN ./deviceQuery
  5. Run docker build via the following command:
$ sudo docker build -t devicequery .
  6. Following are my build logs:
Step 3/5 : COPY ./deviceQuery /tmp
 ---> f8b3ad13d4c2
Step 4/5 : WORKDIR /tmp
 ---> Running in 32f2c94b1818
Removing intermediate container 32f2c94b1818
 ---> b29741609a9f
Step 5/5 : RUN ./deviceQuery
 ---> Running in 5953e6ba9075
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X2"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 3826 MBytes (4011683840 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores
  GPU Max Clock rate:                            1300 MHz (1.30 GHz)
  Memory Clock rate:                             1300 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Removing intermediate container 5953e6ba9075
 ---> fcad4905644b
Successfully built fcad4905644b
Successfully tagged devicequery:latest

From my test, deviceQuery works at build time.
Could there be some gap in your environment?
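
If it still fails for you, one way to compare environments might be to check the container runtime configuration and package versions on both sides (these are just the checks I would start with):

$ docker info | grep -i runtime
$ cat /etc/docker/daemon.json
$ dpkg -l | grep nvidia-container

Also note that Docker needs to be restarted after changing daemon.json, e.g. with "sudo systemctl restart docker".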