nvidia-docker seems unable to use GPU as non-root user

I have come across a potential rough edge with the nvidia-docker runtime provided with JetPack 4.2.1.

All of the following is run on a TX2 module mounted on a Colorado Engineering XCarrier carrier board.

I am working with a deviceQuery binary built locally from the CUDA samples provided in JetPack, and I can run it successfully from any user account on the device itself.

When I try to run it in a container under the root user e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

COPY deviceQuery .

CMD ./deviceQuery

… it also runs correctly.

BUT if I try to run it as a non-root user inside the container e.g. using:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY deviceQuery .

CMD ./deviceQuery

… it fails with:

$ docker run --runtime nvidia -it geoff/cudatest:latest
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I can make it work by forcing deviceQuery to be run as root e.g. using:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY deviceQuery .

USER root
CMD ./deviceQuery

… but that obviously isn’t ideal!

Is this a bug or am I missing something?

Thanks!

Geoff

Hi,

Here is our document for nvidia-docker on Jetson:
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson

You can execute a CUDA sample with commands like these:

$ mkdir /tmp/docker-build && cd /tmp/docker-build
$ cp -r /usr/local/cuda/samples/ ./
$ tee ./Dockerfile <<EOF
FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
EOF

$ sudo docker build -t devicequery .
$ sudo docker run -it --runtime nvidia devicequery

Thanks.

Yes - that works for me too - and it is equivalent to the first Dockerfile I gave above, other than building the deviceQuery binary during the container build.

It is still running the deviceQuery command as root within the container which is obviously bad practice in any system intended for production.

If I update your Dockerfile to run using a non-root user e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd

USER user
WORKDIR /home/user

COPY --chown=user:user ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]

… it fails in the same way:

$ sudo docker run -it --runtime nvidia devicequery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

So the question remains: why can't I run deviceQuery as a non-root user within the container, given that it works fine on the host machine?

Regards,

Geoff

PS. The “sudo” on the docker build and run commands is unnecessary if your user is in the docker group.

Hi,

Thanks for your feedback.
I will check this with our internal team and update you later.

Thanks.

When you are on the system that does not allow non-root, run the “groups” command. Is that user a member of “video”? If not, try adding the user to “video”: sudo usermod -a -G video <user_name>. Note that the “-a” is for append and is important: append adds a group, and without it the user's entire set of groups would instead be replaced by only “video”.
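The reasoning above can be sketched as a permission check: on L4T the GPU device nodes are group-owned by “video”, so a process may open them only if “video” appears among its supplementary groups. The helper below (names hypothetical, groups passed explicitly so the sketch is self-contained, like the output of “id -nG”) mimics that membership test:

```shell
#!/bin/sh
# Sketch: does the required device-node group appear in a user's group list?
in_group() {
  needed="$1"; shift          # group owning the device node, e.g. "video"
  for g in "$@"; do           # remaining args: the user's supplementary groups
    if [ "$g" = "$needed" ]; then
      echo "allowed"
      return 0
    fi
  done
  echo "denied"
}

in_group video user           # the failing container user  -> denied
in_group video user video     # after "usermod -a -G video" -> allowed
```

This is why the same binary succeeds on the host (where the login user is already in “video”) but fails for a freshly created container user.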

Thanks @linuxdev, that works: if I add the user to group video, it works as expected, e.g.:

FROM nvcr.io/nvidia/l4t-base:r32.2

RUN apt-get update && apt-get install -y --no-install-recommends make g++

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd && usermod -a -G video user

USER user
WORKDIR /home/user

COPY --chown=user:user ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
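As a quick sanity check inside the built image, running “id -nG” as “user” should now list “video”. A self-contained sketch of that check (fed explicit sample strings rather than the live “id” output):

```shell
#!/bin/sh
# Sketch: verify that "video" shows up in a space-separated group list,
# as printed by `id -nG` for the container user.
has_video() {
  case " $1 " in
    *" video "*) echo "video group present" ;;
    *)           echo "video group MISSING" ;;
  esac
}

has_video "user video"    # what `id -nG` prints after the usermod line
has_video "user"          # what it printed before the fix
```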

Geoff

Hello everyone,

I have the same problem, but the solution is not working for me.
I use the nvidia-docker Dockerfile, modified as follows:

FROM nvidia/cuda:10.2-devel-ubuntu18.04
LABEL maintainer="My name"

ENV CUDNN_VERSION 7.6.5.32
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
        libcudnn7=$CUDNN_VERSION-1+cuda10.2 \
        libcudnn7-dev=$CUDNN_VERSION-1+cuda10.2 && \
    apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*

RUN useradd -ms /bin/bash user && echo "user:password" | chpasswd && usermod -a -G video user

USER user
WORKDIR /home/user

Then I cloned the CUDA samples git repo and tried to execute the deviceQuery sample.
With root, everything works fine, but without it I get the following error:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL