Normal user cannot use cuda device in L4T-36.2 docker

xjh4438318846 · December 5, 2024, 10:01pm

Hi Nvidia,

I am using the l4t-36.3 Docker image, and torch.cuda.is_available() returns True when I am the root user inside the container. However, after I switch to a new user, torch.cuda.is_available() returns False.
Here is the full error message:

/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported (Triggered internally at /tmp/pytorch/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

I am using a Jetson AGX Orin 64GB developer version with JetPack 6.0, Docker 27.3.1, and docker-compose 1.29.2.

The exact same Dockerfile worked on JetPack 5.1.1.
Here is the Dockerfile:

FROM nvcr.io/nvidia/l4t-ml:r36.2.0-py3

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update --no-install-recommends \ 
    && apt-get install -y apt-utils 

RUN apt-get install -y \
  build-essential \
  cmake \
  cppcheck \
  gdb \
  git \
  lsb-release \
  software-properties-common \
  sudo \
  vim \
  wget \
  tmux \
  curl \
  less \
  net-tools \
  byobu \
  libgl-dev \
  iputils-ping \
  nano \
  unzip \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*


# Add a user with the same user_id as the user outside the container
# Requires a docker build argument `user_id`
ARG user_id=$user_id
ENV USERNAME developer
RUN useradd -U --uid ${user_id} -ms /bin/bash $USERNAME \
 && echo "$USERNAME:$USERNAME" | chpasswd \
 && adduser $USERNAME sudo \
 && echo "$USERNAME ALL=NOPASSWD: ALL" >> /etc/sudoers.d/$USERNAME

# Commands below run as the developer user
USER $USERNAME

# When running a container start in the developer's home folder
WORKDIR /home/$USERNAME

# Set the timezone
RUN export DEBIAN_FRONTEND=noninteractive \
 && sudo apt-get update \
 && sudo -E apt-get install -y \
   tzdata \
 && sudo ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime \
 && sudo dpkg-reconfigure --frontend noninteractive tzdata \
 && sudo apt-get clean 



RUN mkdir ~/.mmpug

RUN touch ~/.Xauthority

RUN sudo usermod -a -G dialout developer \
 && sudo usermod -a -G tty developer \
 && sudo usermod -a -G video developer \
 && sudo usermod -a -G root developer \
 && sudo groupadd -f -r gpio \
 && sudo usermod -a -G gpio developer

# for ros2
RUN sudo apt update && sudo apt install locales \
 && sudo locale-gen en_US en_US.UTF-8 \
 && sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 \
 && export LANG=en_US.UTF-8

RUN sudo apt install software-properties-common \
 && sudo add-apt-repository universe \
 && sudo apt update && sudo apt install curl -y \
 && sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg \
 && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null

After I switch to the normal user, CUDA is no longer available:

xhost:  unable to open display ""
root@ubuntu:/# python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> 
KeyboardInterrupt
>>> 
root@ubuntu:/# USER developer
bash: USER: command not found
root@ubuntu:/# su developer  
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

developer@ubuntu:/$ python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported (Triggered internally at /tmp/pytorch/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>>

I have tried usermod -aG sudo,video,i2c "$USER", but it didn't work.
Please help, thanks.
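
For anyone debugging the same symptom, here is a minimal sketch (standard library only) that prints the groups of the current user and lists GPU-related device nodes that user cannot open for reading and writing. The device-node patterns are an assumption about typical Jetson paths and are not taken from this thread, so adjust them for your L4T release. Running it once as root and once as developer inside the container shows whether the difference is a device-permission problem.

```python
import glob
import grp
import os

# Assumed device-node patterns for GPU access on Jetson; adjust for your L4T release.
DEVICE_PATTERNS = ["/dev/nvmap", "/dev/nvhost-*", "/dev/nvgpu/*/*", "/dev/dri/renderD*"]


def group_name(gid):
    """Return the group name for a gid, or the gid as text if /etc/group has no entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def current_group_names():
    """Return the sorted names of all groups the current process belongs to."""
    gids = set(os.getgroups()) | {os.getegid()}
    return sorted(group_name(gid) for gid in gids)


def inaccessible_devices(patterns=DEVICE_PATTERNS):
    """Return (path, owning group) for matching nodes the current user cannot read and write."""
    blocked = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.access(path, os.R_OK | os.W_OK):
                continue
            try:
                owner_group = group_name(os.stat(path).st_gid)
            except OSError:
                owner_group = "unknown"  # e.g. a dangling symlink
            blocked.append((path, owner_group))
    return blocked


if __name__ == "__main__":
    print("uid:", os.getuid(), "groups:", current_group_names())
    for path, owner_group in inaccessible_devices():
        print(f"no read/write access: {path} (group {owner_group})")
```

Note that group changes made with usermod only apply to new login sessions, so the group list printed here reflects the session the script runs in.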

AastaLLL · December 6, 2024, 3:26am

Hi,

How do you launch the container?
Could you try the settings below to see if they help?

Thanks

xjh4438318846 · December 6, 2024, 7:49pm

I use docker start -ai to start the container. Can you show me which settings you mean?

xjh4438318846 · December 6, 2024, 7:51pm

docker_execute_command="
  docker exec
    --privileged
    -e DISPLAY=${DISPLAY}
    -e LINES=`tput lines`
    -it ${container}

I use docker-compose to create the container:

  base:
    # extend gpu or non-gpu
    build:
      args:
        - ARCH_T=$JAVIS_ARCH_T
        - JAVIS_ROS_DISTRO=$JAVIS_ROS_DISTRO
        - DOCKER_IMAGE_VERSION=$DOCKER_IMAGE_VERSION
        - user_id=$JAVIS_USERID
        - group_id=$JAVIS_GROUPID
    extends:
      service: ${JAVIS_HOST_TYPE}
    privileged: true
    security_opt:
      - seccomp:unconfined
    ipc: host
    volumes:
      # javis workspace
      - ${JAVIS_PATH}:/home/developer/javis_ws/
      # gui configurations
      - /tmp/.X11-unix:/tmp/.X11-unix
      - /etc/localtime:/etc/localtime:ro
      - /dev/input:/dev/input
      - /dev/:/dev/
      - /etc/hosts:/etc/hosts
      - ~/.javis/auto/deploy.conf:/home/developer/.javis/auto/deploy.conf
      - ${JAVIS_LOGGING_DIR}:/logging
      - /var/log/syslog:/syslog
      - /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra
      #- $XAUTHORITY:/home/developer/.Xauthority:rw
    environment:
      # Set environment params for GUI container passthrough
      - DISPLAY
      - QT_X11_NO_MITSHM=1
      # - XAUTHORITY=/tmp/.docker.xauth
      # - QT_QPA_PLATFORM='offscreen'
      - JAVIS_ROS_DISTRO=${JAVIS_ROS_DISTRO}
      # deployer export for exec call
      - DEPLOYER_TOP_PATH=/home/developer/javis_ws/operations//javis_deploy/deployer/
      - DEPLOYER_BIN=/home/developer/javis_ws/operations//javis_deploy/deployer/bin/
      - DEPLOYER_BOOKS_PATH=/home/developer/javis_ws/operations//javis_deploy/books/
      - JAVIS_PATH=/home/developer/javis_ws/
      - JAVIS_SRC_PATH=/home/developer/javis_ws/src/
      # Set the hostnames of different systems
      - ROS_MASTER_IP=$JAVIS_HOSTNAME
      - ROS_HOSTNAME=$JAVIS_HOSTNAME
      - JAVIS_USERID=$JAVIS_USERID
      - JAVIS_GROUPID=$JAVIS_GROUPID
      - JAVIS_SYSTEM_ID=$JAVIS_SYSTEM_ID
      - JAVIS_SYSTEM_TYPE=$JAVIS_SYSTEM_TYPE
      - JAVIS_SYSTEM_COMPONENT=$JAVIS_SYSTEM_COMPONENT
      - JAVIS_SETUP_SUPPRESS_CHECKS=true
    # entrypoint:
      # - /docker-entrypoint/ws-shell.bash
    tty: true
    runtime: nvidia
    # use host network
    network_mode: "host"
  javis_test:
    image: javis/${JAVIS_ARCH_T}.test:${DOCKER_IMAGE_VERSION}
    build:
      dockerfile: ${JAVIS_DOCKER_PATH}/javis/services/test.dockerfile
      context: ${JAVIS_DOCKER_PATH}/javis/
    extends:
      service: base
    container_name: javis_test
    privileged: true
    #ulimits:
    #  nice: 40
    environment:
      - ROS_SOURCED_WORKSPACE=/home/developer/javis_ws/install/javis_test/setup.bash
    volumes:
      - /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra
      - /usr/src/jetson_multimedia_api:/usr/src/jetson_multimedia_api
      - /usr/src/jetson_multimedia_api/argus:/usr/src/jetson_multimedia_api/argus
      - /etc/nv_tegra_release:/etc/nv_tegra_release
      - /usr/sbin/nvargus-daemon:/usr/sbin/nvargus-daemon
      - /tmp/argus_socket:/tmp/argus_socket
      - /tmp:/tmp
      - /var/nvidia/nvcam/settings/:/var/nvidia/nvcam/settings/
      - /etc/systemd/system:/etc/systemd/system
      - /etc/udev/rules.d/:/etc/udev/rules.d/
    runtime: nvidia
    devices:
      - /dev/i2c-8:/dev/i2c-8
      - /dev/video0:/dev/video0
      - /dev/video1:/dev/video1
      - /dev/video2:/dev/video2
      - /dev/video3:/dev/video3
      - /dev/video4:/dev/video4
      - /dev/video5:/dev/video5
      - /dev/video6:/dev/video6
    ipc: "host"

AastaLLL · December 9, 2024, 10:51am

Hi,

Sorry, I meant the command below:

dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime

# System Setup

Install the latest version of JetPack 4 on Nano/TX1/TX2, JetPack 5 on Xavier, or JetPack 6 on Orin.  The following versions are supported:

* JetPack 4.6.1+ (>= L4T R32.7.1)
* JetPack 5.1+  (>= L4T R35.2.1)
* JetPack 6.0 DP (L4T R36.2.0)
> [!NOTE]  
> <sup>- Building on/for x86 platforms isn't supported at this time (one can typically install/run packages the upstream way there)</sup><br>
> <sup>- The below steps are optional for [pulling/running](/docs/run.md) existing container images from registry, but recommended for building containers locally.</sup>

## Clone the Repo

This will download and install the jetson-containers utilities:

```bash
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```

Thanks.

xjh4438318846 · December 9, 2024, 2:57pm

I have tried adding the docker-default-runtime option. After sudo systemctl restart docker, here is the output:

jiahe@ubuntu:~$ sudo docker info | grep 'Default Runtime'
 Default Runtime: nvidia
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

I then rebuilt the image and the container, but CUDA in torch is still not available without root.
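
As a cross-check of the docker info output, here is a minimal sketch that reads the daemon configuration directly. It assumes the file lives at the default location /etc/docker/daemon.json and uses the "default-runtime" and "runtimes" keys of the Docker daemon configuration; the function names are my own.

```python
import json
import os

DAEMON_JSON = "/etc/docker/daemon.json"  # assumed default location


def default_runtime(daemon_json_text):
    """Return the "default-runtime" value from daemon.json text, or None if it is unset."""
    return json.loads(daemon_json_text).get("default-runtime")


def runtime_registered(daemon_json_text, name="nvidia"):
    """Return True if a runtime with the given name is listed under "runtimes"."""
    return name in json.loads(daemon_json_text).get("runtimes", {})


if __name__ == "__main__" and os.path.exists(DAEMON_JSON):
    with open(DAEMON_JSON) as f:
        text = f.read()
    print("default runtime:", default_runtime(text))
    print("nvidia runtime registered:", runtime_registered(text))
```

This only confirms what the daemon is configured to use. It says nothing about permissions inside the container.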

AastaLLL · December 11, 2024, 7:42am

Hi,

Would you mind checking whether a simple docker run command (instead of docker-compose) works?

Thanks.

xjh4438318846 · December 11, 2024, 2:56pm

docker run works

xjh4438318846 · December 11, 2024, 2:57pm

Still, I hope to do everything with docker-compose, since the whole project is built on it.
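
Since docker run works and docker-compose does not, one way to narrow this down is to compare how the two containers were actually created. Below is a minimal sketch that diffs selected fields of the docker inspect output for two containers. The choice of fields is an assumption about what may matter for GPU access, and the container names are passed on the command line.

```python
import json
import subprocess
import sys

# HostConfig fields to compare; this selection is an assumption, extend it as needed.
FIELDS = ["Runtime", "Privileged", "Binds", "Devices", "IpcMode", "GroupAdd"]


def inspect(container):
    """Return the parsed `docker inspect` record for one container."""
    result = subprocess.run(
        ["docker", "inspect", container],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)[0]


def diff_settings(a, b, fields=FIELDS):
    """Return {setting: (value_a, value_b)} for every compared setting that differs."""
    diffs = {}
    user_a = a.get("Config", {}).get("User")
    user_b = b.get("Config", {}).get("User")
    if user_a != user_b:
        diffs["User"] = (user_a, user_b)
    for field in fields:
        value_a = a.get("HostConfig", {}).get(field)
        value_b = b.get("HostConfig", {}).get(field)
        if value_a != value_b:
            diffs[field] = (value_a, value_b)
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 diff_containers.py <docker-run container> <compose container>
    for name, (left, right) in diff_settings(inspect(sys.argv[1]), inspect(sys.argv[2])).items():
        print(f"{name}:\n  {sys.argv[1]}: {left}\n  {sys.argv[2]}: {right}")
```

Any difference it reports (for example in the bind mounts or the runtime) is a candidate to remove from or add to the compose file one at a time.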

AastaLLL · December 12, 2024, 9:14am

Hi,

You can find below the steps to set up docker rootless mode.

Could you apply similar steps to the docker-compose tool to see whether it can also run with a non-root account?

Thanks.

system · January 1, 2025, 4:08am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
