I am using the l4t-36.3 Docker image, and torch.cuda.is_available() returns True when I am the root user inside the container. However, after I switch to a new user, torch.cuda.is_available() returns False.
Here is the full error message:
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported (Triggered internally at /tmp/pytorch/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
I am using a Jetson AGX Orin 64G developer version, JetPack 6.0, Docker 27.3.1, and docker-compose 1.29.2.
The exact same Dockerfile worked on JetPack 5.1.1.
Here is the Dockerfile:
FROM nvcr.io/nvidia/l4t-ml:r36.2.0-py3
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update --no-install-recommends \
&& apt-get install -y apt-utils
RUN apt-get install -y \
build-essential \
cmake \
cppcheck \
gdb \
git \
lsb-release \
software-properties-common \
sudo \
vim \
wget \
tmux \
curl \
less \
net-tools \
byobu \
libgl-dev \
iputils-ping \
nano \
unzip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Add a user with the same user_id as the user outside the container
# Requires a docker build argument `user_id`
ARG user_id
ENV USERNAME=developer
RUN useradd -U --uid ${user_id} -ms /bin/bash $USERNAME \
&& echo "$USERNAME:$USERNAME" | chpasswd \
&& adduser $USERNAME sudo \
&& echo "$USERNAME ALL=NOPASSWD: ALL" >> /etc/sudoers.d/$USERNAME
# Commands below run as the developer user
USER $USERNAME
# When running a container start in the developer's home folder
WORKDIR /home/$USERNAME
# Set the timezone
RUN export DEBIAN_FRONTEND=noninteractive \
&& sudo apt-get update \
&& sudo -E apt-get install -y \
tzdata \
&& sudo ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime \
&& sudo dpkg-reconfigure --frontend noninteractive tzdata \
&& sudo apt-get clean
RUN mkdir ~/.mmpug
RUN touch ~/.Xauthority
RUN sudo usermod -a -G dialout developer \
&& sudo usermod -a -G tty developer \
&& sudo usermod -a -G video developer \
&& sudo usermod -a -G root developer \
&& sudo groupadd -f -r gpio \
&& sudo usermod -a -G gpio developer
# for ros2
RUN sudo apt update && sudo apt install -y locales \
&& sudo locale-gen en_US en_US.UTF-8 \
&& sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
# An export inside RUN does not persist to later layers, so set LANG with ENV
ENV LANG=en_US.UTF-8
RUN sudo apt install -y software-properties-common \
&& sudo add-apt-repository universe \
&& sudo apt update && sudo apt install curl -y \
&& sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null
After I switch to the normal user, CUDA is no longer available:
xhost: unable to open display ""
root@ubuntu:/# python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
KeyboardInterrupt
>>>
root@ubuntu:/# USER developer
bash: USER: command not found
root@ubuntu:/# su developer
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
developer@ubuntu:/$ python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported (Triggered internally at /tmp/pytorch/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
>>>
I have tried usermod -aG sudo,video,i2c "$USER", but it didn’t work.
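One way to narrow this down is to check which device nodes the non-root process can actually open. The following is only a minimal diagnostic sketch, not a confirmed fix. It assumes the GPU device nodes live under /dev with paths starting with "nv" (for example /dev/nvmap), which may not hold on every JetPack release, and the helper names are hypothetical. It reads the process's real groups from os.getgroups(), which can differ from /etc/group when usermod was run after the shell was started.

```python
import grp
import os
import stat


def can_read_write(mode, owner_uid, owner_gid, uid, gids):
    """Classic owner/group/other check; ignores ACLs and capabilities."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


def group_name(gid):
    """Resolve a gid to a name, falling back to the number if unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def blocked_nodes(dev_dir="/dev", prefix="nv"):
    """List nodes under dev_dir (relative path starting with prefix)
    that the current process cannot both read and write."""
    uid = os.geteuid()
    gids = set(os.getgroups()) | {os.getegid()}
    blocked = []
    for root, _dirs, names in os.walk(dev_dir):
        for name in names:
            path = os.path.join(root, name)
            # Assumption: GPU-related nodes have a relative path starting with "nv"
            if not os.path.relpath(path, dev_dir).startswith(prefix):
                continue
            try:
                st = os.stat(path)
            except OSError:
                continue
            if not can_read_write(st.st_mode, st.st_uid, st.st_gid, uid, gids):
                mode = oct(stat.S_IMODE(st.st_mode))
                blocked.append((path, group_name(st.st_gid), mode))
    return blocked


if __name__ == "__main__":
    groups = sorted(group_name(g) for g in set(os.getgroups()))
    print("process groups:", groups)
    for path, group, mode in blocked_nodes():
        print(f"no read/write access: {path} (group {group}, mode {mode})")
```

Running this once as root and once as developer in the same container should show whether the developer process is missing a group that owns one of these nodes.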
Please help, thanks