Required Info:
- Software Version
DRIVE OS 6.0.6 - Target OS
Linux - SDK Manager Version
1.9.2.10884 - Host Machine Version
native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers
Describe the bug
following with this topic https://developer.nvidia.com/blog/running-docker-containers-directly-on-nvidia-drive-agx-orin/#entry-content-comments.
in target-host, running cuda-samples doesn’t need sudo
, while in target-docker-container, sudo
is needed.
To Reproduce
# in host: compile the cuda-sample
mkdir cuda-sample && cd ./cuda-sample
cp -r /usr/local/cuda/samples/ ./
cd samples/1_Utilities/deviceQuery
make clean && make
# start and into the container
./docker/run/orin_start.sh
./docker/run/orin_into.sh
the keypoint of orin_start.sh
is
+ docker run --runtime nvidia --gpus all -it -d --privileged --name gw_orin_20.04_nvidia -e DOCKER_USER=nvidia -e USER=nvidia -e DOCKER_USER_ID=1000 -e DOCKER_GRP=nvidia -e DOCKER_GRP_ID=1000 -e DOCKER_IMG=arm64v8/ros:foxy -e USE_GPU=1 -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,graphics,video,utility,display -e DISPLAY -v /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target:/target -v /usr/local/driveworks-5.10:/usr/local/driveworks-5.10 -v /usr/local/cuda-11.4:/usr/local/cuda-11.4 -v /dev:/dev -v /home/nvidia/zhensheng/cuda-sample:/home/nvidia/zhensheng/cuda-sample -v /home/nvidia/.cache:/home/nvidia/.cache -v /dev/bus/usb:/dev/bus/usb -v /media:/media -v /tmp/.X11-unix:/tmp/.X11-unix:rw -v /etc/localtime:/etc/localtime:ro -v /usr/src:/usr/src -v /lib/mgaules:/lib/mgaules --net host --ipc host --cap-add SYS_ADMIN --cap-add SYS_PTRACE -w /target --add-host in_orin_docker:127.0.0.1 --add-host tegra-ubuntu:127.0.0.1 --hostname in_orin_docker --shm-size 2G -v /dev/null:/dev/raw1394 arm64v8/ros:foxy /bin/bash
Expected behavior
# in host
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Orin"
CUDA Driver Version / Runtime Version 11.8 / 11.4
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 28458 MBytes (29840424960 bytes)
(016) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1275 MHz (1.27 GHz)
Memory Clock rate: 1275 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.8, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
Actual behavior
# in target-docker-container without sudo
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
NvRmMemInitNvmap failed with Permission denied
351: NvMap init failed
****NvRmMemMgrInit failed**** error type: 196626
cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL
in target-docker-container, with sudo
running cuda-sample give the expected result
# in target-docker-container with sudo
sudo ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Orin"
CUDA Driver Version / Runtime Version 11.8 / 11.4
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 28458 MBytes (29840424960 bytes)
(016) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1275 MHz (1.27 GHz)
Memory Clock rate: 1275 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.8, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
Additional context
- How to avoid the extra sudo operation?
Thanks.