hlp@ubuntu:~$ sudo docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.03-py3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘csv’
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown.
Hello, what does the file /etc/docker/daemon.json look like on your Clara AGX devkit? And to double check, have you followed the documentation’s section https://github.com/nvidia-holoscan/holoscan-docs/blob/main/devkits/clara-agx/clara_agx_user_guide.md#setting-up-docker-and-docker-storage-on-ssd on setting up docker?
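For anyone following along, here is a minimal sketch of one way to check whether a `daemon.json` file mentions the NVIDIA runtime. The helper name `has_nvidia_runtime` is made up for illustration and is not part of Docker or any NVIDIA tool. It only does a plain text search, so treat a match as a hint rather than proof that the runtime is configured correctly.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker or NVIDIA command): succeeds when the
# given daemon.json is readable and mentions nvidia-container-runtime.
# Defaults to /etc/docker/daemon.json when no path is given.
has_nvidia_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$config_file" ] && grep -q 'nvidia-container-runtime' "$config_file"
}

# Example use on the devkit:
#   if has_nvidia_runtime; then echo "nvidia runtime found"; else echo "not found"; fi
# The runtimes the daemon actually knows about can also be listed with:
#   sudo docker info | grep -i runtimes
```

A runtime can also be registered on the `dockerd` command line with `--add-runtime` rather than in `daemon.json`, so the `docker info` check is the more reliable of the two.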
Thanks, I found that the device was not in dGPU mode. But when I run nvgpuswitch.py, it fails to install dGPU.
hlp@ubuntu:/usr/local/bin$ sudo nvgpuswitch.py install dGPU
Checking for Mellanox CX-6 driver
Preparing commands to install dGPU. This may take a few moments.
=== INSTALL SUMMARY ===
[1/13] rm /etc/apt/preferences.d/jetson-clara-pin-600
[2/13] wget repo.download.nvidia.com/jetson/jetson-clara-pin-600 -P /etc/apt/preferences.d
[3/13] rm -f /etc/modprobe.d/blacklist-nvidia.conf
[4/13] echo 'blacklist nvgpu' > /etc/modprobe.d/blacklist-nvgpu.conf
[5/13] echo 'options nvidia NVreg_EnableGpuFirmware=0 NVreg_DmaRemapPeerMmio=0' > /etc/modprobe.d/nvidia-holoscan.conf
[6/13] apt-key adv --fetch-keys http://repo.download.nvidia.com/jetson/jetson-ota-public.asc
[7/13] echo 'deb http://repo.download.nvidia.com/jetson/dgpu-rm r34.1.2 main' >> /etc/apt/sources.list.d/l4t_rm.list
[8/13] apt-key adv --fetch-keys https://nvidia.github.io/nvidia-container-runtime/gpgkey
[9/13] echo 'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /' >> /etc/apt/sources.list.d/l4t_rm.list && echo 'deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /' >> /etc/apt/sources.list.d/l4t_rm.list
[10/13] apt update && apt install -y nvidia-l4t-* nvidia-driver-510 nvidia-dkms-510 nvidia-utils-510 cuda nvidia-container-runtime libnvinfer-bin mstflint
[11/13] echo '/usr/lib/aarch64-linux-gnu/tegra' >> /etc/ld.so.conf.d/nvidia-tegra.conf && ldconfig
[12/13] mkdir /etc/systemd/system/docker.service.d && echo '[Service]' > /etc/systemd/system/docker.service.d/override.conf && echo 'ExecStart=' >> /etc/systemd/system/docker.service.d/override.conf && echo 'ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime' >> /etc/systemd/system/docker.service.d/override.conf
[13/13] ln -sf /etc/nvpmodel/nvpmodel_t194_e3900_dGPU.conf /etc/nvpmodel.conf
=== STARTING INSTALL ===
[1/13] Executing.
# rm /etc/apt/preferences.d/jetson-clara-pin-600
[2/13] Executing.
# wget repo.download.nvidia.com/jetson/jetson-clara-pin-600 -P /etc/apt/preferences.d
--2023-09-20 12:31:29-- http://repo.download.nvidia.com/jetson/jetson-clara-pin-600
Resolving repo.download.nvidia.com (repo.download.nvidia.com)... 23.40.240.40, 23.40.240.59
Connecting to repo.download.nvidia.com (repo.download.nvidia.com)|23.40.240.40|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://repo.download.nvidia.cn/jetson/jetson-clara-pin-600 [following]
--2023-09-20 12:31:29-- http://repo.download.nvidia.cn/jetson/jetson-clara-pin-600
Resolving repo.download.nvidia.cn (repo.download.nvidia.cn)... 175.4.58.178, 180.119.146.98, 180.119.146.99, ...
Connecting to repo.download.nvidia.cn (repo.download.nvidia.cn)|175.4.58.178|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.download.nvidia.cn/jetson/jetson-clara-pin-600 [following]
--2023-09-20 12:31:29-- https://repo.download.nvidia.cn/jetson/jetson-clara-pin-600
Connecting to repo.download.nvidia.cn (repo.download.nvidia.cn)|175.4.58.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68 [text/plain]
Saving to: ‘/etc/apt/preferences.d/jetson-clara-pin-600’
jetson-clara-pin-60 100%[===================>] 68 --.-KB/s in 0s
2023-09-20 12:31:29 (3.96 MB/s) - ‘/etc/apt/preferences.d/jetson-clara-pin-600’ saved [68/68]
[3/13] Executing.
# rm -f /etc/modprobe.d/blacklist-nvidia.conf
[4/13] Executing.
# echo 'blacklist nvgpu' > /etc/modprobe.d/blacklist-nvgpu.conf
[5/13] Executing.
# echo 'options nvidia NVreg_EnableGpuFirmware=0 NVreg_DmaRemapPeerMmio=0' > /etc/modprobe.d/nvidia-holoscan.conf
[6/13] Executing.
# apt-key adv --fetch-keys http://repo.download.nvidia.com/jetson/jetson-ota-public.asc
Executing: /tmp/apt-key-gpghome.w5etUrYj7g/gpg.1.sh --fetch-keys http://repo.download.nvidia.com/jetson/jetson-ota-public.asc
gpg: requesting key from 'http://repo.download.nvidia.com/jetson/jetson-ota-public.asc'
gpg: WARNING: unable to fetch URI http://repo.download.nvidia.com/jetson/jetson-ota-public.asc: No data
[7/13] Executing.
# echo 'deb http://repo.download.nvidia.com/jetson/dgpu-rm r34.1.2 main' >> /etc/apt/sources.list.d/l4t_rm.list
[8/13] Executing.
# apt-key adv --fetch-keys https://nvidia.github.io/nvidia-container-runtime/gpgkey
Executing: /tmp/apt-key-gpghome.z50NyjjQVx/gpg.1.sh --fetch-keys https://nvidia.github.io/nvidia-container-runtime/gpgkey
gpg: requesting key from 'https://nvidia.github.io/nvidia-container-runtime/gpgkey'
gpg: key DDCAE044F796ECB0: "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>" not changed
gpg: Total number processed: 1
gpg: unchanged: 1
sh: 2: ARCH: not found
sh: 2: ARCH: not found
[9/13] Executing.
# echo 'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/ /' >> /etc/apt/sources.list.d/l4t_rm.list && echo 'deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/ /' >> /etc/apt/sources.list.d/l4t_rm.list
[10/13] Executing.
# apt update && apt install -y nvidia-l4t-* nvidia-driver-510 nvidia-dkms-510 nvidia-utils-510 cuda nvidia-container-runtime libnvinfer-bin mstflint
Hit:2 http://ports.ubuntu.com/ubuntu-ports focal InRelease
Get:1 https://repo.download.nvidia.cn/jetson/dgpu-rm r34.1.2 InRelease [2,544 B]
Hit:3 http://ports.ubuntu.com/ubuntu-ports focal-updates InRelease
Err:1 https://repo.download.nvidia.cn/jetson/dgpu-rm r34.1.2 InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0D296FFB880FB004
Hit:4 http://ports.ubuntu.com/ubuntu-ports focal-backports InRelease
Hit:5 http://ports.ubuntu.com/ubuntu-ports focal-security InRelease
Hit:6 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/arm64 InRelease
Hit:7 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/arm64 InRelease
Reading package lists... Done
W: GPG error: https://repo.download.nvidia.cn/jetson/dgpu-rm r34.1.2 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0D296FFB880FB004
E: The repository 'http://repo.download.nvidia.com/jetson/dgpu-rm r34.1.2 InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (zh_CN) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (zh) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (zh_CN) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (zh) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Packages (main/binary-arm64/Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target Translations (main/i18n/Translation-zh_CN) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target Translations (main/i18n/Translation-zh) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target Translations (main/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target DEP-11 (main/dep11/Components-arm64.yml) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target DEP-11 (main/dep11/Components-all.yml) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target DEP-11-icons-small (main/dep11/icons-48x48.tar) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target DEP-11-icons (main/dep11/icons-64x64.tar) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target DEP-11-icons-hidpi (main/dep11/icons-64x64@2.tar) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:1 and /etc/apt/sources.list.d/l4t_rm.list:4
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (zh_CN) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (zh) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:2 and /etc/apt/sources.list.d/l4t_rm.list:5
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (zh_CN) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (zh) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/l4t_rm.list:3 and /etc/apt/sources.list.d/l4t_rm.list:6
ERROR: Install dGPU drivers failed!
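The line to look for in a log like the one above is the `NO_PUBKEY` error from `apt update`. As a rough sketch, the missing key IDs can be pulled out of apt output with a small filter. The function name `missing_pubkeys` is hypothetical, and it assumes apt prints its messages in English as it does in this log.

```shell
#!/bin/sh
# Hypothetical filter: reads apt output on stdin and prints each distinct
# key ID that apt reported as NO_PUBKEY, one per line.
missing_pubkeys() {
    grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{print $2}' | sort -u
}

# Example use:
#   sudo apt update 2>&1 | missing_pubkeys
```

In this log the only ID reported is 0D296FFB880FB004, for the dgpu-rm repository whose key fetch in step 6 returned "No data".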
Sorry, I already solved this problem by fetching the key: sudo apt-key adv --fetch-key https://repo.download.nvidia.com/jetson/jetson-ota-public.asc
Thank you very much for your help. I’m now able to run the GPU environment for PyTorch.
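A side note on the "configured multiple times" warnings in the log: they point at lines 1 and 4, 2 and 5, and 3 and 6 of /etc/apt/sources.list.d/l4t_rm.list, which suggests the file picked up the same three entries twice, probably because the install steps append with `>>` on every run. The warnings did not block the install here, but a minimal sketch for dropping the repeats could look like this. The function name `dedupe_lines` is made up, and it prints to stdout rather than editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: prints its input (files given as arguments, or stdin)
# with repeated lines removed, keeping the first occurrence and the order.
dedupe_lines() {
    awk '!seen[$0]++' "$@"
}

# Example use: review the result before replacing the original file.
#   dedupe_lines /etc/apt/sources.list.d/l4t_rm.list > /tmp/l4t_rm.list
#   cat /tmp/l4t_rm.list
#   sudo cp /tmp/l4t_rm.list /etc/apt/sources.list.d/l4t_rm.list
```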