I’m working with a fresh install of the official JetPack 5.1.2 SD card image for the Xavier NX. The system has not been modified apart from adding a WiFi connection and running the following commands:
sudo apt update
sudo apt upgrade
# The system is run in headless mode so remove all desktop software
sudo apt remove ubuntu-desktop
sudo apt autoremove
# Jetson Stats (https://github.com/rbonghi/jetson_stats)
sudo apt install python3-pip
sudo pip3 install -U jetson-stats
When running nvidia-container-cli info I get the following output:
$ nvidia-container-cli info
NVRM version: (null)
CUDA version: 11.4
Device Index: 0
Device Minor: 0
Model: Xavier
Brand: (null)
GPU UUID: (null)
Bus Location: (null)
Architecture: 7.2
Why is it not showing the NVRM and GPU information? Is this an error in the JetPack version?
I had the same output from nvidia-container-cli info earlier today, when I could not access the GPU in Docker and thought I had messed up my system, which is why I did a full re-install. But that did not help, and now I’m at a loss as to why I cannot access the GPU inside the NVIDIA container runtime.
Hi @vsaw, Jetson doesn’t support NVRM or nvidia-smi. If you start an l4t container built for JetPack with --runtime nvidia, then the GPU should be accessible. If you start nvcr.io/nvidia/l4t-jetpack:r35.4.1, you can try running the CUDA samples that are in it, like deviceQuery/vectorAdd/etc., to confirm your GPU is working inside containers.
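For example, something along these lines should work (the path to the samples inside the container is an assumption on my part and may differ between releases):
sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r35.4.1
# inside the container (assumed sample location):
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery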
@dusty_nv I’ve been doing some more debugging and found out that cudnnCreate fails with error CUDNN_STATUS_NOT_INITIALIZED when running darknet in the Docker container nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu20.04.
The same code works as expected when running natively on my Jetson Xavier NX DevKit running JetPack 5.1.2 (CUDA 11.4, cuDNN 8.6), and I confirmed that cudaGetDevice is called successfully before calling cudnnCreate.
Therefore I doubt that this is a code error; it seems more likely to be an issue with the container or image. However, I don’t know how to fix this from here 🤷‍♂️
Update
I got it working with nvcr.io/nvidia/l4t-jetpack:r35.4.1. The trick was to add -L/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs to LDFLAGS.
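For reference, the build inside the container ended up looking roughly like this (treat the make flags and the way LDFLAGS gets picked up as assumptions, since they depend on your darknet fork’s Makefile):
sudo docker run -it --rm --runtime nvidia -v $(pwd)/darknet:/darknet nvcr.io/nvidia/l4t-jetpack:r35.4.1
# inside the container, after appending the stubs dir to LDFLAGS in the Makefile:
#   LDFLAGS += -L/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs
cd /darknet
make GPU=1 CUDNN=1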
Is there an alternative image that’s smaller? 10 GB is pretty hefty just to run Darknet :-/
Once your application is built, you can deploy it in l4t-cuda, which has runtime variants that don’t include the full CUDA Toolkit. On JetPack 5, these components are inside the containers themselves, as opposed to being mounted from the host device (which is why they are bigger than the JetPack 4 containers).
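For example, a deployment run could look something like this (the l4t-cuda runtime tag below is an assumption, so check NGC for the tag matching your L4T release, and make sure the image satisfies whatever libraries your binary links against):
sudo docker run -it --rm --runtime nvidia -v $(pwd)/darknet:/darknet nvcr.io/nvidia/l4t-cuda:11.4.19-runtime /darknet/darknet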
Also, that 10GB will be shared in the Docker cache among any other containers using l4t-jetpack (which most GPU-accelerated containers for Jetson do), so you only need to download it once.