Running CUDA in LXD container: nvidia-smi doesn't show running processes

capslockwizard · April 23, 2016, 11:58am

I recently got CUDA running in an LXD container. Everything seems to work fine, but one thing bugs me: nvidia-smi, run inside the LXD container, doesn’t show the currently running processes (the list is simply empty), even though it reports GPU utilization correctly.

Where does nvidia-smi get the list of running processes from: the driver or the device nodes? By the way, nvidia-smi does show the running processes when I run it on the host machine. Can someone give me some pointers on how to get this fixed?

Thanks in advance!

capslockwizard · April 23, 2016, 1:51pm

I think I’ve just figured out the reason, but I don’t think there is an easy solution for it.

Running strace nvidia-smi, I can see that it does obtain the PIDs of the currently running processes and then reads /proc/PID/cmdline for each of them.

The problem is that processes in the container have different PIDs than the same processes have on the host. nvidia-smi gets the host PIDs of the processes and then tries to read /proc/PID/cmdline, which doesn’t exist inside the container, so nvidia-smi doesn’t report the processes.
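To make the failure mode concrete, here is a minimal Python sketch of the lookup described above: given a PID, read /proc/PID/cmdline and return None when that PID is not visible in the current PID namespace. It only illustrates the strace observation and is not nvidia-smi’s actual code; the function name read_cmdline is hypothetical.

```python
import os
from typing import Optional


def read_cmdline(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the command line of `pid` as seen through `proc_root`.

    /proc/PID/cmdline holds the arguments separated (and usually terminated)
    by NUL bytes. If the PID is not visible (or not readable) in this PID
    namespace, opening the file fails and None is returned, which mirrors
    the empty process list seen inside the container.
    """
    path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:  # e.g. FileNotFoundError for an unknown PID
        return None
    # Split on NUL and drop empty pieces (such as the trailing one).
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return " ".join(args)


if __name__ == "__main__":
    # Our own PID is always visible in our own namespace.
    print(read_cmdline(os.getpid()))
    # A hypothetical host PID with no counterpart in this namespace
    # (larger than any possible pid_max, so it never exists).
    print(read_cmdline(999999999))
```

Run inside the container with a host PID taken from the driver, the lookup returns None, which matches the empty process list.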

So the problem is that we don’t know which PIDs on the host correspond to which PIDs in the container. Even if we knew the mapping, there would need to be a program that monitors processes in real time and adds/removes soft-links accordingly, at the risk of clashing with PIDs that already exist in the container.
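For the mapping part, one possible starting point (an assumption, not something tested in this thread) is the NSpid field of /proc/PID/status, available on Linux 4.1 and later. Read on the host, it lists a process’s PID in each PID namespace it belongs to, outermost first, so the last value is the PID inside the container. The helpers ns_pids and host_to_container_pids are hypothetical names for a minimal sketch; they only build the mapping and do not solve the soft-link/clash problem.

```python
import os
from typing import Dict, List, Optional


def ns_pids(pid: int, proc_root: str = "/proc") -> Optional[List[int]]:
    """Return the NSpid chain for `pid`, outermost namespace first.

    A line such as "NSpid:\t12345\t42" means PID 12345 in the namespace
    that mounted this procfs and PID 42 in the nested (container) one.
    Returns None if the process is gone or the field is missing.
    """
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("NSpid:"):
                    return [int(x) for x in line.split()[1:]]
    except OSError:
        return None
    return None


def host_to_container_pids(host_pids: List[int],
                           proc_root: str = "/proc") -> Dict[int, int]:
    """Map host PIDs to their innermost-namespace PIDs (run on the host)."""
    mapping: Dict[int, int] = {}
    for hp in host_pids:
        chain = ns_pids(hp, proc_root)
        # A chain of length 1 means the process is not in a nested namespace.
        if chain and len(chain) > 1:
            mapping[hp] = chain[-1]
    return mapping
```

Fed with the host PIDs that nvidia-smi reports on the host, host_to_container_pids would give the container-side PIDs; keeping that up to date as processes start and exit is still the hard part described above.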

Anyone got any ideas?

njuffa · April 23, 2016, 5:53pm

I may be misunderstanding the situation, but from your description this sounds to me like a flaw in the containerization technology used: Isn’t the whole point of containerization a sort of para-virtualization that provides isolation and abstraction, while making apps running in the container believe they are running on the bare operating system? You may want to discuss this issue with the containerization software vendor.

Robert_Crovella · April 24, 2016, 2:39pm

NVIDIA provides preconfigured Docker containers that may be of interest:

https://github.com/NVIDIA/nvidia-docker/wiki/Using-nvidia-docker

flx42 · April 25, 2016, 9:42pm

I don’t know the details of LXD, but with Docker you will get similar behavior by default.
It’s because Docker runs containers in their own PID namespace (one of several namespace types available on Linux):

http://man7.org/linux/man-pages/man7/pid_namespaces.7.html
https://blog.yadutaf.fr/2014/01/05/introduction-to-linux-namespaces-part-3-pid/

In Docker, the fix is simple: use --pid=host:
$ nvidia-docker run -ti --pid=host nvidia/cuda nvidia-smi
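To check whether a process runs in its own PID namespace, you can compare the /proc/PID/ns/pid links: per namespaces(7), two processes share a namespace when the device and inode numbers of these links are equal, and the link target reads like pid:[4026531836]. The minimal Python sketch below uses hypothetical function names. Printed inside a container started with --pid=host, the label should match the one printed on the host, while a default container shows a different value. Checking another user’s process may require root.

```python
import os
from typing import Optional, Tuple


def pid_ns_id(pid: str = "self") -> Optional[Tuple[int, int]]:
    """Return (st_dev, st_ino) identifying the PID namespace of `pid`."""
    try:
        # os.stat follows the magic link to the namespace object itself.
        st = os.stat(f"/proc/{pid}/ns/pid")
    except OSError:  # missing PID, no permission, or no /proc
        return None
    return (st.st_dev, st.st_ino)


def same_pid_namespace(a: str, b: str) -> bool:
    """True if both PIDs are visible and share one PID namespace."""
    ida, idb = pid_ns_id(a), pid_ns_id(b)
    return ida is not None and ida == idb


def pid_ns_label(pid: str = "self") -> Optional[str]:
    """Human-readable link target, e.g. "pid:[4026531836]"."""
    try:
        return os.readlink(f"/proc/{pid}/ns/pid")
    except OSError:
        return None


if __name__ == "__main__":
    # Compare this value on the host and inside the container.
    print(pid_ns_label())
```

On the host, same_pid_namespace("self", str(gpu_pid)) with a PID reported by nvidia-smi (gpu_pid is a placeholder) would return False for a process running in a container that has its own PID namespace.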
