Can't use Nsight Compute in an nvidia-docker container

I want to use Nsight Compute inside an nvidia-docker container to profile my program.
I run it like this:
ncu --mode=launch-and-attach <my program>
and it reports:

==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

First I ran modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia,
and then modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0.
But it reports an error:
modprobe: FATAL: Module nvidia not found in directory /lib/modules/5.15.0-48-generic
My driver version is 515.65.01, so I tried running modprobe nvidia-515 NVreg_RestrictProfilingToAdminUsers=0 instead, but it still doesn’t work.
I also created /etc/modprobe.d/profile.conf
and added
modprobe nvidia-515 NVreg_RestrictProfilingToAdminUsers=0 and modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0 to it,
then restarted my container, but it is still not working…
Finally, I ran modinfo nvidia (and modinfo nvidia-515), and both report an error: modinfo: ERROR: Module alias nvidia not found. (and modinfo: ERROR: Module alias nvidia-515 not found.)
How can I fix this?

Where are you getting the specific container from? While in the container, are you able to execute “nvidia-smi” and share the output?

@jmarusarz Yes, I can execute nvidia-smi in the container, and here is the output:

Tue Sep 27 23:10:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0    N/A /  N/A |      5MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The container is from paddlepaddle.

Are you able to run Nsight Compute profiles on the same machine but outside the container? That would help determine whether this is a machine configuration issue or a container configuration issue.

@jmarusarz Yes, I can run Nsight Compute profiles on the same machine outside the container.

Where did you execute the modprobe commands, inside or outside the container? You need to run them on the host system, as this is where the driver is installed. Until the restriction is lifted on the host, the driver will keep blocking access to the hardware performance counters for profiling (you shouldn’t be able to circumvent the host system’s security by launching a container).
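
To see which setting is currently in effect, you can inspect the driver parameters on the host. The following is only a minimal sketch: it assumes the driver lists its parameters in /proc/driver/nvidia/params with a line of the form "RmProfilingAdminOnly: <value>", and the helper name profiling_admin_only is hypothetical.

```shell
# Hypothetical helper: print the RmProfilingAdminOnly value from a driver
# params file. The default path is an assumption about where the NVIDIA
# driver exposes its parameters on the host.
profiling_admin_only() {
    params_file="${1:-/proc/driver/nvidia/params}"
    # Split each line on ": " and print the value of the matching key.
    awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params_file"
}

# Usage on the host:
#   profiling_admin_only
```

If this prints 1, profiling is presumably still restricted to admin users, and the same restriction applies to any container started on that host.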

I also created /etc/modprobe.d/profile.conf
and added
modprobe nvidia-515 NVreg_RestrictProfilingToAdminUsers=0

That’s the wrong entry for this file. As explained on the website referenced in the error message, the content of this file should be

options nvidia "NVreg_RestrictProfilingToAdminUsers=0"

When copying the string from here or the website, please double-check locally that the quote characters are plain ASCII double quotes (") and have not been converted to typographic quotes (this has happened for some users in the past).
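
One possible way to check for converted quotes is to look for any non-ASCII bytes in the line, since typographic quotes are encoded as multi-byte UTF-8 sequences. This is a rough sketch, and the function name check_quotes is made up for illustration:

```shell
# Hypothetical helper: succeeds (exit status 0) only if the given text
# consists entirely of printable ASCII characters. Curly quotes are
# multi-byte UTF-8 sequences, so they make the check fail.
check_quotes() {
    if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
        return 1
    fi
    return 0
}

# Usage, assuming the file name from this thread:
#   check_quotes "$(cat /etc/modprobe.d/profile.conf)" && echo "quotes look fine"
```

Note that this flags any non-ASCII character, not only quotes, which should be fine for a one-line options file.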

While both options should work, I recommend going with the .conf file, as this is more reliable and doesn’t have to be redone every time you reboot the system.
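
For reference, the .conf approach could be scripted roughly as follows. This is a sketch under assumptions, not a definitive procedure: the function name write_profiling_conf is hypothetical, profile.conf is simply the file name used earlier in this thread, and the initramfs step applies to Debian/Ubuntu-style systems and may not be needed everywhere.

```shell
# Hypothetical helper: write the options line into profile.conf inside the
# given directory. Using printf with single quotes keeps the double quotes
# as plain ASCII characters.
write_profiling_conf() {
    conf_dir="$1"
    printf 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"\n' \
        > "$conf_dir/profile.conf"
}

# Usage on the HOST (not inside the container), as root:
#   write_profiling_conf /etc/modprobe.d
#   update-initramfs -u -k all   # assumption: Debian/Ubuntu; may be optional
#   reboot
```

After the reboot, the container needs to be started again so that it runs against the reloaded host driver.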