Nsight Compute cannot access GPU performance counters

Ubuntu 22.04 Server + GUI desktop
CUDA 12.1

Running ncu-ui I get this error when profiling:

Error: ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

The two suggestions on the linked page were to run with elevated privileges or enable access permanently.

  1. Run with elevated privileges:

…Trying sudo

sscott@demo:~$ sudo ncu-ui
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
Cannot mix incompatible Qt library (5.15.3) with this library (5.15.2)
sscott@demo:~$ sudo -E ncu-ui
QStandardPaths: runtime directory '/run/user/1001' is not owned by UID 0, but a directory permissions 0700 owned by UID 1001 GID 1001
QStandardPaths: runtime directory '/run/user/1001' is not owned by UID 0, but a directory permissions 0700 owned by UID 1001 GID 1001
QStandardPaths: runtime directory '/run/user/1001' is not owned by UID 0, but a directory permissions 0700 owned by UID 1001 GID 1001
Cannot mix incompatible Qt library (5.15.3) with this library (5.15.2)

…Trying setcap on executable

sscott@demo:~/esat-rx$ sudo setcap 'cap_sys_admin=+ep' /opt/nvidia/nsight-compute/2023.1.0/host/linux-desktop-glibc_2_11_3-x64/ncu-ui.bin
[sudo] password for sscott: 
sscott@demo:~/esat-rx$ getcap /opt/nvidia/nsight-compute/2023.1.0/host/linux-desktop-glibc_2_11_3-x64/ncu-ui.bin
/opt/nvidia/nsight-compute/2023.1.0/host/linux-desktop-glibc_2_11_3-x64/ncu-ui.bin cap_sys_admin=ep
sscott@demo:~/esat-rx$ ncu-ui
/opt/nvidia/nsight-compute/2023.1.0/host/linux-desktop-glibc_2_11_3-x64/ncu-ui.bin: error while loading shared libraries: libAppLib.so: cannot open shared object file: No such file or directory

  2. Enable access permanently:
sscott@demo:~/esat-rx$ cat /etc/modprobe.d/nvidia-profiling.conf 
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"

I then rebooted, but I still get the same error: ERR_NVGPUCTRPERM.

How do I get this to work?

For (2), assuming you copied the string from the website rather than re-typing it yourself, can you try replacing the " characters in the file? We have seen cases where copy-and-paste produces a character that looks the same but has a different encoding, which the kernel module then does not recognize.
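
For example, one quick way to check for any non-ASCII bytes in the file (just a suggestion, using the path you showed) is:

$ grep -nP '[^\x00-\x7F]' /etc/modprobe.d/nvidia-profiling.conf

If this prints nothing, every character in the file is plain ASCII, including the quotes.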

sscott@demo:~/esat-rx$ od -t x1 -c /etc/modprobe.d/nvidia-profiling.conf 
0000000  6f  70  74  69  6f  6e  73  20  6e  76  69  64  69  61  20  22
          o   p   t   i   o   n   s       n   v   i   d   i   a       "
0000020  4e  56  72  65  67  5f  52  65  73  74  72  69  63  74  50  72
          N   V   r   e   g   _   R   e   s   t   r   i   c   t   P   r
0000040  6f  66  69  6c  69  6e  67  54  6f  41  64  6d  69  6e  55  73
          o   f   i   l   i   n   g   T   o   A   d   m   i   n   U   s
0000060  65  72  73  3d  30  22  0a  0a
          e   r   s   =   0   "  \n  \n
0000070

If I understand the output properly, it seems the correct character is used in the conf file. I will have to get back to the team to see if we can reproduce the issue internally and check what might be wrong. Some things you could try on your end in the meantime:

  • Check the dmesg output for suspicious messages.
  • Try the ncu command line interface instead of the ncu-ui UI. You could try it as ncu <my-app> for a simple run; see the example right after this list.
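
To make the CLI suggestion concrete, a minimal run might look like the following (./my-app is a placeholder for any CUDA application you have; -o is the standard ncu option for writing a report file):

$ ncu ./my-app
$ ncu -o my-report ./my-app
# the second form writes my-report.ncu-rep, which can be opened in ncu-ui later

If the restriction is enforced by the driver, the CLI should fail with the same ERR_NVGPUCTRPERM, which would at least rule out a GUI-specific problem.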

I checked with our QA team, and they were not able to reproduce the issue on a similar setup. Can you confirm that you followed these installation instructions?

Other things to try:

  • Check whether the kernel module was loaded with the correct parameter by calling $ grep RmProfilingAdminOnly /proc/driver/nvidia/params. The expected output would be RmProfilingAdminOnly: 0
  • Did you rebuild the initial ramdisk with $ sudo update-initramfs -u -k all? (The sketch after this list shows the full sequence.)
  • You could filter the dmesg output down to relevant lines with $ journalctl --dmesg --boot --grep=nvidia.
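
Putting the permanent-access steps together, the full sequence would look roughly like this (a sketch only; it reuses the conf file name you already have, and any file under /etc/modprobe.d/ would work):

$ echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee /etc/modprobe.d/nvidia-profiling.conf
$ sudo update-initramfs -u -k all
$ sudo reboot
# after the reboot, verify that the module picked up the parameter:
$ grep RmProfilingAdminOnly /proc/driver/nvidia/params
RmProfilingAdminOnly: 0

If the nvidia module is loaded from the initramfs, skipping the update-initramfs step is a common reason the parameter does not take effect after a reboot.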

The problem seems to have resolved itself after power-cycling the server. I had rebooted the server before and still could not access the performance counters, but after a full power cycle the profiler can now read them. I don't have any other explanation for why it has started working.

Thanks for looking into it.