We have an issue with remote profiling using the Nsight Compute GUI. Permissions for GPU hardware counters have been fixed, and the GPU clocks are fixed as well. The GPU is an A100 with MIG instances, and the OS is Ubuntu 20.04.
We get the following log:
. . . . .
Launching: /usr/local/cuda-11.2/nsight-compute-2020.3.0/target/linux-desktop-glibc_2_11_3-x64/ncu (host: ip_addr)
==PROF== Attempting to connect to ncu-ui at ip_addr1:50152…
. . . . .
==WARNING== Failed to connect to ncu-ui at ip_addr1:50152.
==WARNING== Failed to connect to ncu-ui at ip_addr2:50152.
==WARNING== Failed to connect to ncu-ui at ip_addr1_v6:50152.
==WARNING== Failed to connect to ncu-ui at ip_addr2_v6:50152.
==ERROR== Could not deploy stock section files to "/home/username/Documents/NVIDIA Nsight Compute/2020.3.0/Sections".
The folder /home/username/Documents/NVIDIA Nsight Compute/2020.3.0/Sections exists and has 777 permissions. The --section-folder-restore option did not help.
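For reference, here is a minimal Python sketch of how the folder can be checked beyond its mode bits. It is meant to be run on the target machine as the same user that launches ncu. It only tests whether a file can actually be created in the folder and whether every parent directory is traversable. The function names are made up for illustration, and this is just a guess at what could go wrong, not a known cause of the error.

```python
import os
import tempfile


def can_write_dir(path):
    """Return True if a file can actually be created inside `path`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


def first_blocked_parent(path):
    """Return the first ancestor of `path` that the current user cannot
    traverse (missing search/execute permission), or None if all are fine."""
    current = os.path.abspath(path)
    parents = []
    while True:
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        parents.append(parent)
        current = parent
    # Check from the root downwards so the outermost problem is reported.
    for candidate in reversed(parents):
        if not os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling both functions with the Sections path from the error message shows whether the current user can really write there, independent of the 777 bits on the folder itself.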
Specifying a different port number did not help either, even with those ports opened in the firewall.
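Judging by the log, ncu on the target tries to connect back to ncu-ui on the host, so the port has to be reachable in that direction. Below is a minimal Python sketch of a plain TCP reachability check that could be run on the target machine. The helper names are hypothetical, the addresses are the same placeholders as in the log, and a successful TCP connect only shows that something is listening, not that it is ncu-ui.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, port, timeout=3.0):
    """Map each host address to whether host:port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, `check_hosts(["ip_addr1", "ip_addr2"], 50152)` run on the target while the GUI is waiting for the connection would show which of the host addresses, if any, accept connections on that port.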
So we have several questions:
- Why is the connection failure reported as a warning and not an error?
- Is the ERROR above caused by the preceding WARNINGs, or is there another reason?
In the end we can profile with the ncu CLI on the remote host and copy the result back to the host, but it is not clear why the GUI does not work as expected.
Thanks in advance for any help!