Nsight Systems HPC Linux installation

Hi,

I am having trouble getting Nsight Systems to work as expected.

I am working on an 8x V100 server within a Linux HPC cluster. I have a CUDA 10.1 module installed and loaded.

I then tried to install Nsight Systems:

NVIDIA_Nsight_Systems_Linux_2020.3.1.72.rpm

but now I get:
Error: Nsight Systems 2019.3.7 hasn’t been installed with CUDA Toolkit 10.1

How do I fix this, preferably so I have a module?

If you install the standalone Nsight Systems RPM downloaded from the developer zone, it will be installed to /opt/nvidia/nsight-systems. So just add /opt/nvidia/nsight-systems/<version>/bin to your PATH. The error comes from the copy bundled with the CUDA toolkit in /usr/local/cuda, which is the one you are currently invoking. (I can explain how to resolve that as well if you need.)
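As a rough sketch of the PATH change described above, something like the following could go in a shell startup file or be mirrored in a modulefile. The version directory name `2020.3.1` is an assumption based on the RPM name; check what actually exists under /opt/nvidia/nsight-systems on your system.

```shell
# Assumption: the RPM created a directory named after the short version.
# List /opt/nvidia/nsight-systems to confirm the real name.
NSYS_VERSION="2020.3.1"
NSYS_BIN="/opt/nvidia/nsight-systems/${NSYS_VERSION}/bin"

# Prepend so this nsys is found before the one under /usr/local/cuda.
export PATH="${NSYS_BIN}:${PATH}"

# Show which nsys the shell now resolves, if any.
command -v nsys || echo "nsys not found on PATH; check ${NSYS_BIN}"
```

Prepending rather than appending matters here, because the CUDA module has already put its own nsys on the PATH.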

Hi, thanks for your help. I now have two modules: CUDA 10.1 and nsight-systems 2020.3.1.

As you can see below, I loaded both modules, rebuilt my test app, and then ran the nsys profiler, but I still do not get the expected output.

What am I doing wrong? Any ideas?

Richard

=====================================================================

[rregan@gn001 Managing-Memory]$ module load nsight-systems/2020.3.1
[rregan@gn001 Managing-Memory]$ module list
Currently Loaded Modulefiles:

  1) cuda/10.1   2) nsight-systems/2020.3.1

[rregan@gn001 Managing-Memory]$ nvcc -o singlethread-vector-add 01-vector-add.cu
[rregan@gn001 Managing-Memory]$ ll
total 2124
-rw-r--r-- 1 rregan dphlss 1953 Jul 30 10:44 01-vector-add.cu
-rw-r--r-- 1 rregan dphlss 98292 Jul 30 11:15 report1.qdrep
-rw-r--r-- 1 rregan dphlss 151552 Jul 30 11:15 report1.sqlite
-rw-r--r-- 1 rregan dphlss 98228 Jul 30 11:17 report2.qdrep
-rw-r--r-- 1 rregan dphlss 147456 Jul 30 11:17 report2.sqlite
-rw-r--r-- 1 rregan dphlss 97803 Jul 30 16:54 report3.qdrep
-rw-r--r-- 1 rregan dphlss 131072 Jul 30 16:54 report3.sqlite
-rw-r--r-- 1 rregan dphlss 98187 Jul 30 17:05 report4.qdrep
-rw-r--r-- 1 rregan dphlss 151552 Jul 30 17:05 report4.sqlite
-rw-r--r-- 1 rregan dphlss 87454 Jul 30 17:50 report5.qdrep
-rw-r--r-- 1 rregan dphlss 122880 Jul 30 17:50 report5.sqlite
-rwxr-xr-x 1 rregan dphlss 639248 Jul 31 10:51 singlethread-vector-add

[rregan@gn001 Managing-Memory]$ nsys profile --stats=true ./singlethread-vector-add
Collecting data...

The target application terminated with signal 11 (SIGSEGV)
Processing events...
Capturing symbol files...
Saving temporary "/tmp/nsys-report-0c63-94a7-cf5c-8c10.qdstrm" file to disk...
Creating final output files...

Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-0c63-94a7-cf5c-8c10.qdrep"
Exporting 1018 events: [==================================================100%]

Exported successfully to
/tmp/nsys-report-0c63-94a7-cf5c-8c10.sqlite

Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)

CUDA trace data was not collected.

Generating Operating System Runtime API Statistics...
Operating System Runtime API Statistics (nanoseconds)

Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)

Report file moved to "/cosma/home/rregan/Projects/GPU/DLI/Fundamentals-of-Accerated-Computing/C/Managing-Memory/report6.qdrep"
Report file moved to "/cosma/home/rregan/Projects/GPU/DLI/Fundamentals-of-Accerated-Computing/C/Managing-Memory/report6.sqlite"
[rregan@gn001 Managing-Memory]$

The log shows the target application terminated with signal 11 (SIGSEGV), so it looks like your code is segfaulting on the host. Does it run correctly without the profiler?
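One quick way to check is to run the binary on its own and look at the exit status. The sketch below uses a small helper, `run_and_report`, which is a hypothetical name introduced here for illustration. It relies on the usual shell convention that a process killed by signal N reports status 128 + N, so SIGSEGV (signal 11) shows up as 139.

```shell
# Hypothetical helper: run a command and describe its exit status.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 139 ]; then
        echo "exit status 139: killed by SIGSEGV (128 + 11)"
    elif [ "$status" -ne 0 ]; then
        echo "exit status $status"
    else
        echo "exit status 0: ran cleanly"
    fi
    return "$status"
}

# Run the test app without nsys, if it has been built in this directory.
if [ -x ./singlethread-vector-add ]; then
    run_and_report ./singlethread-vector-add
fi
```

If this reports status 139 without the profiler attached, the crash is in the application itself rather than something nsys introduced.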