I just noticed Nsight Systems mentioned in the CUDA 10 release announcement (I can see now that it started getting mentioned a couple of months ago, but I somehow missed it). It seems like an extremely useful tool, so I wanted to try it on the CUDA code that I’m developing. I’m still on CUDA 9.2, using a Quadro P3000 mobile card on a 64-bit Linux system, so it seemed that it should work, and I installed the stand-alone version of Nsight Systems, version 2018.1.1.36. However:
I was not able to find any kind of tutorial on how to run the tool. The user guide is rather dull, and elsewhere on the net there is just a short article on the Developer Blog and this video on YouTube on using it to accelerate VMD code. There are no instructions on how to prepare the code (are there any special compile flags to use, etc.) or on how to then run it under Nsight Systems. It would be very useful to have a tutorial with a simple example, say from the CUDA samples, showing just how to run the program and interpret the results.
In any case, I tried to guess how to use the program, and tried to run my own CUDA code and then also the conjugateGradient program from the CUDA samples. I selected the localhost connection, then chose to sample a target process, specified the command line and working directory, and also checked “Collect CUDA trace”. However, when profiling started, the messages said “CUDA injection initialization failed.” and no CUDA traces were shown. On the other hand, judging from /tmp/nvidia/nsight_system/streams/pid_*_stdout.log, in both cases the programs did run and used CUDA properly (btw, it would be good to have the process exit code saved along with the standard output/error too, and even better to have the standard output/error shown within the Nsight Systems GUI). So, what could be the problem here, and how can I fix it?
A minor problem is that Nsight Systems starts with an extremely large window. My laptop has a rather high screen resolution (3840x2160), but the Nsight Systems window still starts out even larger than that.