Hi, we’re trying to profile some of our kernels to improve performance a bit. In the past I invested quite a bit of time in nvprof/nvvp, and I’ve passively watched the Nsight tools progress over the years. Since our workflow is all containers in Kubernetes, the only way we can really profile visually is to dump the output with the CLI tools and import it into the graphical tools on another machine.
I’m not sure when this started happening, but I ran the normal nvprof profiling we do:
__PREFETCH=off nvprof --analysis-metrics -f -o output.prof cmd
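For anyone trying to reproduce this: a process killed by SIGPIPE shows up in the shell as exit status 141 (128 + signal number 13). A generic sketch of that mapping, nothing nvprof-specific:

```shell
# Sanity check that exit status 141 really means "killed by SIGPIPE":
# the shell reports death-by-signal-N as 128 + N, and SIGPIPE is
# signal 13, so 128 + 13 = 141.
sh -c 'kill -s PIPE $$'
echo $?   # prints 141
```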
This command now gets a SIGPIPE while running and never completes a profiling session successfully. Since I know nvprof/nvvp are being deprecated, I started playing with Nsight Systems and Nsight Compute. I began with “nsys profile”:
nsys profile --stats=true cmd
This completes successfully and drops a .qdrep file that I can load in nsys-ui. Once the file is loaded and I click to analyze the kernel, it asks where ncu is located, so I pointed it at the binary, and it says:
Both are the most recent versions (Nsight Compute 2020.1.2 and Nsight Systems 2020.3.4), so I’d expect the integration to be supported. Next, I tried the Nsight Compute command line directly, loading the result in the UI:
ncu -o profile --metrics "regex:.*" cmd
It drops an .ncu-rep file, which I then load into ncu-ui. However, the Details view shows nothing profiled:
This appears to be a bug someone reported last year, and they said on the forums it was fixed:
However, I’m running the latest versions and it doesn’t appear to be fixed. I also tried a subset of the metrics, and that too showed nothing in the Details view. Next, I looked at which metrics were available on this GPU, and the list seems far too short for a V100:
root@fi-gcomp016:/# /opt/nvidia/nsight-compute/2020.1.2/ncu --list-metrics
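In case it’s relevant to how short that list is: if I’m reading the ncu docs right, --list-metrics only lists the metrics collected by the currently active sections, whereas --query-metrics asks the device for every metric the chip supports, so something like this (same install path as above) should show the difference:

```shell
# Metrics collected by the active sections only (short by design?)
/opt/nvidia/nsight-compute/2020.1.2/ncu --list-metrics

# All metrics the device itself supports (should be a long list on a V100)
/opt/nvidia/nsight-compute/2020.1.2/ncu --query-metrics
```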
Even running a simple CUDA sample gives the same result, and ncu fails with an internal error:
root@fi-gcomp016:/nfs/samples/7_CUDALibraries/simpleCUBLAS# /opt/nvidia/nsight-compute/2020.1.2/ncu -o test --set full ./simpleCUBLAS
==PROF== Connected to process 1382 (/nfs/samples/7_CUDALibraries/simpleCUBLAS/simpleCUBLAS)
GPU Device 0: "Volta" with compute capability 7.0
simpleCUBLAS test running...
==PROF== Profiling "volta_sgemm_32x32_sliced1x4_nn" - 1: 0%....50%....100% - 3 passes
==ERROR== Error: InternalError
simpleCUBLAS test passed.
==PROF== Disconnected from process 1382
==ERROR== An error occurred while trying to profile.
==PROF== Report: /nfs/samples/7_CUDALibraries/simpleCUBLAS/test.ncu-rep
Any idea what’s wrong here?