I am having trouble collecting metrics with nvprof. Many metrics fail with messages of the form “Error: Internal profiling error…”.
It is not clear why certain metrics fail, since they appear in the list of available metrics when I run `nvprof --query-metrics`. (Note: for the same command, I get this error on our system with V100s but not on the system with P100s. Also, more metrics seem to fail with CUDA 11 than with CUDA 10, e.g. `branch_efficiency`.)
Here is more information about the system that gives the error: driver version 510.47.03, tried with CUDA 10.2.89 and CUDA 11.6.1.
I am testing with the vectorAdd sample code.
Here is the output (this particular run is from the simpleAtomicIntrinsics sample):
simpleAtomicIntrinsics starting...
==158454== NVPROF is profiling process 158454, command: /apps/cuda/10.2.89/samples/0_Simple/simpleAtomicIntrinsics/simpleAtomicIntrinsics
GPU Device 0: "Volta" with compute capability 7.0
==158454== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "testKernel(int*)" (3 of 3)...
Replaying kernel "testKernel(int*)" (done)
simpleAtomicIntrinsics completed, returned OK
==158454== Error: Internal profiling error 4107:7.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
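To narrow down which metrics fail, each one can be checked in isolation. The following is only a rough sketch of that idea, not something nvprof provides: the helper names `find_internal_errors` and `profile_metric`, the `./vectorAdd` path, and the one-entry metric list are all placeholders I made up for illustration. It assumes nvprof prints its messages in the same format as the output above and, as far as I know, writes them to stderr by default.

```python
import re
import subprocess

# Matches lines such as "==158454== Error: Internal profiling error 4107:7."
ERROR_RE = re.compile(
    r"^==(\d+)== Error: Internal profiling error (\d+):(\d+)\.",
    re.MULTILINE,
)


def find_internal_errors(nvprof_output):
    """Return a (pid, code, subcode) tuple for each internal profiling error line."""
    return [
        tuple(int(group) for group in match.groups())
        for match in ERROR_RE.finditer(nvprof_output)
    ]


def profile_metric(metric, app_cmd):
    """Run nvprof for a single metric and return the internal errors it reported.

    app_cmd is a list such as ["./vectorAdd"] (hypothetical path).
    """
    result = subprocess.run(
        ["nvprof", "--metrics", metric] + list(app_cmd),
        capture_output=True,
        text=True,
    )
    # Scan both streams so the check does not depend on where nvprof logs.
    return find_internal_errors(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    app = ["./vectorAdd"]            # placeholder: path to the sample binary
    metrics = ["branch_efficiency"]  # placeholder: fill in from --query-metrics
    for metric in metrics:
        errors = profile_metric(metric, app)
        print(metric, "FAILED" if errors else "ok", errors)
```

Running this once per CUDA version and GPU type would give a per-metric pass/fail list that can be compared between the V100 and P100 systems.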
What is causing these errors, and are there any suggested steps for debugging them?