I am having trouble getting metric information with nvprof. Many metrics give messages of the type “Error: Internal profiling error…”
It is not clear why certain metrics fail, since they appear in the list of available metrics when I run `nvprof --query-metrics`. (Note: for the same command, I get this error on our system with V100s but not on the system with P100s. Also, more metrics seem to fail with CUDA 11 than with CUDA 10, e.g. branch_efficiency.)
Here is more information about the system that gives the error: driver version 510.47.03, tested with CUDA 10.2.89 and CUDA 11.6.1.
I am testing with the CUDA sample code (vectorAdd, and simpleAtomicIntrinsics for the output shown here).
Here is the output:
simpleAtomicIntrinsics starting...
==158454== NVPROF is profiling process 158454, command: /apps/cuda/10.2.89/samples/0_Simple/simpleAtomicIntrinsics/simpleAtomicIntrinsics
GPU Device 0: "Volta" with compute capability 7.0
==158454== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "testKernel(int*)" (3 of 3)...
Replaying kernel "testKernel(int*)" (done)
simpleAtomicIntrinsics completed, returned OK
==158454== Error: Internal profiling error 4107:7.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
What is causing these issues, and are there any suggested steps for debugging them?