Nvprof internal profiling error with certain metrics

I am having trouble getting metric information with nvprof. Many metrics give messages of the type “Error: Internal profiling error…”

It is not clear why certain metrics fail, since they appear in the list of available metrics when I run `nvprof --query-metrics`. (Note: for the same command, I get this error on our system with V100s but not on the system with P100s. Also, more metrics fail with CUDA 11 than with CUDA 10, e.g. branch_efficiency.)

Here is more information about the system that gives the error: driver version 510.47.03, tested with CUDA 10.2.89 and CUDA 11.6.1.

I am testing with the CUDA sample codes, such as vectorAdd; the output shown below is from simpleAtomicIntrinsics.

Here is the output:

```
simpleAtomicIntrinsics starting...
==158454== NVPROF is profiling process 158454, command: /apps/cuda/10.2.89/samples/0_Simple/simpleAtomicIntrinsics/simpleAtomicIntrinsics
GPU Device 0: "Volta" with compute capability 7.0

==158454== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "testKernel(int*)" (3 of 3)... 
Replaying kernel "testKernel(int*)" (done)
simpleAtomicIntrinsics completed, returned OK
==158454== Error: Internal profiling error 4107:7.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
```

What is causing these errors, and are there any suggested steps for debugging them?
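
To narrow down which metrics fail on a given system, one option would be to run nvprof once per metric and scan the output for the error line shown above. The following is only a minimal sketch: the helper names `find_profiling_error` and `check_metrics` are made up for illustration, and it assumes that `nvprof` is on the PATH and that the error always appears in the same "Internal profiling error" form as in my output.

```python
import re
import subprocess

# Matches lines such as "==158454== Error: Internal profiling error 4107:7."
ERROR_RE = re.compile(r"Error: Internal profiling error (\d+:\d+)")


def find_profiling_error(output):
    """Return the internal error code (e.g. "4107:7"), or None if absent."""
    match = ERROR_RE.search(output)
    return match.group(1) if match else None


def check_metrics(app, metrics, nvprof="nvprof"):
    """Run nvprof once per metric; map metric name -> error code or None.

    `app` is the path to the profiled binary (an assumption of this sketch;
    no extra application arguments are passed).
    """
    results = {}
    for metric in metrics:
        proc = subprocess.run(
            [nvprof, "--metrics", metric, app],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams so the error line is captured
            text=True,
        )
        results[metric] = find_profiling_error(proc.stdout)
    return results
```

For example, `check_metrics("./vectorAdd", ["branch_efficiency"])` would return a dictionary whose value is the error code for a failing metric and `None` otherwise (the binary path here is a placeholder). Running one metric per invocation is slower than passing a comma-separated list, but it isolates the failing metrics from the ones that work.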