I checked the events and metrics supported by the GeForce GTX 1080 Ti via “nvprof --query-metrics” and “nvprof --query-events”.
When I ran “nvprof --events inst_issued0,inst_issued1,inst_issued2 ./a.out”, I got the following:
==5642== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
When I ran “nvprof --metrics ipc ./a.out”, I got the following:
==6784== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
I have no idea what is wrong with the way I am profiling.
As the warning message suggests, can you please try setting the option “--profiling-semaphore-pool-size 100000” on the command line?
$ nvprof --profiling-semaphore-pool-size 100000 --events inst_issued0,inst_issued1,inst_issued2 ./a.out
Does nvprof work fine without events and metrics?
$ nvprof ./a.out
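If it is easier, the two checks can be run back to back with the output saved for comparison. This is only a rough sketch, assuming a POSIX shell: the NVPROF and APP variables, the run_checks function, and the log file names are made up for illustration and are not nvprof conventions.

```shell
#!/bin/sh
# Sketch only: run the two checks from this post and keep the logs.
# NVPROF, APP, run_checks and the log file names are placeholders.
NVPROF="${NVPROF:-nvprof}"
APP="${APP:-./a.out}"

run_checks() {
    # Check 1: plain profiling, no events or metrics.
    "$NVPROF" "$APP" > baseline.log 2>&1

    # Check 2: same events as before, with the larger semaphore pool.
    "$NVPROF" --profiling-semaphore-pool-size 100000 \
        --events inst_issued0,inst_issued1,inst_issued2 \
        "$APP" > events.log 2>&1

    # Report which run, if any, printed the "nothing profiled" message.
    for log in baseline.log events.log; do
        if grep -q "No events/metrics were profiled" "$log"; then
            echo "$log: no events/metrics were profiled"
        else
            echo "$log: message not found"
        fi
    done
}
```

Calling run_checks from the directory that contains a.out leaves baseline.log and events.log there, which can then be pasted into the thread.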
Thanks! My system is Ubuntu 16.04.1 x86-64, and the CUDA version is 10.0.130.
“nvprof ./a.out” works well, but “nvprof --profiling-semaphore-pool-size 100000 --events inst_issued0,inst_issued1,inst_issued2 ./a.out” still gives the same result:
==30287== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.
I reinstalled CUDA (now 10.2) and tried nvprof again. This time, I used the vectorAdd sample from the CUDA samples. I added cudaProfilerStart() before and cudaProfilerStop() after the kernel launch vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements).
When I ran “nvprof --events all ./vectorAdd”, it showed:
[Vector addition of 50000 elements]
==2985== NVPROF is profiling process 2985, command: ./vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==2985== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Failed to launch vectorAdd kernel (error code unknown error)!
==2985== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM
==2985== Profiling application: ./vectorAdd
==2985== Profiling result:
No events/metrics were profiled.
==2985== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
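Note that the ERR_NVGPUCTRPERM warning in this last run is a different failure from the earlier “Internal profiling error 4292:1”, so the two are worth telling apart when comparing logs. As far as I know, the linked page describes either running the profiler with administrator privileges or relaxing the driver's restriction on performance counters; the exact steps are on that page. Below is a small sketch that maps the messages seen in this thread to a one-line hint. The nvprof_log_hint helper and its hint strings are hypothetical, not part of nvprof.

```shell
# Sketch only: classify a saved nvprof log by the messages seen in this thread.
# nvprof_log_hint is a hypothetical helper; the hint texts are illustrative.
nvprof_log_hint() {
    log_file="$1"
    if grep -q "ERR_NVGPUCTRPERM" "$log_file"; then
        # Permission problem: the warning itself points at this page.
        echo "permission: see https://developer.nvidia.com/ERR_NVGPUCTRPERM"
    elif grep -q "Internal profiling error" "$log_file"; then
        echo "internal profiling error reported"
    elif grep -q "No events/metrics were profiled" "$log_file"; then
        echo "nothing profiled, no known cause in log"
    else
        echo "no known message found"
    fi
}
```

For example, after saving the output with “nvprof --events all ./vectorAdd > run.log 2>&1”, calling nvprof_log_hint run.log prints the matching hint.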