nvprof "Error: Internal profiling error 4292:1."

I checked events and metrics supported by GeForce GTX 1080 Ti via “nvprof --query-metrics” and “nvprof --query-events”.

When I run “nvprof --events inst_issued0,inst_issued1,inst_issued2 ./a.out”, I get the following:

==5642== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.

When I run “nvprof --metrics ipc ./a.out”, I get the following:

==6784== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.

I have no idea what is going wrong with my profiling approach.

Hi,

As the warning message suggests, can you please try setting the option “--profiling-semaphore-pool-size 100000” on the command line?
$nvprof --profiling-semaphore-pool-size 100000 --events inst_issued0,inst_issued1,inst_issued2 ./a.out

Does nvprof work fine without events and metrics?
$nvprof ./a.out

Can you provide these details:

  • CUDA Toolkit version
  • CUDA Driver version
  • OS
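
The details requested above can usually be collected in one go. The following is only a sketch: the helper name `report` is made up for illustration, and it assumes `nvcc` and `nvidia-smi` are on the PATH (if they are not, the script says so rather than failing).

```shell
#!/bin/sh
# report LABEL CMD [ARGS...]
# Hypothetical helper: prints a header, then the output of CMD if it is installed.
report() {
    label="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $label =="
        "$@" 2>&1 || echo "($1 exited with an error)"
    else
        echo "== $label == ($1 not found)"
    fi
}

report "OS" uname -srm
report "CUDA Toolkit" nvcc --version
report "CUDA Driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

Pasting the full output into the thread avoids a second round of questions about versions.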

Thanks! My system is Ubuntu 16.04.1 x86-64, and the CUDA version is 10.0.130.

“nvprof ./a.out” works fine, but “nvprof --profiling-semaphore-pool-size 100000 --events inst_issued0,inst_issued1,inst_issued2 ./a.out” still gives the same result:

==30287== Error: Internal profiling error 4292:1.
======== Warning: 1 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
No events/metrics were profiled.
======== Error: CUDA profiling error.

Hi

I reinstalled with CUDA 10.2 and tried nvprof again. This time I used the vectorAdd sample from the CUDA samples, and added cudaProfilerStart() and cudaProfilerStop() immediately before and after the kernel launch vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements).

When I run “nvprof --events all ./vectorAdd”, it shows:

[Vector addition of 50000 elements]
==2985== NVPROF is profiling process 2985, command: ./vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==2985== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Failed to launch vectorAdd kernel (error code unknown error)!
==2985== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM
==2985== Profiling application: ./vectorAdd
==2985== Profiling result:
No events/metrics were profiled.
==2985== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

The nvidia-smi output is as follows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108…    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   41C    P8    17W / 280W |    345MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3710      G   /usr/lib/xorg/Xorg                           178MiB |
|    0     31499      G   compiz                                        43MiB |
|    0     31822      G   …uest-channel-token=10531723061151916186     119MiB |
+-----------------------------------------------------------------------------+

I tried the methods described at https://developer.nvidia.com/ERR_NVGPUCTRPERM, but they did not work.
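
One way to confirm whether the permission change actually took effect is to read the driver's own parameter list. The sketch below assumes the driver exposes `/proc/driver/nvidia/params` with a line of the form `RmProfilingAdminOnly: <0|1>`; the function name `check_profiling_permission` and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# check_profiling_permission [PARAMS_FILE]
# Hypothetical helper: reports whether GPU performance counters are admin-only.
# PARAMS_FILE defaults to the driver's parameter file; the argument exists only
# so the function can be tried against a sample file on a machine without a GPU.
check_profiling_permission() {
    params_file="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params_file" ]; then
        echo "unknown: $params_file not readable (driver not loaded?)"
        return 2
    fi
    # Assumed line format: "RmProfilingAdminOnly: 1" (1 = admin only, 0 = all users).
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params_file")
    case "$value" in
        0) echo "unrestricted"; return 0 ;;
        1) echo "admin-only"; return 1 ;;
        *) echo "unknown: RmProfilingAdminOnly not found"; return 2 ;;
    esac
}

status=$(check_profiling_permission) || true
echo "Profiling permission: $status"
```

If this still prints "admin-only" after applying the instructions from the linked page, the setting probably did not reach the loaded kernel module. That page describes two routes: running the profiler as root (for example with sudo), or loading the nvidia kernel module with NVreg_RestrictProfilingToAdminUsers=0, which only takes effect once the module is reloaded or the machine is rebooted.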