Nvidia GPU - OpenCL Profiling


I am using OpenCL on an Nvidia GTX-970 (Linux-Ubuntu).

I want to profile my OpenCL kernels for metrics such as cache miss rates, SIMD utilization, and branch divergence. I have searched online but could not find anything for this case. AMD has its own APP Profiler, which reports these stats, but I could not find anything similar for an OpenCL kernel running on an Nvidia GPU.

Any ideas?


See this link:


It’s technically a kluge, but it works quite well. If you want to see multiple devices in the same file, just merge all of the CUDA logs at the end. See my repo here for how to do it for multiple devices:
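
Independent of that repo, here is a minimal sketch of the merge step. It assumes each device wrote a plain-text log to its own file and that simple concatenation is good enough, which may not match what the repo does. The `merge_logs` helper and the `profile_dev*.log` file names are hypothetical.

```python
from pathlib import Path


def merge_logs(log_paths, merged_path):
    """Concatenate plain-text per-device logs into one file, in the order given."""
    with open(merged_path, "w") as merged:
        for path in log_paths:
            text = Path(path).read_text()
            merged.write(text)
            # Make sure the next log starts on a fresh line.
            if text and not text.endswith("\n"):
                merged.write("\n")
    return merged_path


if __name__ == "__main__":
    # Hypothetical file names: one log per device, e.g. "profile_dev0.log".
    logs = sorted(Path(".").glob("profile_dev*.log"))
    if logs:
        merge_logs(logs, "profile_merged.log")
```

If the logs carry per-file header lines, you may need to drop the repeated headers before loading the merged file into a viewer.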