Evaluate OpenCL performance on Nvidia GPUs before porting to CUDA


I ran some OpenCL code on several Nvidia GPUs. Since the energy consumption was lower than expected, I suspect the program does not push the hardware hard enough.

I would like to evaluate the OpenCL implementation before porting it to CUDA.
Is it possible to get information about the memory bandwidth used and the number of active blocks and threads?

Is there a way to analyse the kernel with regard to possible optimizations?
How about after porting it to CUDA?

Best regards,

Have a look at this to monitor your OpenCL kernels

It claims to have some OpenCL support, but I do not know whether it is anywhere
near as good as the profiling and debugging support for CUDA.
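
If no profiler is at hand, a rough figure for the memory bandwidth actually achieved can be derived by hand from OpenCL event timestamps. The sketch below assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that the two timestamps (in nanoseconds) were read with clGetEventProfilingInfo using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The helper names elapsed_seconds and effective_bandwidth_gbs are made up for illustration, and the byte counts are whatever you know your kernel reads and writes, so treat the result as an estimate rather than a measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: elapsed kernel time in seconds, computed from two
   device timestamps given in nanoseconds.
   Returns 0.0 if the end timestamp is not later than the start timestamp. */
double elapsed_seconds(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns) {
        return 0.0;
    }
    return (double)(end_ns - start_ns) * 1e-9;
}

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
   bytes_read and bytes_written are the amounts of global memory the kernel
   is assumed to touch; they must be supplied by the caller.
   Returns 0.0 if the elapsed time is not positive. */
double effective_bandwidth_gbs(uint64_t start_ns, uint64_t end_ns,
                               size_t bytes_read, size_t bytes_written)
{
    double seconds = elapsed_seconds(start_ns, end_ns);
    if (seconds <= 0.0) {
        return 0.0;
    }
    return ((double)bytes_read + (double)bytes_written) / seconds / 1e9;
}
```

Comparing this number with the peak bandwidth listed for the card gives a first hint whether the kernel is limited by memory or is simply not keeping the GPU busy.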


Thanks. I was under the impression that Nsight would not do OpenCL evaluation. It seems I was wrong.

The Visual Studio version expects to be run on Windows, right? And the Eclipse version doesn't mention OpenCL evaluation.

I would prefer not to migrate to Windows, so I need an evaluation/debugging solution for Ubuntu.

Best regards,