Analysis inside a kernel

yaobin · June 7, 2012, 3:16pm

I am new to CUDA, and I want to ask some basic questions:

a) I have seen some topics comparing GPU performance with the CPU, along with some analysis of that performance. Is there any tool or method that lets me analyze or display how time is distributed inside a running kernel? For example, can I show how long the threads take to copy data from global memory to shared memory?

b) I am focused on performance optimization. Can anyone point me to source code for image processing, especially code that compares CPU and GPU implementations? Any help or suggestions would be appreciated.
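For question a), one manual technique (complementary to a profiler, which reports whole-kernel and hardware-counter metrics) is to read the per-SM cycle counter with CUDA's `clock64()` before and after the global-to-shared copy and write the difference out from one thread per block. The sketch below assumes a 1D launch with `TILE` threads per block; `TILE`, `timed_copy_kernel` and `cycles_to_us` are hypothetical names for illustration. Caveats: the counter is per SM, the measured interval includes time spent waiting at `__syncthreads()` for the slowest warp, the compiler may reorder instructions around `clock64()`, the instrumentation itself can perturb scheduling, and converting cycles to microseconds with the reported clock rate is only approximate when the clock varies.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 256  // threads per block == tile size (assumption for this sketch)

// Convert an SM cycle count to microseconds given the clock rate in kHz.
double cycles_to_us(double cycles, int clock_khz) {
    return cycles * 1000.0 / clock_khz;
}

// Copies one tile of `in` into shared memory, then does trivial work on it.
// Thread 0 of each block records how many cycles the copy phase took.
__global__ void timed_copy_kernel(const float* in, float* out, int n,
                                  long long* copy_cycles) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    long long start = clock64();                 // per-SM cycle counter
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // global -> shared copy
    __syncthreads();                             // wait until the whole tile is loaded
    long long stop = clock64();

    if (threadIdx.x == 0) copy_cycles[blockIdx.x] = stop - start;

    // Placeholder "compute" phase: combine each element with its neighbour.
    if (i < n) out[i] = tile[threadIdx.x] + tile[(threadIdx.x + 1) % TILE];
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + TILE - 1) / TILE;

    float *d_in, *d_out;
    long long* d_cycles;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_cycles, blocks * sizeof(long long));
    cudaMemset(d_in, 0, n * sizeof(float));

    timed_copy_kernel<<<blocks, TILE>>>(d_in, d_out, n, d_cycles);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<long long> h_cycles(blocks);
    cudaMemcpy(h_cycles.data(), d_cycles, blocks * sizeof(long long),
               cudaMemcpyDeviceToHost);

    // Average the per-block copy time in cycles.
    double total = 0.0;
    for (int b = 0; b < blocks; ++b) total += (double)h_cycles[b];
    double avg = total / blocks;

    int clock_khz = 0;
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0);
    if (clock_khz > 0)
        printf("avg copy phase: %.0f cycles (~%.3f us at %d kHz)\n",
               avg, cycles_to_us(avg, clock_khz), clock_khz);
    else
        printf("avg copy phase: %.0f cycles\n", avg);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

Comparing the copy-phase cycles against a similar reading taken around the compute phase gives a rough split of where time goes inside the kernel; for non-intrusive numbers, the profiler remains the better tool.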
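For question b), a small starting point for a CPU vs. GPU comparison is a 3x3 box blur on a single-channel float image. This is a hypothetical example, not code from this thread: `clampi`, `box3_pixel`, `box3_cpu` and `box3_kernel` are illustrative names, the image is synthetic, and borders are handled by replicating edge pixels. Both paths call the same per-pixel function so their results can be checked against each other. The CPU is timed with `std::chrono`, and the kernel with CUDA events after one untimed warm-up launch; host/device copies are deliberately excluded, so a fair end-to-end comparison would also need to time `cudaMemcpy`.

```cuda
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Clamp v into [lo, hi]; usable from both host and device code.
__host__ __device__ inline int clampi(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// 3x3 box-blur value for pixel (x, y) of a w x h single-channel image,
// replicating edge pixels at the borders. Shared by the CPU and GPU paths.
__host__ __device__ float box3_pixel(const float* img, int w, int h, int x, int y) {
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += img[clampi(y + dy, 0, h - 1) * w + clampi(x + dx, 0, w - 1)];
    return sum / 9.0f;
}

// CPU reference: one pixel at a time in a double loop.
void box3_cpu(const float* in, float* out, int w, int h) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = box3_pixel(in, w, h, x, y);
}

// GPU version: one thread per output pixel on a 2D grid.
__global__ void box3_kernel(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h) out[y * w + x] = box3_pixel(in, w, h, x, y);
}

int main() {
    const int w = 2048, h = 2048;
    const size_t bytes = (size_t)w * h * sizeof(float);
    std::vector<float> img(w * h), cpu_out(w * h), gpu_out(w * h);
    for (int i = 0; i < w * h; ++i) img[i] = (float)(i % 256);  // synthetic image

    // Time the CPU reference with a host wall clock.
    auto t0 = std::chrono::steady_clock::now();
    box3_cpu(img.data(), cpu_out.data(), w, h);
    auto t1 = std::chrono::steady_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    box3_kernel<<<grid, block>>>(d_in, d_out, w, h);  // warm-up launch (not timed)

    // Time only the kernel with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    box3_kernel<<<grid, block>>>(d_in, d_out, w, h);
    cudaEventRecord(stop);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaEventSynchronize(stop);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);
    cudaMemcpy(gpu_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // Check that both versions agree before trusting the timings.
    double max_diff = 0.0;
    for (int i = 0; i < w * h; ++i) {
        double d = std::fabs((double)cpu_out[i] - (double)gpu_out[i]);
        if (d > max_diff) max_diff = d;
    }
    printf("CPU: %.2f ms, GPU kernel: %.2f ms, max |diff|: %g\n",
           cpu_ms, gpu_ms, max_diff);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

This naive kernel reads each input pixel up to nine times from global memory; staging a tile plus a one-pixel halo in shared memory is the usual next optimization step, and the profiler can show whether it actually helps.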

Thanks a lot

cmaster.matso · June 12, 2012, 9:49am

a) Have you tried the NVIDIA Compute Visual Profiler?

yaobin · July 2, 2012, 12:23am

Thanks for the reply. We are now using the profiler and can analyze the performance.