Visual Studio performance profiler on CUDA code

Hi,

I have a dual-core (Core 2) CPU, and I read that you can't measure the total number of clock cycles for code with this CPU and Visual Studio.

I am porting an existing application to add CUDA support (school project).

However, I can measure the number of instructions executed using Visual Studio's performance profiler.

I wonder whether this is really an indication of performance for CUDA code. Correct me if I am wrong, but since the GPU is highly parallel, the fact that two code samples (one using CUDA, one using pure CPU) execute the same number of instructions does NOT mean that both will complete at roughly the same speed.

Am I wrong?

Does anyone know a good profiler for Windows that works with CUDA and lets me measure speed? Or how can I profile in Visual Studio to get a real sense of the performance improvement?

Thanks

You are right. Since CUDA kernel launches are asynchronous, a CPU profiler only gives you an indication of how much time the kernel launch takes on the CPU, not the total execution time on the GPU.

To get an idea of how much time your GPU kernels are taking, you should use the CUDA Visual Profiler:
http://forums.nvidia.com/index.php?showtopic=57443
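
If you also want a number from inside your own program, CUDA events are one way to time a kernel on the GPU itself. The sketch below is only an illustration, not code from your project: the `scale` kernel, the `Timing` struct and the `timeScaleKernel` helper are made-up names, and the problem size is arbitrary. It compares the CPU-side time around the launch with the GPU time reported by `cudaEventElapsedTime`, which should make the asynchronous behaviour described above visible.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies every element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical result type: CPU-side launch time and GPU execution time.
struct Timing {
    double launchMs;  // wall-clock time until the launch call returned
    float gpuMs;      // time between the two events on the GPU
};

Timing timeScaleKernel(int n) {
    Timing t = {0.0, 0.0f};
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return t;
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time initialization is not included in the timing.
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    auto cpu0 = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop);
    // The launch has returned here, but the kernel may still be running.
    auto cpu1 = std::chrono::steady_clock::now();

    // Block until the stop event has been reached on the GPU.
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&t.gpuMs, start, stop);
    t.launchMs = std::chrono::duration<double, std::milli>(cpu1 - cpu0).count();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return t;
}

int main() {
    Timing t = timeScaleKernel(1 << 24);
    printf("CPU time around launch: %.3f ms\n", t.launchMs);
    printf("GPU time from events:   %.3f ms\n", t.gpuMs);
    return 0;
}
```

To compare against your CPU version, time the CPU code with the same wall clock and compare it with the GPU time plus any host/device memory copies your application needs, since those copies are part of the real cost.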