I have a Core 2 Duo, and I read that you can't measure the total number of clock cycles for code with this CPU and Visual Studio.
I am porting an existing application to add CUDA support (school project).
However, I can measure the number of instructions executed using Visual Studio's performance profiler.
I wonder whether this is really a meaningful indicator of performance for CUDA code. Correct me if I am wrong, but since the GPU is highly parallel, two code samples (one using CUDA, one using pure CPU) executing the same number of instructions does NOT mean that both will complete at roughly the same speed.
Am I wrong?
Does anyone know a good profiler for Windows that works with CUDA and lets me measure speed? Or how can I profile in Visual Studio to get a real sense of the performance improvement?
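
To make it concrete, what I would like to compare is elapsed wall-clock time rather than instruction counts. Here is a minimal sketch of the CPU side, assuming plain `std::chrono` is acceptable for this. The names `sum_of_squares` and `time_ms` are hypothetical placeholders, not part of my application. As far as I understand, kernel launches return before the GPU finishes, so for the CUDA version the timed call would also have to wait for the device before the timer stops.

```cpp
#include <chrono>
#include <vector>

// Hypothetical stand-in for the CPU version of the work being compared.
static double sum_of_squares(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v) {
        acc += x * x;
    }
    return acc;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

I would call it as `time_ms([&] { result = sum_of_squares(data); })`, repeat it a few times, and compare the numbers against the same measurement around the CUDA path.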