Can running cudaprof improve the speedup?


I have noticed that my timing results are much better when I measure them with cudaprof open. How is that possible?
I have looked into the problem and found that the difference is due to memory transfers, and the same happens with other CUDA programs.
Is cudaprof accelerating memory transfers?

Thank you.