I know that compiling in Release mode improves GPU code performance considerably.
When I compiled my application in Release mode, it showed improvements in both kernel execution time and data transfer time.
I can’t understand how Release mode compilation can improve “data transfer time”, which seems to depend solely on the hardware (DMA).
Does anyone have an idea what is behind this?
I don’t see any evidence of this. I took the CUDA 6 sample code bandwidthTest and compiled it with and without the -G switch on Linux. The results were the same. I tested both the --memory=pinned and --memory=pageable options, and there was no significant difference between the utility compiled with -G and the one compiled without -G. If you are testing on a Windows WDDM GPU, it can be difficult to get understandable, predictable, and repeatable benchmark results because of the WDDM system.
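If you want to check this in isolation from the rest of your application, something like the following rough sketch may help. It is not the bandwidthTest sample itself, just a minimal stand-in that times host-to-device copies with CUDA events for pageable and pinned host memory. The buffer size, repetition count, and the helper names (`CHECK`, `time_h2d_ms`, `bandwidth_gb_per_s`) are my own choices for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Convert bytes moved in `ms` milliseconds to GB/s (1 GB = 1e9 bytes).
static double bandwidth_gb_per_s(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1e9) / (static_cast<double>(ms) / 1e3);
}

// Average time in milliseconds of one host-to-device copy of `bytes` bytes.
static float time_h2d_ms(void* d_dst, const void* h_src, size_t bytes, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up copy so one-time setup cost is not included in the measurement.
    CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB per copy (arbitrary choice)
    const int reps = 10;             // copies averaged per measurement (arbitrary)

    void* d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, bytes));

    // Pageable host buffer from the ordinary C allocator.
    void* h_pageable = malloc(bytes);
    if (h_pageable == nullptr) { fprintf(stderr, "malloc failed\n"); return EXIT_FAILURE; }
    memset(h_pageable, 1, bytes);

    // Pinned (page-locked) host buffer from the CUDA runtime.
    void* h_pinned = nullptr;
    CHECK(cudaMallocHost(&h_pinned, bytes));
    memset(h_pinned, 1, bytes);

    float ms_pageable = time_h2d_ms(d_buf, h_pageable, bytes, reps);
    float ms_pinned   = time_h2d_ms(d_buf, h_pinned, bytes, reps);

    printf("pageable H2D: %.3f ms per copy, %.2f GB/s\n",
           ms_pageable, bandwidth_gb_per_s(bytes, ms_pageable));
    printf("pinned   H2D: %.3f ms per copy, %.2f GB/s\n",
           ms_pinned, bandwidth_gb_per_s(bytes, ms_pinned));

    CHECK(cudaFreeHost(h_pinned));
    free(h_pageable);
    CHECK(cudaFree(d_buf));
    return 0;
}
```

Build it twice, for example `nvcc -G -o h2d_debug h2d.cu` and `nvcc -o h2d_release h2d.cu` (the file name is just a placeholder), and compare the numbers over a few runs. If the two builds agree here but your application still shows a difference, the difference is probably coming from how the application measures the transfer and not from the copy itself.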
Thanks txbob for your reply.
I am also sure that code optimization should not improve data transfer time, but I have observed that it does improve in my application.
Hm… something is strange. I need to look at this in more detail.