What does nvcc do when CUDA code is compiled with the -G option?
I'm getting an enormous speedup when this flag is used. I understand that it is meant to be turned on for debugging with cuda-gdb.
I have a lot of templated device functions, operator overloading, etc.
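You should see the opposite: -G turns off device-code optimizations, spills registers to local memory, and generally makes things considerably slower. You might want to check that the kernels are actually launching at all, since a kernel that fails to launch returns almost immediately, which can look like a large speedup.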
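One way to check whether the kernels are really launching is to query the CUDA runtime for errors right after the launch and again after synchronizing. The following is a minimal sketch, not the asker's code: the kernel `scale` and the helper `cuda_ok` are hypothetical placeholders, and the real templated kernels would take their place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): reports a failed CUDA call
// and returns false, or returns true on cudaSuccess.
static bool cuda_ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for the real templated device code.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    if (!cuda_ok(cudaMalloc(&d_data, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!cuda_ok(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset")) return 1;

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // Launch-time errors (bad configuration, too many resources requested)
    // are reported by cudaGetLastError immediately after the launch.
    bool ok = cuda_ok(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs only surface after synchronizing.
    ok = cuda_ok(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

Building this once with `nvcc -G` and once without, and comparing the reported errors, should show whether the faster build is doing any work at all. Timings are only worth comparing when both checks pass in both builds.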