I had a problem running Thrust while the -G flag was set, even though some features worked just fine.
When I build in debug mode without the -G flag, the GPU code does run faster, but on close examination I found that in this mode my non-Thrust reduction/scan kernels produce different answers.
I did look at the CUDA Compiler Driver NVCC documentation, but it did not explain this issue.
It is almost as if the __syncthreads() calls within the kernels are being ignored when the -G flag is not set.
How specifically does this flag affect the code generation and execution?
This statement from the documentation suggests it is a bit more complicated:
Generate debug information for device code, plus also specify the optimization
level for the device code in order to control its 'debuggability'.
Allowed values for this option: 0,1,2,3.
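
For reference, here is a minimal sketch of the kind of shared-memory block reduction in question. It is not the actual code: the kernel name blockSum, the block size kBlock, and the all-ones input are hypothetical, and the block size is assumed to be a power of two. Since -G turns off device-code optimization, a kernel that only gives correct answers with -G may be relying on unoptimized memory accesses, for example a warp-level tail that drops the barrier without using volatile or __syncwarp(). This version keeps __syncthreads() after every step, so it should give the same sum with and without -G.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block size; assumed to be a power of two.
constexpr int kBlock = 256;

// Each block sums kBlock elements of `in` into out[blockIdx.x].
__global__ void blockSum(const int* in, int* out, int n) {
    __shared__ int s[kBlock];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Load one element per thread, padding with 0 past the end.
    s[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            s[tid] += s[tid + stride];
        }
        // The barrier sits outside the `if` so every thread reaches it.
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = s[0];
    }
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + kBlock - 1) / kBlock;

    int* in = nullptr;
    int* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, blocks * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = 1;  // integer input keeps the sum exact

    blockSum<<<blocks, kBlock>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Finish the reduction on the host and compare with the known answer.
    long long total = 0;
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("sum = %lld (expected %d)\n", total, n);

    cudaFree(in);
    cudaFree(out);
    return total == n ? 0 : 1;
}
```

Building the same file once with `nvcc -G` and once without, then comparing the printed sums, is one way to check whether a given kernel depends on the flag.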