I found that my kernel works correctly when compiled without the -G option and incorrectly when compiled with it. The kernel performs summations into shared memory arrays, and these updates must be serialized because several threads write to the same array element. When compiling with -G, the summations only work correctly if I use atomicAdd(), which makes me think the arrays have actually been placed in global memory. Am I right? If so, is there a way to prevent that from happening?
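Below is a stripped-down, hypothetical sketch of the pattern. It is not my actual kernel, and the names `binSum`, `runBinSum`, `NBINS` and `NTHREADS` are made up for illustration. Every thread adds into one of a few `__shared__` bins, so several threads hit the same location. The commented-out `bins[b] += 1;` is the plain version: a non-atomic read-modify-write that can lose updates whenever two threads target the same bin. Integer `atomicAdd` on shared memory requires compute capability 1.2 or higher, which the GTX 480 (sm_20) meets.

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes: 256 threads spread over 8 shared bins.
#define NBINS 8
#define NTHREADS 256

__global__ void binSum(int *out)
{
    __shared__ int bins[NBINS];

    // Zero the shared bins before any thread accumulates into them.
    if (threadIdx.x < NBINS)
        bins[threadIdx.x] = 0;
    __syncthreads();

    // Several threads map to the same bin.
    int b = threadIdx.x % NBINS;

    // Non-atomic version (data race when threads share a bin):
    // bins[b] += 1;
    atomicAdd(&bins[b], 1);   // serialized update on a __shared__ address

    __syncthreads();

    // Copy the per-block totals out to global memory.
    if (threadIdx.x < NBINS)
        out[threadIdx.x] = bins[threadIdx.x];
}

// Host helper: launches one block and copies the bins back.
// Returns true if every CUDA call succeeded.
bool runBinSum(int hostBins[NBINS])
{
    int *dBins = 0;
    if (cudaMalloc((void **)&dBins, NBINS * sizeof(int)) != cudaSuccess)
        return false;

    binSum<<<1, NTHREADS>>>(dBins);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dBins);
        return false;
    }

    // cudaMemcpy waits for the kernel and reports any error it hit.
    cudaError_t err = cudaMemcpy(hostBins, dBins, NBINS * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dBins);
    return err == cudaSuccess;
}
```

Built with something like `nvcc -arch=sm_20`, the atomic version should leave each of the 8 bins holding 256 / 8 = 32, with or without -G.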
This post suggested that -G might be the problem: "nvcc compilation using -G" (forum topic 181801).
I am using a GeForce GTX 480 card on x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga), with NVIDIA driver version 256.40.
The CUDA toolkit I downloaded was cudatoolkit_3.1_linux_64_rhel5.4.run.