seg fault in kernel when attempting to += global memory

Hi, I am running into the strangest thing.

In my kernel, I attempt to update some global memory in a loop, like this:

// float * gmem is an input array to my kernel.

for (..) {

	gmem[ n ] += float_val;

}


and I get a segmentation fault. As far as I can tell, the += is the problem. If I replace it with a plain =, the kernel behaves as expected, with no seg faults. Obviously, though, I need the incremental sum behavior, so I can’t just get rid of the +=. What could be the problem?

// this runs with no seg faults

for (..) {

	gmem[ n ] = float_val;

}


Any ideas? Please help! Setup: Windows XP, CUDA 2.3, GPU with CUDA compute capability 1.3.

Just to follow up: problem solved. Of course, it has “nothing” to do with the += itself. The difference made my kernel run for over 5 seconds, at which point I hit the Windows timeout limit (the display driver watchdog), since this GPU is also my graphics adapter. I didn’t realize this timeout would be reported as a seg fault, so I really had no clue what the problem was. Now I am aware.
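
For anyone who lands here with the same symptom, here is a rough sketch of how the timeout could be made visible instead of showing up as a mystery crash. It is not the code from my application: the kernel `accumulate`, the helper `is_watchdog_timeout`, and the sizes and iteration count are all made up for illustration. It assumes device 0 and a toolkit that provides `cudaDeviceSynchronize` (on CUDA 2.3 the equivalent call is `cudaThreadSynchronize`). The idea is to ask the device whether a run time limit applies, then check the error code after the kernel finishes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question:
// each thread repeatedly accumulates into its own element of gmem.
__global__ void accumulate(float *gmem, int n, int iters, float float_val)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        for (int k = 0; k < iters; ++k) {
            gmem[i] += float_val;
        }
    }
}

// Hypothetical helper: true when the runtime reports that the kernel
// was killed for exceeding the run time limit.
static bool is_watchdog_timeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    // Non-zero when kernels on this device are subject to a run time limit,
    // which is typical for a GPU that also drives the display.
    printf("%s: run time limit on kernels: %s\n",
           prop.name, prop.kernelExecTimeoutEnabled ? "yes" : "no");

    const int n = 1 << 20;   // illustrative size
    float *gmem = NULL;
    if (cudaMalloc((void **)&gmem, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(gmem, 0, n * sizeof(float));

    accumulate<<<(n + 255) / 256, 256>>>(gmem, n, 1000, 1.0f);

    // First catch launch errors, then wait for the kernel so that
    // execution errors (including a timeout) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    if (is_watchdog_timeout(err)) {
        printf("kernel hit the run time limit: %s\n", cudaGetErrorString(err));
    } else if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    } else {
        printf("kernel finished normally\n");
    }

    cudaFree(gmem);
    return err == cudaSuccess ? 0 : 1;
}
```

If the property reports a run time limit, the usual ways around it are to split the work across several shorter kernel launches or to run on a GPU that is not driving the display. I have not benchmarked either option here.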