Incorrect results with a normal compile, correct with -g -G


I have some code that gives the correct answer when I compile with -g -G, but when I take those debug flags out, I get an incorrect answer.

I looked through my kernel and made sure to initialize every variable. I had a couple of math functions, abs() and pow(), that I thought could be giving me an issue, so I replaced the abs() with fabsf(), as mentioned in the documentation, and turned the pow() into a multiplication (since I was just squaring).
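For anyone following along, here is a minimal sketch of that kind of substitution. The kernel name `squared_abs_diff`, its parameters, and the test data are made up for illustration and are not taken from the original code.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = |a[i] - b[i]|^2, written with fabsf() and a
// plain multiply instead of abs() and pow().
__global__ void squared_abs_diff(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                       // guard against running past the arrays
        float d = fabsf(a[i] - b[i]);  // single-precision absolute value
        out[i] = d * d;                // squaring by multiplication, no pow()
    }
}

// Host helper: runs the kernel on a[i] = i, b[i] = 2i and checks out[i] == i*i.
bool run_squared_abs_diff(int n)
{
    std::vector<float> a(n), b(n), out(n);
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_out;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    squared_abs_diff<<<blocks, threads>>>(d_a, d_b, d_out, n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);

    for (int i = 0; i < n; ++i)
        if (out[i] != (float)i * (float)i) return false;
    return true;
}

int main()
{
    printf("%s\n", run_squared_abs_diff(1000) ? "ok" : "mismatch");
    return 0;
}
```

The values are kept small enough that every square is exactly representable in single precision, so the host can compare with `==`.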

What else could be the cause of this? I thought it might be out-of-bounds array accesses, but it seems like I’d get the same problem in the debug build, and I do have some code, which I think is correct, to keep from going out of array bounds anyway.

Any other ideas?


Without more details, we just have to guess. I’d say that a race condition is most likely, since the debug flags can often change the ordering of operations enough to make a race “work”.
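To illustrate what a race of this kind can look like, here is a small sketch. The kernels `count_racy` and `count_atomic` and the helper `run_count` are hypothetical names, and nothing here is taken from the original poster's kernel. The racy version may or may not produce the right total depending on how the build happens to order things, which is the point.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Racy: every thread does an unsynchronized read-modify-write on *total,
// so increments from other threads can be lost.
__global__ void count_racy(int *total, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) *total = *total + 1;
}

// Fixed: atomicAdd makes each increment indivisible.
__global__ void count_atomic(int *total, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, 1);
}

// Host helper: launches one of the kernels over n threads and returns the total.
int run_count(bool use_atomic, int n)
{
    int *d_total = nullptr;
    int h_total = 0;
    cudaMalloc(&d_total, sizeof(int));
    cudaMemset(d_total, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (use_atomic) count_atomic<<<blocks, threads>>>(d_total, n);
    else            count_racy<<<blocks, threads>>>(d_total, n);

    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_total);
    return h_total;
}

int main()
{
    int n = 1 << 16;
    printf("racy:   %d (expected %d)\n", run_count(false, n), n);
    printf("atomic: %d (expected %d)\n", run_count(true, n), n);
    return 0;
}
```

Only the atomic version is guaranteed to report the expected total. Whatever the racy version prints is an accident of scheduling.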

Probably a shared memory problem somewhere. Compiling for device debugging spills everything to local memory, which can make code work that might otherwise fail.
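As a sketch of the sort of shared memory problem meant here, the following hypothetical kernel (`reverse_block` is an invented name) depends on a barrier between writing and reading a shared array. It assumes a single block of `BLOCK` threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

// Each thread writes one shared slot, then reads the slot written by its
// mirror thread. The barrier between the write and the read is what makes
// this well defined; removing it creates a shared-memory race.
__global__ void reverse_block(const int *in, int *out)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();                 // all writes must land before any read
    out[t] = tile[BLOCK - 1 - t];
}

// Host helper: reverses 0..BLOCK-1 on the device and verifies the result.
bool run_reverse()
{
    int h_in[BLOCK], h_out[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    reverse_block<<<1, BLOCK>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; i < BLOCK; ++i)
        if (h_out[i] != BLOCK - 1 - i) return false;
    return true;
}

int main()
{
    printf("%s\n", run_reverse() ? "ok" : "mismatch");
    return 0;
}
```

Without the `__syncthreads()` call, a thread could read its mirror slot before the other thread has written it, and whether that shows up can depend on how the compiler schedules the loads and stores.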

Thanks for the replies so far. I am trying to figure out if I can build a small sample program that would give you all an example. In the meantime, some more details: all of the data coming in is just in global memory, and I am not using any shared memory or anything like that.

See my other post about register counts - it was causing some conflicts because my kernel had too many register variables.


I am having the same problem. I reduced maxregcount to 24 with no luck. I put __syncthreads() everywhere, so I can’t imagine it is a race condition. Still, wrong results. Only with optimizations on!

What’s more, I can’t use any atomics on shared memory with optimization on, no matter what. I even tried making the shared pointer volatile, which isn’t necessary, but it wouldn’t compile because apparently atomics can’t be used with volatile pointers.
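For comparison, here is a minimal sketch of atomics on a plain (non-volatile) shared array. The `histogram` kernel, the `BINS` constant, and the test data are all hypothetical. It assumes a device that supports shared memory atomics and a block size of at least `BINS` threads.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BINS 16

// Per-block histogram accumulated in shared memory, then merged into global
// memory. The shared array is not volatile; atomicAdd takes a plain pointer.
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins)
{
    __shared__ unsigned int local[BINS];
    if (threadIdx.x < BINS) local[threadIdx.x] = 0;
    __syncthreads();                       // zeroing done before any update

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&local[data[i] % BINS], 1u);
    __syncthreads();                       // all updates done before the merge

    if (threadIdx.x < BINS) atomicAdd(&bins[threadIdx.x], local[threadIdx.x]);
}

// Host helper: data[i] = i % 256, so every bin should end up with n / BINS.
bool run_histogram(int n)
{
    std::vector<unsigned char> h_data(n);
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned char)(i % 256);

    unsigned char *d_data;
    unsigned int *d_bins;
    unsigned int h_bins[BINS];
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, sizeof(h_bins));
    cudaMemcpy(d_data, h_data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, sizeof(h_bins));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    histogram<<<blocks, threads>>>(d_data, n, d_bins);

    cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);

    for (int b = 0; b < BINS; ++b)
        if (h_bins[b] != (unsigned int)(n / BINS)) return false;
    return true;
}

int main()
{
    printf("%s\n", run_histogram(4096) ? "ok" : "mismatch");
    return 0;
}
```

Note that both `__syncthreads()` calls sit outside the `if` blocks, so every thread in the block reaches them.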

I’m at my wit’s end here. Are there known problems with compute capability 2.0 optimizations? I’m in 64-bit mode.

More detail…

On the atomics, it compiles fine without the volatile shared memory pointer. It does crash, though, and usually completely freezes my 64-bit Ubuntu system, requiring a restart. No problems at all and perfect results with -g -G.