I’ve been getting invalid results from my kernel when I run it normally, but when I run it under Parallel Nsight, it works perfectly. Nothing has changed between the two runs other than how it’s executed. I’m not using any shared memory. Any ideas?
The only other thing I can think of is a race condition involving global memory. Is there any possibility that one thread writes to a memory location that another thread is reading?
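To make that concrete, here is a rough sketch of the kind of global-memory race I mean. It is not your kernel: the names `shift_left_racy`, `shift_left_safe`, and `count_mismatches` are made up for illustration. The racy kernel updates an array in place, so one thread may read an element before or after its neighbour overwrites it. The safe variant reads from one buffer and writes to another. Running under a debugger can change timing and scheduling, which would be consistent with a race like this appearing only in normal runs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(1);
    }
}

// Racy: every thread reads its right neighbour and overwrites its own slot
// in the same buffer. Thread i-1 may read data[i] before or after thread i
// has written it, so the result depends on scheduling.
__global__ void shift_left_racy(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1) data[i] = data[i + 1];
}

// Race-free: reads come from `in`, writes go to `out`, so no thread ever
// reads a location that another thread writes during this launch.
__global__ void shift_left_safe(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1) out[i] = in[i + 1];
    else if (i == n - 1) out[i] = in[i];  // last element is left as-is
}

// Runs one of the kernels on the input 0, 1, ..., n-1 and returns how many
// output elements differ from the intended left shift.
int count_mismatches(bool racy, int n) {
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (racy) shift_left_racy<<<blocks, threads>>>(d_in, n);
    else      shift_left_safe<<<blocks, threads>>>(d_in, d_out, n);
    check(cudaGetLastError());
    check(cudaDeviceSynchronize());

    // The racy kernel works in place, so its result lives in d_in.
    check(cudaMemcpy(host.data(), racy ? d_in : d_out, bytes,
                     cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        int expected = (i < n - 1) ? i + 1 : i;
        if (host[i] != expected) ++bad;
    }
    return bad;
}

int main() {
    int n = 1 << 20;
    // The racy count can vary from run to run and may even be zero.
    std::printf("racy mismatches: %d\n", count_mismatches(true, n));
    std::printf("safe mismatches: %d\n", count_mismatches(false, n));
    return 0;
}
```

If your kernel reads and writes the same global buffer like the racy version does, switching to separate input and output buffers is a quick way to test whether this is the cause.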