Multiple runs, different results

Hi. I have this strange problem where every once in a while I get the wrong answer from my CUDA code, but then if I run it again with the exact same inputs (in the same process, not restarting the program) I get the right answer. It’s not really appropriate for me to post the exact code here, so I’m not looking for someone to debug the code, I’m just curious if this smells like the result of a common pitfall to anyone.

The code isn’t very complicated: binding some 2D and 3D textures, texture lookups via tex2D() and tex3D(), a few loops’ worth of CUDA kernel launches, all very serial, single stream. 3D textures are linear/clamp, 2D textures are point/clamp. 99.9% of the time I get the correct answer and the inputs are all very deterministic, so I’m fairly confident the logic is correct, but I believe something is causing a texture fetch to yield the wrong value, so I’m thinking there is some architectural consideration I have not addressed.

I’ve tried putting cudaDeviceSynchronize() after every CUDA call, but the same thing happens. I don’t think I’m doing anything particularly weird in my code, but I do a lot of texture bind/unbind in a loop, which may have some pitfalls? Any suggestions would be appreciated.
Note: Due to system limitations, I’m using the rather elderly 4.2 toolkit, running on a Quadro K2000 with a pretty recent driver (348.07). Windows 7 64-bit…

Thanks!

“Hi. I have this strange problem where every once in a while I get the wrong answer from my CUDA code, but then if I run it again with the exact same inputs (in the same process, not restarting the program) I get the right answer.”

i have noticed this many times, particularly when debugging
without restarting, the later runs/iterations find memory locations ‘initialized’ to different values than the very first run did, which throws off the execution path
the 1st run has nothing preceding it, so it cannot do this to itself; but the first run writes to memory, and that then affects or ‘helps on’ subsequent runs
this in turn likely points to a race condition - you generally need a race to amplify different memory values into different execution paths
it can be memory corruption too
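
to make the 'leftover memory' point concrete, here is a minimal sketch, not the original poster's code. The kernel `accumulate` and the helpers `runOnce` and `runsAgree` are hypothetical names made up for illustration. The kernel adds into its output buffer instead of overwriting it, so the second run in the same process sees what the first run left behind unless the buffer is reset with cudaMemset() first.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: accumulates into `out` rather than overwriting it,
// so its result depends on whatever `out` held before the launch.
__global__ void accumulate(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] += in[i];
}

// Runs the kernel once and copies the result back to the host.
// When `reset` is true the output buffer is zeroed first, so every
// run starts from the same device state.
std::vector<float> runOnce(const float* dIn, float* dOut, int n, bool reset)
{
    if (reset) cudaMemset(dOut, 0, n * sizeof(float));
    accumulate<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    std::vector<float> h(n);
    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    return h;
}

// Runs the same input twice in one process and reports whether both
// runs produced identical output.
bool runsAgree(bool resetBeforeRun)
{
    const int n = 1024;
    std::vector<float> hIn(n, 1.0f);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, n * sizeof(float));  // the state the very first run sees

    std::vector<float> first  = runOnce(dIn, dOut, n, resetBeforeRun);
    std::vector<float> second = runOnce(dIn, dOut, n, resetBeforeRun);

    cudaFree(dIn);
    cudaFree(dOut);
    return first == second;
}
```

calling `runsAgree(false)` should report a mismatch between the two runs, while `runsAgree(true)` should report agreement. If resetting every buffer between runs makes a real problem disappear, that suggests some code path is reading memory it never wrote in that run.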

depending on the state of your program/point of debugging, a race check and a memory check (e.g. with cuda-memcheck) might save you
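
alongside the external tools, it can help to look at the return code of every runtime call, since a cudaDeviceSynchronize() whose result is ignored will not tell you anything. A minimal sketch, assuming a placeholder kernel `touch` and a made-up helper `launchChecked` (neither is from the original code); the `CUDA_CHECK` macro is just one common way to write this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reports a failed CUDA runtime call together with its source location.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess)                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
    } while (0)

// Hypothetical kernel standing in for the real work.
__global__ void touch(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Launches the kernel with the given block size (must be > 0) and
// returns the first error seen, or cudaSuccess if there was none.
cudaError_t launchChecked(int threadsPerBlock)
{
    const int n = 1024;
    int* d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(int));
    if (err != cudaSuccess) return err;

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    touch<<<blocks, threadsPerBlock>>>(d, n);

    err = cudaGetLastError();            // launch-time errors, e.g. a bad configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // errors raised while the kernel was running

    CUDA_CHECK(cudaFree(d));
    return err;
}
```

a launch such as `launchChecked(2048)` asks for more threads per block than current GPUs allow, so the kernel never runs and the error only shows up through cudaGetLastError(). Without the check, the program would carry on with whatever was already in the buffer.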

A race condition is a good thought. I was hoping someone would say, “Oh yeah, you need to turn this flag on” ;) Sounds like I’ll just have to dig in and trace everything thoroughly. Thanks!

sometimes it helps to study the incident closely and learn as much as you can about it, which in turn helps you pinpoint its source

for example, if it occurs every time after the initial run, it is likely more of a definite race
on the other hand, if it only occurs with a certain input, or at a regular interval, etc, it may be more of a ‘conditional’ race - a race only triggered when a certain input condition is satisfied
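
one way to collect that kind of evidence is a small harness that repeats the run in the same process and records which iterations deviate from a known-good result. This is only a sketch: `fill`, `runPipeline` and `findBadRuns` are hypothetical names, and `runPipeline` is a trivial stand-in for whatever the real pipeline does.

```cuda
#include <cstdio>
#include <functional>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real pipeline: fills a buffer with `value`.
__global__ void fill(float* out, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Runs the stand-in pipeline once and returns its output on the host.
std::vector<float> runPipeline(int n, float value)
{
    float* dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(float));
    fill<<<(n + 255) / 256, 256>>>(dOut, n, value);
    std::vector<float> h(n);
    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return h;
}

// Calls run(iteration) numRuns times and returns the iteration numbers
// whose output differs from `reference`.
std::vector<int> findBadRuns(const std::function<std::vector<float>(int)>& run,
                             const std::vector<float>& reference,
                             int numRuns)
{
    std::vector<int> bad;
    for (int it = 0; it < numRuns; ++it) {
        if (run(it) != reference) {
            bad.push_back(it);
            std::printf("run %d differs from the reference\n", it);
        }
    }
    return bad;
}
```

the list that comes back is what you study: failures on every run after the first, failures only for one input, or failures at a fixed interval each point in a different direction.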