I’m getting some random garbled bits in a reasonably simple 3-step computation using CUDA. Each run gives me different garbage. Sometimes things are 100% OK, but the errors seem to be CPU-load dependent.
First, I pass an array of structures in and calculate a temporary floating-point array as output. There are some accuracy-related differences between the reference version and the new CUDA version, but everything is OK in general.
So, step 1: ~100 ints to ~300 floats -> OK.
Then more calculations are performed on that float output, and the result is bit-perfect.
Step 2: ~300 floats to ~300 ints -> Perfect.
Now comes the interesting part. The results from step 2 are used in a kind of hashing operation, so I use some basic 64-bit integer math.
~300 ints -> ~30 ints -> BAD.
This works perfectly in a C++ reference implementation, and it works 100% of the time in emulation mode, but running it on the actual CUDA device gives me super weird output data. Sometimes the output matches the reference data exactly, sometimes it is slightly garbled, and sometimes it's 100% bad. Sometimes 100 runs in a row will give me perfect output, and other times not even two runs in a row will be OK. Exactly the same input data, different output, but only in step 3. Also, when step 3 breaks, it often breaks in the same way - the same bits seem to be flipped or garbled.
Does anyone have any idea what is going on?
I can’t see any race conditions.
The card is not overclocked.
I’m running 64 bit Ubuntu Linux, and I’ve tried on both CUDA 1.1 and CUDA 2.0, using the correct drivers for each, on two separate machines with two different cards.
The card is not running too hot - I've got the fan at 100%, and the chip stays below 50 °C.
Thanks for your help!