Unstable results?

Hello,

I am slowly gaining experience with CUDA but have come across a very puzzling, race-like condition.

The behavior is as follows:

  1. Results are correct during debugging, when I use printf statements to read the results manually and compare them to known values from a CPU simulation.

  2. Results are incorrect (unstable) once the printf statements are removed.

Observations / Notes:

A. The unstable results felt like a race condition; however, the unstable behavior persists even when kernels are limited to <<<1,1>>>.

B. cudaDeviceSynchronize() statements placed after kernels seem to have no effect.

C. Each kernel uses a lot of memory relative to kernels I have written previously. For example, kernels use arrays (float array[105]), and one kernel uses three of those. That said, each kernel appears to launch and run successfully.
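Regarding B and C: a kernel that "appears to launch and run successfully" can still have failed silently unless the return codes are checked. The following is a minimal sketch of how that check could look, not the poster's actual code. The macro CUDA_CHECK, the kernel scale, and the helper run_scale_demo are hypothetical names made up for illustration; only the CUDA runtime calls themselves are real.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error and abort if a runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Placeholder kernel standing in for the real one (assumption).
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: runs the kernel with full error checking
// and returns the first element of the result.
float run_scale_demo()
{
    const int n = 105;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice));

    scale<<<1, 128>>>(dev, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
    return host[0];
}

int main()
{
    printf("host[0] = %f\n", run_scale_demo());
    return 0;
}
```

If both checks pass after every kernel, a launch or execution failure is less likely to be the cause, and the data the kernels read becomes the next thing to look at.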

Question:

What might the unstable behavior be caused by?

Possibly the use of uninitialized data.
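To illustrate what that could look like, here is a small sketch. It assumes the float array[105] from the question is declared inside the kernel and accumulated into without being zeroed first, which the original post does not confirm. All kernel and function names are hypothetical. Adding or removing printf can change how the compiler places such an array, which is one plausible reason the garbage values might look correct in one build and not in another.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 105

// BUG (for contrast): acc[] is never initialized, so "+=" reads
// indeterminate values and the result is unpredictable.
__global__ void sum_uninitialized(const float *in, float *out)
{
    float acc[N];
    for (int i = 0; i < N; ++i) acc[i] += in[i];
    float total = 0.0f;
    for (int i = 0; i < N; ++i) total += acc[i];
    *out = total;
}

// Fixed: every element of acc[] starts at zero.
__global__ void sum_initialized(const float *in, float *out)
{
    float acc[N] = {0.0f};
    for (int i = 0; i < N; ++i) acc[i] += in[i];
    float total = 0.0f;
    for (int i = 0; i < N; ++i) total += acc[i];
    *out = total;
}

// Hypothetical host helper: feeds N ones to one of the kernels
// and returns the single float it writes.
float run_sum(bool initialized)
{
    float host_in[N];
    for (int i = 0; i < N; ++i) host_in[i] = 1.0f;

    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, N * sizeof(float));
    cudaMalloc(&dev_out, sizeof(float));
    cudaMemcpy(dev_in, host_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // <<<1,1>>> on purpose: the bug does not need more than one thread.
    if (initialized) sum_initialized<<<1, 1>>>(dev_in, dev_out);
    else             sum_uninitialized<<<1, 1>>>(dev_in, dev_out);
    cudaDeviceSynchronize();

    float result = 0.0f;
    cudaMemcpy(&result, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return result;
}

int main()
{
    printf("uninitialized: %f (may be anything)\n", run_sum(false));
    printf("initialized:   %f\n", run_sum(true));
    return 0;
}
```

This kind of bug would match observations A and B: it involves a single thread, so <<<1,1>>> does not hide it, and synchronization does not help because nothing is racing.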

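The cuda-memcheck tool has some subtools that may be of interest here: initcheck reports reads of uninitialized device global memory, and racecheck reports shared-memory data races. They are selected with the --tool option, for example cuda-memcheck --tool initcheck followed by the name of your executable. Take a look at the cuda-memcheck documentation.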