"First run" of the CUDA program isn't correct

After I build my CUDA program, a few of its output values are wrong on the first run.
From the second run on, all outputs are correct, until I restart the computer again.
Does anyone know the reason?

Keep in mind that device memory is not cleared between runs. The correct results you see on later runs may simply be leftovers from a previous computation that happen to be right. For debugging, first run a kernel that writes absurd values to that memory, then run your kernel. That should verify whether this is the case.
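A minimal sketch of that check, assuming a float output buffer. The names `fill_poison`, `my_kernel` and `count_unwritten` and the sentinel value -12345.0f are hypothetical; `my_kernel` is a deliberately buggy stand-in for your own kernel that only writes half of its output.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Writes an obviously wrong value into every element, so any element
// the real kernel never writes stands out after it runs.
__global__ void fill_poison(float *buf, int n, float poison) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = poison;
}

// Hypothetical stand-in for the kernel under test. It is buggy on
// purpose: it only writes the first half of the output.
__global__ void my_kernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n / 2) out[i] = 2.0f * i;
}

// Poisons the output, runs my_kernel, and returns how many elements
// still hold the poison value (-1 on a CUDA error).
int count_unwritten(int n) {
    assert(n > 0);
    const float poison = -12345.0f;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return -1;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill_poison<<<blocks, threads>>>(d_out, n, poison);
    my_kernel<<<blocks, threads>>>(d_out, n);

    float *h_out = new float[n];
    // cudaMemcpy waits for both kernels and also reports launch errors.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) { delete[] h_out; return -1; }

    int unwritten = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] == poison) ++unwritten;
    delete[] h_out;
    return unwritten;
}
```

Calling `count_unwritten(1000)` from `main` should report 500 untouched elements on every run, not only the first. That consistency is the point: once stale data is overwritten with garbage, an "only wrong the first time" bug becomes reproducible.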

In my (recent) experience, inconsistent results are usually caused by reading uninitialized memory. Make sure you initialize all device memory that you read from, and put a __syncthreads() between writes to shared memory and any reads of those values by other threads.
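A sketch showing both points in one hypothetical kernel, a shared-memory histogram (the names `histogram`, `bin_count` and `NUM_BINS` are illustrative, not from the original post). Neither `__shared__` arrays nor `cudaMalloc` allocations are zeroed for you. Without the explicit zeroing and the barriers, the result depends on whatever the memory happened to contain.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 16;

// Per-block histogram in shared memory, merged into a global histogram.
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    __shared__ unsigned int local[NUM_BINS];

    // Shared memory is not initialized at launch: zero it explicitly.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();  // all bins are zeroed before any thread increments one

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[data[i] % NUM_BINS], 1u);
    __syncthreads();  // all increments finish before other threads read the bins

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}

// Histograms n bytes with values i % 256 and returns the count in bin
// `which` (0xFFFFFFFF on a CUDA error).
unsigned int bin_count(int n, int which) {
    assert(n > 0 && which >= 0 && which < NUM_BINS);
    unsigned char *h_data = new unsigned char[n];
    for (int i = 0; i < n; ++i) h_data[i] = static_cast<unsigned char>(i % 256);

    unsigned char *d_data = nullptr;
    unsigned int *d_bins = nullptr;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    // cudaMalloc does not clear memory: the global bins must be zeroed,
    // or they start from whatever a previous allocation left behind.
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

    histogram<<<4, 128>>>(d_data, n, d_bins);

    unsigned int h_bins[NUM_BINS];
    cudaError_t err = cudaMemcpy(h_bins, d_bins, sizeof(h_bins),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    delete[] h_data;
    return err == cudaSuccess ? h_bins[which] : 0xFFFFFFFFu;
}
```

With n = 4096, every byte value 0..255 occurs 16 times and each bin covers 16 byte values, so every bin should hold 256. Removing the `cudaMemset` call or either `__syncthreads()` is an easy way to reproduce the kind of run-to-run inconsistency described in the question.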