Nondeterministic Behavior: different results for the same program

!!! HELP !!!
I am trying to implement a Lattice Boltzmann Method to model fluids using CUDA. The algorithm seems correct (it has been checked multiple times by multiple people), but I am seeing a very strange and so far unexplained behavior.
Two consecutive executions of the SAME program give different results! I have played around with the code and tried everything I could think of, but I still get different results.

The source code is attached. LBD2Q92.cu is the main source file and takes care of all the OpenGL.
The problem resides somewhere in LBD2Q92_kernel.cu. When you run the program, it is normal for the window that opens to stay black; the terminal should print a series of numbers until you quit. These numbers are the problem: if you run the program several times in a row, you'll see that the printed results differ from run to run when they should be the same…

The problem is somewhere between lines 174 and 177 of the kernel. If you comment out these lines, there is no problem. If you uncomment them one by one, you'll see that at some point the results start to change. No matter how many times the program is executed, the same results are never printed twice.
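
The run-to-run comparison described above can also be done inside a single process, which removes the need to compare terminal output by eye. What follows is only a minimal sketch under assumptions: `step_kernel` and `run_once` are hypothetical names, and the kernel is a simple stand-in, not the code from LBD2Q92_kernel.cu. It starts twice from the same initial state, runs the same number of steps, and compares the two results bit for bit.

```cuda
#include <cstdio>
#include <cstring>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real update kernel (NOT the attached code).
// It reads only from `in` and each thread writes only out[i], so no thread
// depends on a value that another thread writes during the same launch.
__global__ void step_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int left = (i + n - 1) % n;          // periodic neighbour
        out[i] = 0.5f * (in[i] + in[left]);
    }
}

// Runs `steps` iterations from a fixed initial state and returns the result.
static std::vector<float> run_once(int n, int steps)
{
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f + 0.001f * (i % 97);  // same initial data every call

    float* a = nullptr;
    float* b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemcpy(a, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    // cudaMalloc does not initialize memory, so zero the second buffer
    // to rule out stale data from a previous run.
    cudaMemset(b, 0, n * sizeof(float));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        step_kernel<<<blocks, threads>>>(a, b, n);
        std::swap(a, b);                     // output becomes next input
    }

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaMemcpy(host.data(), a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    return host;
}

int main()
{
    const int n = 1 << 16;
    const int steps = 100;
    std::vector<float> r1 = run_once(n, steps);
    std::vector<float> r2 = run_once(n, steps);

    // Bitwise comparison: any difference at all counts as a mismatch.
    bool same = std::memcmp(r1.data(), r2.data(), n * sizeof(float)) == 0;
    std::printf("%s\n", same ? "runs match bitwise" : "runs differ");
    return same ? 0 : 1;
}
```

Swapping the real kernel in for `step_kernel` would show whether the mismatch also appears within one process. The separate input and output buffers and the explicit `cudaMemset` are design choices of this sketch, not a statement about how the attached code is written.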

I really don’t know what else to do or where to look. ANY HELP MUCH APPRECIATED!!!
LBD2Q92.zip (5.35 KB)