Global memory sometimes misread

Hello everyone,

A vector is computed on the GPU by a first kernel, and the results are correct. When I use this vector in a second kernel, some values are misread. If one thread misreads a value, all threads in the same block misread the vector. The vector is stored in global memory.
For example, the vector could contain the following values:
{0,1,2,3,4,5,6,7,8}
The second kernel is launched with 3 blocks. The first block must read {0,1,2}, the second {3,4,5}, and the last {6,7,8}.
Sometimes, block 3 reads {0,0,0} instead of {6,7,8}.
Is there a solution to this problem?

Setup: Quadro FX 1700, driver 306.94, CUDA 5.0 release, JCUDA 0.5.0, Windows 7 Enterprise. Unfortunately, I can't share any code.

An isolated code sample that reproduces the problem would help others give you a better answer.

After debugging, I found that I had forgotten an absolute value in a comparison, so the vector produced by the first kernel was wrong. There was no misread.