Hello everyone,
A vector is computed on the GPU, and the results are correct. When I try to use this vector in another kernel, some values are misread. If one thread misreads a value, all threads in the same block misread the vector. The vector is stored in global memory.
For example, my vector could contain the following values:
{0,1,2,3,4,5,6,7,8}
I have 3 blocks in the second kernel. The first block has to read {0,1,2}. The second one has to read {3,4,5} and the last one has to read {6,7,8}.
Sometimes block 3 reads {0,0,0} instead of {6,7,8}.
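To make the expected behaviour concrete, here is a CPU-only sketch in plain Java of the access pattern described above. It is not my actual kernel: the class and method names are made up for illustration, and it assumes one thread per element with the usual global index blockIdx * blockDim + threadIdx.

```java
import java.util.Arrays;

// Hypothetical CPU-only sketch of the expected access pattern; not the real kernel.
class BlockReadSketch {

    // Returns the values the given block is expected to read, assuming one
    // thread per element and a global index of blockIdx * blockDim + threadIdx.
    static int[] readBlock(int[] vector, int blockIdx, int blockDim) {
        int[] seen = new int[blockDim];
        for (int threadIdx = 0; threadIdx < blockDim; threadIdx++) {
            int globalIdx = blockIdx * blockDim + threadIdx;
            seen[threadIdx] = vector[globalIdx];
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] vector = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        int blockDim = 3; // assumed: 3 threads per block, 3 blocks in total
        for (int blockIdx = 0; blockIdx < 3; blockIdx++) {
            System.out.println("block " + (blockIdx + 1) + " reads "
                    + Arrays.toString(readBlock(vector, blockIdx, blockDim)));
        }
    }
}
```

With this mapping, block 3 (blockIdx 2) should always see {6,7,8}; on the GPU it sometimes sees {0,0,0} instead.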
Is there a solution to this problem?
I’m using a Quadro FX 1700 with driver version 306.94, the CUDA 5.0 release, and JCUDA 0.5.0. My OS is Windows 7 Enterprise. Unfortunately, I can’t share my actual code.