Calculation stalls when using more than one block

Hi! I have a really stubborn problem that I've been trying to fix for weeks. I have finally managed to narrow it down, and I think it is related to the number of blocks, but the fight continues!

In the host code there is a loop, and the kernels are executed inside it (maybe a bad idea to begin with?). There are also some cudaMemcpy() calls inside the loop. When NUM_BLOCKS == 1 the code works fine. When I change NUM_BLOCKS to anything higher, say 2, it stalls inside Kernel2 or directly after it.

I think it has to do with synchronization, but I have no idea what else to try other than calling cudaThreadSynchronize() after the kernels and __syncthreads() inside them whenever possible. Maybe it is something completely different.

Please give me your thoughts about it. Anything is appreciated!

I use a GTX 260 and Visual Studio 2008 Pro with 64-bit Vista.

//Host code
.
.
.
for(int k = 0; k < 10; k++){
    cutilSafeCall( cudaMemcpy(…) );
    cudaThreadSynchronize();
    cutilCheckMsg("Execution failed\n");

    Kernel1<<<NUM_BLOCKS, 32>>>(…);
    cudaThreadSynchronize();
    cutilCheckMsg("Execution failed\n");

    cutilSafeCall( cudaMemcpy(…) );
    cudaThreadSynchronize();
    cutilCheckMsg("Execution failed\n");

    Kernel2<<<NUM_BLOCKS, 32>>>(…);
    cudaThreadSynchronize();
    cutilCheckMsg("Execution failed\n");

    cutilSafeCall( cudaMemcpy(…) );
    cudaThreadSynchronize();
    cutilCheckMsg("Execution failed\n");
}
.
.
.

I don’t think anyone is going to be able to help you without some indicative kernel code. You probably have a race or synchronisation problem, but it is a bit unrealistic to expect a diagnosis without the code in question…
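For readers who land here with the same symptom, here is a hypothetical illustration (not the poster's kernels, which were never posted) of a race that only shows up with more than one block. `__syncthreads()` is a barrier for the threads of a single block; blocks cannot wait for each other inside a kernel. So if thread 0 of every block does a plain `*total += partial[0]` on the same global counter, updates from different blocks can overwrite each other, while with one block there is only one writer and the bug stays hidden. The names `blockSum`, `runBlockSum` and the `useAtomic` switch are made up for this sketch. It assumes an `int` `atomicAdd` on global memory, which needs compute capability 1.1 or later (the GTX 260 is 1.3).

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 32  // matches the <<<NUM_BLOCKS, 32>>> launches above

// Each block sums its slice of `in` in shared memory, then adds the
// block's partial sum to the single global counter `*total`.
__global__ void blockSum(const int *in, int *total, int useAtomic)
{
    __shared__ int partial[THREADS_PER_BLOCK];
    int tid = threadIdx.x;
    partial[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                       // barrier for THIS block only

    // Tree reduction inside the block (block size is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();                   // every thread reaches this barrier
    }

    if (tid == 0) {
        if (useAtomic)
            atomicAdd(total, partial[0]);  // safe when several blocks update *total
        else
            *total += partial[0];          // RACE: another block may update *total in between
    }
}

// Hypothetical host helper: sums numBlocks * THREADS_PER_BLOCK ones on the
// GPU, so the correct result is numBlocks * THREADS_PER_BLOCK.
int runBlockSum(int numBlocks, int useAtomic)
{
    int n = numBlocks * THREADS_PER_BLOCK;
    int *hIn = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) hIn[i] = 1;

    int *dIn, *dTotal, hTotal = 0;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dTotal, &hTotal, sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<numBlocks, THREADS_PER_BLOCK>>>(dIn, dTotal, useAtomic);

    // This blocking cudaMemcpy runs after the kernel has finished. Kernel
    // completion is the only grid-wide barrier, so anything that needs the
    // final total must run here (on the host) or in a later kernel launch.
    cudaMemcpy(&hTotal, dTotal, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dTotal);
    free(hIn);
    return hTotal;
}
```

With `useAtomic = 1` the result should be `numBlocks * 32` for any block count; with `useAtomic = 0` it is only guaranteed for a single block. A related way a multi-block kernel can stall outright is one block spin-waiting on a global flag that another block is supposed to set: nothing guarantees that the other block is running at the same time, so the wait may never end. Splitting the work into separate kernel launches, as the host loop above already does with Kernel1 and Kernel2, avoids needing any cross-block barrier.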

THANK YOU!! You made my week!

Sorry for not posting the kernel code, but I thought the problem was in the host code. After reading your post I went looking through the kernel code for race conditions, and I have already found the one causing the trouble. I had been searching for the error in the wrong places. I should have asked this question long ago!!

Thank you again!