Multiple sequential kernels, second not running


I am trying to write a small image analysis program which makes use of two kernels. They DO NOT run at the same time.

Basically, the idea is that the first one is called and does its operations on the set of images. Its output is then returned to the calling function and passed to the second kernel, which carries out its own operations.

However, when I run the program, the first kernel runs fine and as expected. The second kernel, however, doesn't do anything.

Well, the kernel is entered, but when I check threadIdx and blockIdx to make sure the index computed from them still sits inside the image, it consistently doesn't.

So, in short, my question is: do threadIdx and blockIdx get conserved across kernels? For example, if I have 8 blocks in the first kernel, will the block index of the second kernel start at 9? (I don't check for this and assume indexing is reset to zero for each kernel.)

If I put in code that changes my array without any bounds checking, that code does run, but it causes problems because it sometimes writes outside the bounds of the image!


No, IDs get “reset” at kernel launch.
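To make the reset concrete, here is a minimal sketch rather than anything taken from your program. The kernel `record_block_index` and the helper `max_block_index_seen` are hypothetical names made up for this example. Each thread stores the `blockIdx.x` it sees, and the host launches the same 8-block grid twice:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread stores the block index it sees.
__global__ void record_block_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {               // bounds guard: the grid may cover more than n elements
        out[i] = blockIdx.x;
    }
}

// Launches the kernel once and returns the largest block index any thread
// recorded, or -1 if a CUDA call failed.
int max_block_index_seen(int blocks, int threads, int n)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    record_block_index<<<blocks, threads>>>(d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int largest = 0;
    for (int v : h_out) {
        if (v > largest) largest = v;
    }
    return largest;
}

int main()
{
    const int blocks = 8, threads = 64, n = blocks * threads;
    printf("first launch:  max blockIdx.x = %d\n", max_block_index_seen(blocks, threads, n));
    printf("second launch: max blockIdx.x = %d\n", max_block_index_seen(blocks, threads, n));
    return 0;
}
```

On a working device both launches should report 7, since each launch numbers its blocks from 0 to 7 regardless of what ran before.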
A possible reason for wrong IDs could be accessing shared memory out of bounds. Part of it is used by CUDA for storing the IDs etc., so if you overwrite that region the result could be wrong IDs. Can you post some code?
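While you dig out the code, here is a rough sketch of the kind of guard I mean. It assumes a 1D launch and a statically sized shared array. `TILE`, `double_via_shared` and `doubled_last` are hypothetical names for illustration only. Both the shared-memory index and the global index are checked before use, and the host checks for errors after the launch:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 64  // hypothetical tile size; the launch below also uses it as the block size

__global__ void double_via_shared(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];
    int local  = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;

    // Guard both indices: 'local' against the shared array, 'global' against the data.
    bool ok = (local < TILE) && (global < n);

    if (ok) tile[local] = in[global];
    __syncthreads();           // kept outside the branch so every thread reaches it
    if (ok) out[global] = 2.0f * tile[local];
}

// Runs the kernel on 0, 1, ..., n-1 and returns the last output element,
// or -1 if any CUDA call reported an error.
float doubled_last(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1.0f; }
    cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;   // round up so the tail of the data is covered
    double_via_shared<<<blocks, TILE>>>(d_in, d_out, n);

    cudaError_t err = cudaGetLastError();                   // invalid launch configuration etc.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while the kernel ran
    if (err == cudaSuccess) err = cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h[n - 1] : -1.0f;
}

int main()
{
    printf("last output = %.1f\n", doubled_last(100));
    return 0;
}
```

Running the program under a memory checker such as `compute-sanitizer` may also help pinpoint an out-of-bounds access, if that tool is available in your toolkit version.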