Synchronous kernel calls?

Hello,

I was wondering if there is a way to execute kernel calls synchronously, so I might safely call kernels like this:

void function() {
    while (i < N) {
        kernel1<<<>>>();
        kernel2<<<>>>();
    }
}

Kind Regards

Edit: neither cudaThreadSynchronize() nor cudaGetLastError() seems to work for me…

kernel2 will not start until kernel1 has completed, provided both are launched into the same stream.
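A minimal sketch of that ordering, in case it helps. The kernel bodies, sizes, and launch configuration here are made up for illustration (the original kernels were not posted). Both kernels go into the default stream, so each launch is queued behind the previous one, and a single cudaDeviceSynchronize() (the replacement for the deprecated cudaThreadSynchronize()) makes the host wait for all of them:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only.
__global__ void kernel1(float *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1.0f;
}

__global__ void kernel2(float *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= 2.0f;
}

int main() {
    const int n = 1024;  // assumed number of elements
    const int N = 10;    // assumed number of loop iterations
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int i = 0; i < N; ++i) {
        // Same (default) stream: kernel2 runs only after kernel1 has finished,
        // and the next iteration's kernel1 runs only after this kernel2.
        kernel1<<<blocks, threads>>>(d_data, n);
        kernel2<<<blocks, threads>>>(d_data, n);
    }

    // Launches are asynchronous with respect to the host; wait here.
    // (The blocking cudaMemcpy below would also wait, this just makes it explicit.)
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %f\n", first);

    cudaFree(d_data);
    return 0;
}
```

No synchronisation call is needed between the two launches inside the loop; it only matters if the kernels are placed in different streams.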

OK, then there must be some other reason for my problem. If I run the function as above, parts of the matrix I am calculating are NaN. If I comment out one of the kernels, or set N to a small number, the results are fine. I do not use shared memory, and there is not a single division in either kernel, so I am at my wits' end as to what might even cause NaN.

It might well be that the NaN values are coming from uninitialised memory, either because your code is reading out of bounds, or because the code isn’t actually running to completion, leaving some of the output memory untouched.
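One way to test for untouched output, sketched under assumptions: the kernel name `fill`, the sizes, and the value written are all hypothetical. The idea is to set every byte of the output buffer to 0xFF before the launch, which makes every float a NaN bit pattern, and then count how many NaNs survive:

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one output element per thread, with a bounds guard
// so that surplus threads in the last block do not write out of bounds.
__global__ void fill(float *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = 1.0f;
}

int main() {
    const int n = 1000;  // assumed size, deliberately not a multiple of 256
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    // 0xFF in every byte gives the bit pattern 0xFFFFFFFF, which is a NaN,
    // so any element the kernel never writes stays NaN.
    cudaMemset(d_out, 0xFF, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n
    fill<<<blocks, threads>>>(d_out, n);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    int untouched = 0;
    for (float v : h_out) {
        if (std::isnan(v)) ++untouched;
    }
    printf("elements still NaN after the kernel: %d of %d\n", untouched, n);

    cudaFree(d_out);
    return 0;
}
```

A NaN that the kernel genuinely computed would be counted too, so this only narrows things down; a nonzero count that appears when the launch fails or the grid is too small points at unwritten memory.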

Check your errors: look at the status returned by every CUDA call and after every kernel launch.
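A common pattern for this, as a sketch rather than the only way to do it. The macro name CUDA_CHECK and the kernel are hypothetical; cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are the runtime API calls doing the work:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration only.
__global__ void kernel1(float *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1.0f;
}

int main() {
    const int n = 1024;  // assumed number of elements
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    kernel1<<<blocks, threads>>>(d_data, n);

    // A kernel launch returns nothing, so ask for the launch status explicitly
    // (catches e.g. an invalid launch configuration).
    CUDA_CHECK(cudaGetLastError());

    // Errors that occur while the kernel runs (e.g. an illegal memory access)
    // are reported by a later call; synchronising surfaces them here.
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaFree(d_data));
    printf("no CUDA errors reported\n");
    return 0;
}
```

Calling cudaGetLastError() on its own neither synchronises nor reports anything unless its return value is inspected, which may be why it did not seem to help.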

The initial matrix contained some very small floating-point numbers. I’m not entirely sure how this might have caused the problems; anyway, I scaled them up a bit and the program works, so I am happy for now :-) Thanks for the feedback!