cudaDeviceSynchronize not returning error of type "invalid configuration argument"

Hi everybody,

I have always assumed that calling cudaDeviceSynchronize after a kernel launch would return an error if the kernel failed to run.

It turns out that if the kernel cannot run because of an error of type “invalid configuration argument”, cudaDeviceSynchronize will not return an error, but cudaGetLastError will.

Do we always have to call both cudaDeviceSynchronize and cudaGetLastError to make sure everything is alright? Any other check needed?

Thanks,
Olivier

Yes, to be complete, both checks are needed. cudaDeviceSynchronize() isn’t strictly required to detect errors, but a kernel executes asynchronously, so it may fail some time after the launch call has returned. At that point, the error is detectable either via cudaGetLastError or via another API call that returns errors from previous asynchronous activity.

Therefore, if you wait long enough after a kernel launch, a single call to cudaGetLastError will return either the synchronous error (invalid configuration argument) or any asynchronous error (e.g. unspecified launch failure).

But if you are debugging code, it’s usually clearer to do both types of error checking right after the kernel launch.
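
As a rough sketch of doing both checks separately, something like the following could work. The kernel noopKernel and the helper launchAndCheckBoth are hypothetical names made up for this illustration; only the cuda* runtime calls come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical no-op kernel, used only for illustration.
__global__ void noopKernel() {}

// Hypothetical helper: launches noopKernel with one block of the given size
// and performs both checks. Returns true if neither check reported an error.
bool launchAndCheckBoth(unsigned int threadsPerBlock) {
    noopKernel<<<1, threadsPerBlock>>>();

    // Check 1: synchronous launch errors (e.g. invalid configuration argument).
    cudaError_t launchErr = cudaGetLastError();
    if (launchErr != cudaSuccess)
        std::printf("launch error: %s\n", cudaGetErrorString(launchErr));

    // Check 2: asynchronous errors raised while the kernel was executing.
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (syncErr != cudaSuccess)
        std::printf("execution error: %s\n", cudaGetErrorString(syncErr));

    return launchErr == cudaSuccess && syncErr == cudaSuccess;
}
```

Assuming a device whose limit is 1024 threads per block, launchAndCheckBoth(2048) should report the launch error from the first check only, which matches the behaviour described in the question.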

You can also collapse this into a single error check:

kernel<<<…>>>(…);
cudaDeviceSynchronize();
cudaGetLastError(); // this will capture either type of error

No other checks should be needed besides those.
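
If it helps, the collapsed form could be wrapped in a small helper so the result is not accidentally discarded. This is only a sketch: emptyKernel, syncAndGetLastError and launchEmptyKernel are hypothetical names, not part of the CUDA runtime.

```cuda
#include <cuda_runtime.h>

// Hypothetical no-op kernel, used only for illustration.
__global__ void emptyKernel() {}

// Hypothetical helper: call right after a kernel launch.
// Waits for the device to finish, then reads (and resets) the last error,
// which covers both launch errors and errors raised during execution.
inline cudaError_t syncAndGetLastError() {
    // Return value ignored on purpose; an error reported here is also
    // what cudaGetLastError returns below.
    cudaDeviceSynchronize();
    return cudaGetLastError();
}

// Launches emptyKernel with one block of the given size and returns
// the single combined error code.
cudaError_t launchEmptyKernel(unsigned int threadsPerBlock) {
    emptyKernel<<<1, threadsPerBlock>>>();
    return syncAndGetLastError();
}
```

The trade-off is that a single code no longer tells you whether the failure happened at launch or during execution, which is why the two-step check tends to be easier to debug with.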

Thanks for this thorough explanation.