cudaMemcpyAsync returns error cudaErrorLaunchFailure


cudaMemcpyAsync returns error cudaErrorLaunchFailure.

How to look into it?



It means a kernel launched before the cudaMemcpyAsync call failed during execution. Use proper CUDA error checking (just google that) to identify the kernel: check the return value of every runtime API call, and call cudaGetLastError() after each kernel launch. If need be, place a cudaDeviceSynchronize() call after each kernel call (perhaps use a debug macro to accomplish this). That, combined with the proper CUDA error checking, should localize the fault to the specific kernel that caused it. You could then use a method such as described here to continue the debug process.
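As a minimal sketch of the error checking and debug-synchronize idea above: CUDA_CHECK and KERNEL_CHECK are hypothetical helper macros (not part of the CUDA API), DEBUG_SYNC is an assumed compile-time flag, and the scale kernel is just a placeholder for your own kernels.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): report file/line and exit on any error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical debug macro: always check for launch errors; when built with
// -DDEBUG_SYNC, also synchronize so an execution fault is reported right
// after the kernel that caused it instead of at some later API call.
#ifdef DEBUG_SYNC
#define KERNEL_CHECK()                          \
    do {                                        \
        CUDA_CHECK(cudaGetLastError());         \
        CUDA_CHECK(cudaDeviceSynchronize());    \
    } while (0)
#else
#define KERNEL_CHECK() CUDA_CHECK(cudaGetLastError())
#endif

// Placeholder kernel standing in for the application's real kernels.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    if (h_data == NULL) return EXIT_FAILURE;

    float *d_data = NULL;
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemset(d_data, 0, bytes));

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    KERNEL_CHECK();  // with -DDEBUG_SYNC, a faulting kernel is caught here

    // Without DEBUG_SYNC, a fault in the kernel above may first show up here.
    CUDA_CHECK(cudaMemcpyAsync(h_data, d_data, bytes,
                               cudaMemcpyDeviceToHost, 0));
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaFree(d_data));
    free(h_data);
    return 0;
}
```

Building with something like nvcc -DDEBUG_SYNC app.cu while debugging, and without the flag otherwise, keeps the extra synchronization out of normal runs.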

It’s also probable that you could short-circuit some of this simply by running your application under cuda-memcheck/compute-sanitizer directly.

The possibility of this sort of error reporting behavior is indicated in a number of places, including in the CUDA runtime API docs for the function you list specifically:

  • Note that this function may also return error codes from previous, asynchronous launches.
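To see this behavior in isolation, here is a small sketch with a deliberately broken kernel (badKernel is a made-up example that writes through a null pointer). The copy that follows touches only valid buffers, yet the error from the earlier kernel can come back from it or from a later call. Which call reports it first, and the exact error code, depend on timing, the kind of fault, and the CUDA version, so no particular output is assumed here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Deliberately faulty kernel (illustration only): writes through a null pointer.
__global__ void badKernel(int *p)
{
    p[threadIdx.x] = 42;
}

int main()
{
    const size_t bytes = 256 * sizeof(int);
    int *h_buf = (int *)malloc(bytes);
    int *d_buf = NULL;
    if (h_buf == NULL || cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    // The launch itself is valid, so no launch-time error is expected here;
    // the fault only happens once the kernel actually runs on the device.
    badKernel<<<1, 32>>>(NULL);
    printf("after launch:      %s\n", cudaGetErrorString(cudaGetLastError()));

    // This copy involves only valid buffers, but it may report the kernel's error.
    cudaError_t e = cudaMemcpyAsync(h_buf, d_buf, bytes,
                                    cudaMemcpyDeviceToHost, 0);
    printf("cudaMemcpyAsync:   %s\n", cudaGetErrorString(e));

    // Synchronizing forces any pending execution error to surface.
    e = cudaDeviceSynchronize();
    printf("after synchronize: %s\n", cudaGetErrorString(e));

    // The context is likely unusable after the fault, so cudaFree is skipped.
    free(h_buf);
    return 0;
}
```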