Copy memory after a kernel crash

It is usually helpful that the CUDA memcpy functions return a failure code after a launched kernel has failed. When debugging, though, it would be useful to copy memory out after the kernel failure in order to inspect it. Is there any API functionality that allows this? Currently I copy all memory to the host, but this is quite slow.

Thanks,
Nicholas