I’m working on a program that runs for a long time and calls a CUDA kernel many times. Like a couple of other people who have posted here, I get occasional, randomly occurring “unspecified launch failure” errors. Based on those earlier posts, I suspect the cause is the fault tolerance of my particular GPU.
Rather than kill the whole process, I’d like to clear the error after it occurs, wait a few seconds, and try again. However, every subsequent error check fails because of the first error, and none of the documentation or forum discussions I’ve found covers clearing CUDA errors. Is there a way to clear a CUDA error?
Other options include:
(1) starting a separate process that calls the CUDA kernel and checking its return code
(2) checking the returned data directly
Neither of these seems ideal to me because of the computational overhead.
Any thoughts? Thanks…