Is there a way to implement a timeout on a single CUDA call, so that if it has not returned after n seconds it can be ‘stopped’ (forced to return or throw), allowing the device to be reset and used for another task?
Obviously, just calling cudaDeviceReset() from the main thread after n seconds is a bad idea: it would pull resources (e.g. allocated memory) out from under the thread that is still using them in a CUDA call, leading to a memory fault and a probable crash.
I can implement a solution that tracks the time spent on a particular task between CUDA calls. My question is: is it possible to time out a single call that is taking too long to return, without killing the whole process?
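For reference, the between-calls tracking I have in mind looks roughly like the host-side sketch below. The name `run_with_budget` is hypothetical, and each `step` is only a stand-in for one blocking CUDA call (a copy, a kernel launch followed by a synchronize, and so on); no real CUDA API is used here.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs the steps in order and refuses to start a new
// step once the time budget has been used up. Returns the number of steps
// that were actually run.
//
// Assumption: each step stands in for one blocking CUDA call. The deadline
// is only checked BETWEEN steps, so a step that never returns is never
// interrupted by this code.
std::size_t run_with_budget(const std::vector<std::function<void()>>& steps,
                            std::chrono::milliseconds budget) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + budget;

    std::size_t completed = 0;
    for (const auto& step : steps) {
        if (clock::now() >= deadline) {
            break;  // budget exhausted: abandon the remaining steps
        }
        step();     // may block for an arbitrary time
        ++completed;
    }
    return completed;
}
```

This lets a task give up cleanly once it is over budget, but only at the boundary between two calls, which is exactly the limitation I am asking about.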