Timeout on long-running CUDA calls - possible?

jgregor · May 25, 2017, 1:10pm

I am wondering if there is a way of implementing a timeout on a single CUDA call such that if it has not returned after n seconds it can be ‘stopped’ (forced to return/throw), enabling the device to be reset and used for another task?

Obviously, just calling cudaDeviceReset() on the main thread after n seconds is a bad idea, since it would pull resources (e.g. allocated memory) out from under a thread that is still using them, leading to a memory fault and a probable crash.

I can implement a solution where I keep track of the time spent on a particular task between CUDA calls. My question is: is it possible to time out a single call that is taking too long to return, without killing the whole process?
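The between-calls bookkeeping mentioned above can be sketched as a host-side polling loop with a deadline. This is a minimal sketch, not a way to interrupt a blocking call: it assumes the work was issued asynchronously and that completion can be checked with a non-blocking query (in CUDA, for instance, cudaStreamQuery returns cudaErrorNotReady while work is still pending). The names wait_with_deadline, WaitResult and is_done are hypothetical, and the callable stands in for the real query so the example compiles without the CUDA toolkit.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Outcome of waiting on asynchronous work with a deadline.
enum class WaitResult { Completed, TimedOut };

// Polls `is_done` until it returns true or `timeout` elapses.
// Assumption: in real CUDA code `is_done` would wrap a non-blocking query,
// e.g. [&] { return cudaStreamQuery(stream) == cudaSuccess; }.
inline WaitResult wait_with_deadline(
    const std::function<bool()>& is_done,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (is_done()) {
            return WaitResult::Completed;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return WaitResult::TimedOut;
        }
        // Sleep briefly so the host thread does not spin at full speed.
        std::this_thread::sleep_for(poll_interval);
    }
}
```

On TimedOut the caller only learns that the deadline passed. The pending device work is not cancelled and keeps running until it finishes or the device is reset.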

Robert_Crovella · May 25, 2017, 7:20pm

Not possible.

I’m not sure why cudaDeviceReset is not an option, but so be it. If your intent is to enable the device to be reset and used for another task, that is exactly what cudaDeviceReset does.

jgregor · May 26, 2017, 7:55am

OK, thanks.

My thought was that doing a device reset while another thread was running on the device would/could cause the process to crash. That seemed to be the case when I tried it.

Robert_Crovella · May 26, 2017, 1:17pm

I’m not sure what sort of activity would cause the CPU process to crash just because the device context becomes invalid. However, I would think that if you are doing proper CUDA error checking at all times, the sudden loss of the device context would be quickly discovered by any thread, before bad things happen (at the moment I can’t think of what those bad things would be).

I guess I would need to see a counterexample where that is not sufficient/effective.
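One way to picture "proper error checking at all times" across threads is a small guard that every thread routes its call results through. The following is a rough sketch under stated assumptions, not an established CUDA idiom: Status and kSuccess are hypothetical stand-ins for cudaError_t and cudaSuccess so the code compiles without the CUDA toolkit, and DeviceGuard is an invented helper name.

```cpp
#include <atomic>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins: in real code these would be cudaError_t and cudaSuccess.
using Status = int;
constexpr Status kSuccess = 0;

// Shared by all threads that issue device work.
class DeviceGuard {
public:
    // Call with the return value of every runtime call.
    // On failure, records that the device is unusable and throws.
    void check(Status status, const char* what) {
        if (status != kSuccess) {
            lost_.store(true);
            throw std::runtime_error(std::string(what) +
                                     " failed with status " +
                                     std::to_string(status));
        }
    }

    // Other threads can test this before issuing more device work.
    bool lost() const { return lost_.load(); }

private:
    std::atomic<bool> lost_{false};
};
```

With something like this in place, a thread whose context disappears after a reset would see a failed status on its next call and stop, instead of continuing with stale device pointers.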
