Kernels get killed: CL_OUT_OF_RESOURCES error waiting for idle

stanr · August 22, 2011, 5:18pm

Hi,

Does anyone know how to fix this problem? For certain kinds of (otherwise error-free) kernels that run in a long loop, like this one:

for (int i = 0; i < N; i++) {
    // some writes
    // memory barrier
    // some reads
}

for large enough N and large enough run size, the kernel is killed, resulting in CL_INVALID_COMMAND_QUEUE in subsequent calls, and sometimes (!) pfn_notify, the error callback passed to clCreateContext, receives the following message: “CL_OUT_OF_RESOURCES error waiting for idle on GeForce GTX 580.” This also happens when there are many atomic accesses in long loops.

Do you know what causes this? I’m imagining some timer times out because the driver believes the threads are in an infinite loop, and kills the kernel. It is notable that the run size has to be wide enough for this to happen, which means that this is probably caused by an impatient scheduler.

I’m not really asking NVIDIA OpenCL developers to solve the halting problem, but a more reasonable timeout period, say at least a few seconds, would let more complicated kernels run.

This is also an extremely frustrating occurrence: I don’t know how to predict when this will happen, and 19 out of 20 times my code doesn’t even get the pfn_notify callback, so I effectively have no way of knowing what happened most of the time. Is there a parameter I can set to control this? Does anybody have any insight? Thanks.

P.S. BTW this is running the “OpenCL 1.1 CUDA 4.0.1” drivers

l_woog · September 23, 2011, 8:30pm

I am seeing the same problem with the 280.13 drivers. No such error with the 270.41.19 drivers.
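
If the kill really does come from a timer on long-running kernels (an assumption here, nothing in this thread confirms it), one workaround that might be worth trying is to split the N loop iterations across several shorter kernel launches, so that no single launch runs for long. The sketch below shows only the host-side chunking logic in plain C. The names run_in_chunks, launch_chunk_fn, chunk_log and record_chunk are hypothetical, invented for illustration; in a real host program the callback would set the kernel arguments for the sub-range, call clEnqueueNDRangeKernel, and wait with clFinish. It also assumes the kernel keeps all state it needs between iterations in global memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: launches one kernel that covers loop
 * iterations [begin, end) and waits for it to finish.
 * Returns 0 on success, nonzero on failure. */
typedef int (*launch_chunk_fn)(size_t begin, size_t end, void *user);

/* Runs total_iters iterations as a series of launches of at most
 * chunk_iters iterations each. Stops at the first failing launch and
 * returns its error code. Returns -1 for invalid arguments. */
int run_in_chunks(size_t total_iters, size_t chunk_iters,
                  launch_chunk_fn launch, void *user)
{
    if (chunk_iters == 0 || launch == NULL)
        return -1;

    size_t begin = 0;
    while (begin < total_iters) {
        size_t remaining = total_iters - begin;
        size_t step = remaining < chunk_iters ? remaining : chunk_iters;

        int err = launch(begin, begin + step, user);
        if (err != 0)
            return err;

        begin += step;
    }
    return 0;
}

/* Stand-in for a real launch: only records what it was asked to run,
 * so the chunking logic can be checked without a GPU. */
struct chunk_log {
    size_t count;      /* number of launches so far */
    size_t total;      /* total iterations covered so far */
    size_t last_begin; /* range of the most recent launch */
    size_t last_end;
};

int record_chunk(size_t begin, size_t end, void *user)
{
    struct chunk_log *log = user;
    assert(begin < end); /* run_in_chunks never issues an empty launch */
    log->count += 1;
    log->total += end - begin;
    log->last_begin = begin;
    log->last_end = end;
    return 0;
}
```

A suitable chunk size would have to be found by experiment, since the time per iteration depends on the kernel and the run size. Each extra launch adds some overhead, so the chunks should be as large as the driver tolerates.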

