"The launch timed out and was terminated" error on CUDA v3.0

Hello everyone.

I’m on OpenSUSE 11.2 with CUDA version 3.0. Recently I got a “the launch timed out and was terminated” error when calling a kernel function. This only happens when I run the program on a large data volume. I keep the number of threads per block fixed (16x16), so the grid size increases with the data size. For a 489x489 grid there is no error, but once the grid reaches 569x569 the error occurs. A few things I noticed:

– I do not use shared memory. A larger data set needs more global memory and a larger grid, but I don’t use any other memory type either (constant/texture memory).
– The number of threads per block does not change, so I don’t think this is caused by running out of registers (comments?)
– In the failing scenario I only use 40M floats, which is 40M x 4 bytes = 160 MB of GPU memory. This should not be a problem because about 500 MB of GPU memory is available. I also ran the code on a Tesla cluster node with 4 GB of GPU memory and CUDA 3.0 and got the same error.
– I have only heard that on Windows the operating system has a watchdog that terminates a kernel call if it exceeds ~5 seconds. But I’m on Linux, and I didn’t think there was such a limitation.
– I compiled the same code on another OpenSUSE 10.3 system with CUDA version 2.1. The program runs without this error on the same data size. That system has 1 GB of GPU memory but a different GPU type.
– This error happens every time I use the larger data, so it is not a random failure.

Does anyone have insight into this issue? Thanks in advance.

If you’re using a GPU for display and running X, there is a 5-second timeout on Linux too.
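One way to check whether the watchdog applies on a given machine is to query the device properties: `kernelExecTimeoutEnabled` in `cudaDeviceProp` reports whether a run-time limit is active on that GPU. A minimal sketch using the standard CUDA runtime API (requires a CUDA-capable machine to run):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // kernelExecTimeoutEnabled is non-zero when the display watchdog
        // can kill long-running kernels on this device.
        printf("device %d (%s): watchdog %s\n", dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

If this prints “disabled” on the Tesla node yet the launch still dies, the watchdog is probably not the whole story there.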

Thanks tmurray for the quick reply. I forgot to mention that the Tesla cluster node is not running an X session. I got the same error on that node, so I cannot explain it with the reason you mentioned.

Besides, the OpenSUSE 10.3 / CUDA 2.1 system does run X, yet I successfully ran the code there without the error, and I estimate the kernel call may well exceed 5 seconds.

Then I would guess the difference between v3.0 and v2.1 is the reason, though that sounds unlikely.

EDIT: I just installed CUDA v2.1 on the Tesla node (it runs Red Hat Enterprise Linux 5.5) but still get the same error. To summarize:

— A desktop named ‘graphor’ with an NVIDIA GT200 (1 GB GPU memory), OpenSUSE 10.3, and CUDA version 2.1: the program runs without error.

— A node of an NVIDIA Tesla cluster (4 GB GPU memory) with Red Hat Enterprise Linux 5.5 and CUDA version 2.1: running the program produces the error.

Could this be due to different OS behavior? I really have no idea…

I just updated my NVIDIA graphics driver to 256.40 and the toolkit to v3.1, but I still get the same error…

Is there any way of increasing this timeout, or of using the onboard graphics for X and the discrete card for computing?
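Another common workaround, assuming the work done by each thread is independent (which it may not be for this particular kernel), is to split one huge launch into several smaller ones so that each launch finishes well under the limit; the watchdog is per launch, so it resets between them. A rough sketch, with `myKernel` and `d_data` standing in for the actual kernel and device buffer:

```cuda
dim3 block(16, 16);
dim3 fullGrid(569, 569);
const unsigned int chunk = 128;  // grid rows per launch; tune so each launch stays well under 5 s

for (unsigned int y0 = 0; y0 < fullGrid.y; y0 += chunk) {
    unsigned int rows = (fullGrid.y - y0 < chunk) ? (fullGrid.y - y0) : chunk;
    dim3 grid(fullGrid.x, rows);
    // The kernel must add y0 to blockIdx.y so each chunk addresses its own slice.
    myKernel<<<grid, block>>>(d_data, y0);
    cudaThreadSynchronize();     // CUDA 3.x-era sync; each launch gets a fresh watchdog window
}
```

This trades one long launch for a handful of shorter ones at the cost of a little launch overhead; it does not help if a single chunk still runs past the limit.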