I’m on openSUSE 11.2 with CUDA 3.0. Recently I started getting a “the launch timed out and was terminated” error when calling a kernel. This only happens when I run the program on a large data volume. I keep the block size fixed at 16x16 threads, so the grid grows with the data: with a 489x489 grid there is no error, but once the grid reaches 569x569 the error occurs. A few things I’ve noticed:
– I don’t use shared memory. A larger data set needs more global memory and a larger grid. I don’t use any other type of memory either (constant or texture memory).
– The number of threads per block doesn’t change, so I don’t think this is caused by running out of registers (any comments?).
– In the failing case I only use about 40M floats, i.e. 40M x 4 bytes = 160 MB of GPU memory. This shouldn’t be a problem, since about 500 MB of GPU memory is available. I also ran the code on a Tesla cluster with 4 GB of GPU memory and CUDA 3.0, and got the same error.
– I have only heard that on Windows the operating system has a watchdog that terminates a kernel call if it runs longer than ~5 seconds. But I’m on Linux, and I don’t think there is such a limitation here.
– I compiled the same code on another system running openSUSE 10.3 with CUDA 2.1, and the program runs without this error on the same data size. That system has 1 GB of GPU memory, but a different GPU model.
– The error happens every time I use the larger data, so it is not a random failure.
Does anyone have any insight into this issue? Thanks in advance.