The 5-second limit on kernel run time is making CUDA development very inconvenient. :wacko:
Though some kinds of programs can be divided into small pieces, lots of programs can’t be divided easily!
Making CUDA development more difficult will simply cause fewer developers to use CUDA.
I just can’t understand why a kernel launch MUST be synchronous. Couldn’t there be an API that starts the kernel and returns immediately, and another API to poll whether the kernel has finished? A user-mode polling/wait could be much better than the kernel-mode polling/wait used in current CUDA.
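To make the request concrete, here is a minimal sketch of the launch-then-poll pattern, assuming a CUDA runtime that provides events (cudaEventRecord and cudaEventQuery). The kernel `square` and the helper `launch_and_poll` are hypothetical names made up for illustration. Note that polling this way only frees the host thread while the kernel runs; it would not by itself lift any limit on how long a single kernel may run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: each thread squares one element.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

// Hypothetical helper: launches the kernel, then polls from user mode
// until the work is done. Returns 0 on success, nonzero on a CUDA error.
int launch_and_poll(float *host, int n) {
    float *dev = nullptr;
    cudaEvent_t done;
    size_t bytes = n * sizeof(float);

    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;
    if (cudaEventCreate(&done) != cudaSuccess) { cudaFree(dev); return 1; }

    int rc = 0;
    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) rc = 2;

    if (rc == 0) {
        // The launch is queued; the event marks the point right after it.
        square<<<(n + 255) / 256, 256>>>(dev, n);
        cudaEventRecord(done, 0);

        // User-mode polling: cudaEventQuery returns cudaErrorNotReady
        // while the preceding work is still in flight.
        cudaError_t status;
        while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
            // The host thread could do other useful work here.
        }
        if (status != cudaSuccess) rc = 3;
    }

    if (rc == 0 &&
        cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) rc = 4;

    cudaEventDestroy(done);
    cudaFree(dev);
    return rc;
}

int main() {
    float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    int rc = launch_and_poll(values, 4);
    if (rc == 0) {
        for (int i = 0; i < 4; ++i) printf("%g\n", values[i]);
    } else {
        printf("CUDA error, code %d\n", rc);
    }
    return rc;
}
```

The busy loop is only there to show the polling call; a real program would do host-side work between queries or sleep briefly instead of spinning.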
By the way, most other parts of CUDA are very nice.
Thanks a lot