I have a question… I was trying to run multiple kernels at the same time, but I found out that I can't launch a new kernel until the previous one has finished executing and the device becomes free.
I think you can use streams (see chapters 3.2.6 / 3.3.9 in the CUDA Programming Guide V2.2) to launch kernels asynchronously. If you already do so, maybe your kernels use too much shared memory?
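To make the suggestion concrete, here is a minimal sketch of launching kernels into separate streams; the `scale` kernel, the grid/block sizes, and the buffer sizes are all made up for illustration:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel, made up for illustration.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    // Launches into different (non-default) streams return to the
    // host immediately and carry no ordering between the streams.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    scale<<<n / 256, 256, 0, s0>>>(d_a, 2.0f, n);
    scale<<<n / 256, 256, 0, s1>>>(d_b, 3.0f, n);

    // Wait for both streams before using the results.
    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

Note this makes the launches asynchronous with respect to the host; whether the two kernels actually execute concurrently on the device is a separate question, as the next reply points out.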
In short, no. The stream interface was specifically designed to allow multiple kernels to run at the same time, but current hardware doesn’t support this.
(Interestingly, the hardware does run vertex and pixel shaders at the same time in graphics mode, but that’s another story).
Really? :) Oh… in the guide it always sounds as if kernels started on different streams would already run in parallel. But the memory-copy functions suffixed with Async do work in parallel, right? Hm, and another maybe stupid question… since it's called the "Driver API", are the graphics card drivers themselves also written with this API? Recently we asked ourselves whether some game developers would one day come up with the idea of writing their own specialized drivers (I mean the whole graphics engine) with CUDA… would that be realistic? Is it going to happen someday?
(btw: As an Nvidia employee, you must be an expert… could you please take a look at the thread I started last night (same forum)? ^^)
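To illustrate what the Async-suffixed copies mentioned above are for: `cudaMemcpyAsync` issued into a non-default stream can overlap with kernel execution in another stream, provided the host buffer is page-locked (allocated with `cudaMallocHost`). A hedged sketch; the `process` kernel and all sizes are invented for the example:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel, invented for the example.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;

    // Async copies require page-locked (pinned) host memory to
    // actually overlap with computation.
    float *h_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));

    float *d_in, *d_work;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_work, n * sizeof(float));

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);

    // The copy in copyStream can overlap with the kernel running
    // in computeStream on hardware with a copy engine.
    cudaMemcpyAsync(d_in, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, copyStream);
    process<<<n / 256, 256, 0, computeStream>>>(d_work, n);

    cudaStreamSynchronize(copyStream);
    cudaStreamSynchronize(computeStream);

    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(computeStream);
    cudaFree(d_in);
    cudaFree(d_work);
    cudaFreeHost(h_buf);
    return 0;
}
```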