Multiple simultaneous kernels across different streams

Hi

The programming guide clearly states that all threads in a block have to execute the same code (not necessarily simultaneously). Is it possible to run multiple kernels simultaneously, one on each multiprocessor? The guide was not very specific about this.

For example, if I issue two kernels of one block each in two different streams, can they run on two multiprocessors simultaneously, or will the second one stall until the first one is done, even though each kernel requires just one multiprocessor?

Regards
Gautham

Kernels will get serialized. Sadly, only a single kernel can execute at a time.

Ok. Section 4.5.2.4 made it seem like it might be possible to issue multiple kernels. Is it just that a kernel from one stream can be overlapped with memory transfers from other streams?

I’ll try doing some tests. Do you know if any of the newer chips might support this?

Regards

Gautham
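For anyone who wants to run that kind of test, here is a rough sketch of one way to do it. It is not taken from the programming guide. The `spin` kernel and the `timeSpinLaunches` helper are made-up names for illustration, and the code assumes a toolkit and device where `clock64()` is available. It launches single-block kernels, one per stream, and measures the wall-clock time until all of them finish.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread busy-waits for about `cycles` clock ticks.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launches `n` single-block kernels, each in its own stream, and returns the
// wall-clock time in milliseconds until all of them have finished.
// Returns a negative value for a bad `n` or if a CUDA call reports an error.
double timeSpinLaunches(int n, long long cycles)
{
    const int kMaxStreams = 16;  // arbitrary limit chosen for this sketch
    if (n < 1 || n > kMaxStreams) return -1.0;

    cudaStream_t streams[kMaxStreams];
    int created = 0;
    while (created < n && cudaStreamCreate(&streams[created]) == cudaSuccess)
        ++created;

    double ms = -1.0;
    if (created == n) {
        cudaDeviceSynchronize();  // start timing from an idle device
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i)
            spin<<<1, 1, 0, streams[i]>>>(cycles);
        cudaError_t err = cudaDeviceSynchronize();  // wait for every stream
        auto t1 = std::chrono::steady_clock::now();
        if (err == cudaSuccess)
            ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    for (int i = 0; i < created; ++i) cudaStreamDestroy(streams[i]);
    return ms;
}
```

After one warm-up call, compare `timeSpinLaunches(1, c)` with `timeSpinLaunches(2, c)` for a large `c`. Roughly equal times would suggest the two kernels overlapped; about double would suggest they were serialized.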

Kernels from one stream may be overlapped with memcpys from other streams (or other contexts, for that matter).

And no, no chips support multiple simultaneous kernels at the moment.
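The kernel/memcpy overlap described above could look something like the sketch below. The `scale` kernel and the `overlapKernelWithCopy` helper are hypothetical names, and whether the copy actually runs concurrently with the kernel depends on the device. The code only sets things up so that overlap is permitted: page-locked host memory, `cudaMemcpyAsync`, and two separate streams.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor`.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Runs `scale` on one buffer in one stream while copying a second buffer to
// the device in another stream. Returns true if every CUDA call succeeded
// and the kernel output is correct.
bool overlapKernelWithCopy(int n)
{
    if (n <= 0) return false;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *hostA = nullptr, *hostB = nullptr, *devA = nullptr, *devB = nullptr;
    cudaStream_t computeStream, copyStream;

    // Page-locked (pinned) host memory is needed for a truly asynchronous copy.
    bool ok = cudaMallocHost((void **)&hostA, bytes) == cudaSuccess
           && cudaMallocHost((void **)&hostB, bytes) == cudaSuccess
           && cudaMalloc((void **)&devA, bytes) == cudaSuccess
           && cudaMalloc((void **)&devB, bytes) == cudaSuccess;
    bool haveCompute = ok && cudaStreamCreate(&computeStream) == cudaSuccess;
    bool haveCopy = haveCompute && cudaStreamCreate(&copyStream) == cudaSuccess;
    ok = haveCopy;

    if (ok) {
        for (int i = 0; i < n; ++i) { hostA[i] = 1.0f; hostB[i] = 3.0f; }
        ok = cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        // Kernel in one stream, host-to-device copy in the other.
        scale<<<blocks, threads, 0, computeStream>>>(devA, n, 2.0f);
        cudaMemcpyAsync(devB, hostB, bytes, cudaMemcpyHostToDevice, copyStream);
        ok = cudaStreamSynchronize(computeStream) == cudaSuccess
          && cudaStreamSynchronize(copyStream) == cudaSuccess
          && cudaMemcpy(hostA, devA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) ok = (hostA[i] == 2.0f);
    }

    if (haveCopy) cudaStreamDestroy(copyStream);
    if (haveCompute) cudaStreamDestroy(computeStream);
    cudaFree(devB); cudaFree(devA);
    cudaFreeHost(hostB); cudaFreeHost(hostA);
    return ok;
}
```

Calling `overlapKernelWithCopy(1 << 20)` from `main` and looking at the run in a profiler timeline is one way to see whether the copy and the kernel really overlapped on a given card.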