This is a newbie question, but I found no definite answer in the CUDA Programming Guide. Is it possible to execute multiple different kernels concurrently, using a different stream for each kernel? The guide says that a memcpy and a kernel execution from different streams can overlap, which seems to imply that only one kernel can be active at a time.
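To be concrete, this is the kind of launch pattern I mean (the kernels, names, and sizes here are just made up for illustration):

```cuda
#include <cstdio>

// Two trivial, independent kernels (illustrative only).
__global__ void kernelA(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

__global__ void kernelB(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *dA, *dB;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));

    // One non-default stream per kernel; the question is whether
    // these two launches can actually run on the GPU at the same time.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(dA, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(dB, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```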
Is this an architectural limitation, and might it change with future GPU generations?