CUDA streams, default stream zero

Hi there,

I have a quick question regarding CUDA streams. I have a big project that contains 30+ different kernel calls, which I have written between 2009 and now.
I now have to implement an additional kernel that could run perfectly well while the others are doing their thing, so I thought this sounds like a perfect use case for a stream.
My question: as far as I understand, launching a kernel without specifying a stream puts it in stream “0”. Is it then enough to launch only the new kernel in another stream, e.g. foo_kernel<<< , , , 1>>>(), or do I have to add stream arguments to all my legacy kernels as well? I know that I will still have to add memcopies and sync barriers.

Thanks in advance


I think for concurrent execution you need to have all kernels in non-default streams, because by default the default stream synchronizes implicitly with the other streams.
So leaving the legacy kernels in default stream #0 won’t work for you.
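
A minimal sketch of what that could look like, assuming two hypothetical kernels (legacy_kernel and foo_kernel stand in for the real ones) that work on independent buffers. Note that the fourth launch parameter is a cudaStream_t handle obtained from cudaStreamCreate, not a plain integer such as 1. Whether the two kernels actually overlap still depends on the GPU having free resources.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the existing legacy kernels.
__global__ void legacy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical stand-in for the new, independent kernel.
__global__ void foo_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches both kernels in their own non-default streams.
// Returns true if every CUDA call succeeded and foo_kernel's result is correct.
bool run_in_two_streams(int n) {
    size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return false; }

    cudaStream_t legacy_stream, foo_stream;
    cudaStreamCreate(&legacy_stream);
    cudaStreamCreate(&foo_stream);

    // Each buffer is zeroed in the same stream as the kernel that uses it,
    // so the ordering within each stream is guaranteed.
    cudaMemsetAsync(a, 0, bytes, legacy_stream);
    cudaMemsetAsync(b, 0, bytes, foo_stream);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    legacy_kernel<<<blocks, threads, 0, legacy_stream>>>(a, n);
    foo_kernel<<<blocks, threads, 0, foo_stream>>>(b, n);
    cudaError_t launch_err = cudaGetLastError();

    // Wait for all work in all streams before reading results.
    cudaError_t sync_err = cudaDeviceSynchronize();

    float first = 0.0f;
    cudaError_t copy_err =
        cudaMemcpy(&first, b, sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(legacy_stream);
    cudaStreamDestroy(foo_stream);
    cudaFree(a);
    cudaFree(b);

    return launch_err == cudaSuccess && sync_err == cudaSuccess &&
           copy_err == cudaSuccess && first == 1.0f;
}

int main() {
    bool ok = run_in_two_streams(1 << 20);
    printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

In a real project every legacy launch would get a stream argument in the same way; a profiler timeline is the easiest way to check whether the kernels really run concurrently.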


There are limitations on CUDA commands issued to the default stream, see

So it is good practice to avoid the default stream.
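
If touching all 30+ legacy launches is impractical, one alternative that was not mentioned above (so treat it as an assumption to check against the CUDA documentation for your toolkit version) is to create the new stream with the cudaStreamNonBlocking flag. Such a stream performs no implicit synchronization with the default stream, so the legacy launches can stay unchanged. The sketch below uses hypothetical kernels legacy_kernel and foo_kernel on independent buffers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for an unchanged legacy kernel.
__global__ void legacy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical stand-in for the new kernel.
__global__ void foo_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Legacy kernel stays in the default stream; the new kernel runs in a
// non-blocking stream. Returns true if all calls succeed and the result is correct.
bool run_beside_default_stream(int n) {
    size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return false; }

    cudaStream_t foo_stream;
    cudaStreamCreateWithFlags(&foo_stream, cudaStreamNonBlocking);

    // Work in foo_stream is NOT ordered against the default stream, so the
    // buffer for foo_kernel is initialized inside foo_stream itself.
    cudaMemsetAsync(b, 0, bytes, foo_stream);
    cudaMemset(a, 0, bytes);  // default stream, ordered before legacy_kernel

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    legacy_kernel<<<blocks, threads>>>(a, n);              // unchanged launch
    foo_kernel<<<blocks, threads, 0, foo_stream>>>(b, n);  // new launch
    cudaError_t launch_err = cudaGetLastError();

    // Explicit barriers are now the caller's responsibility.
    cudaError_t foo_err = cudaStreamSynchronize(foo_stream);
    cudaError_t dev_err = cudaDeviceSynchronize();

    float first = 0.0f;
    cudaError_t copy_err =
        cudaMemcpy(&first, b, sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(foo_stream);
    cudaFree(a);
    cudaFree(b);

    return launch_err == cudaSuccess && foo_err == cudaSuccess &&
           dev_err == cudaSuccess && copy_err == cudaSuccess && first == 1.0f;
}

int main() {
    bool ok = run_beside_default_stream(1 << 20);
    printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

The trade-off is that any data shared between the new kernel and the legacy kernels must be protected with explicit synchronization (for example cudaStreamSynchronize or events), since the implicit ordering with the default stream is gone.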