Questions about STREAM

Hi. I have some questions about CUDA streams.

  • Assume the compute capability is 1.3.
  • Assume kernel1, kernel2, and cudaMemcpyAsync each take the same amount of time.

(Q1)
In the following code, can kernel1 and cudaMemcpyAsync run concurrently?
:
kernel1<<<1,1,0,stream1>>>(dA);
kernel2<<<1,1,0,stream3>>>(dB); <==== not default stream
cudaMemcpyAsync(dC ,C,size,cudaMemcpyHostToDevice ,stream2);
:

(Q2)
In the following code, can kernel1 and cudaMemcpyAsync run concurrently?
I think they can't, because kernel2 (which is in stream zero, the default stream) is issued before cudaMemcpyAsync.

:
kernel1<<<1,1,0,stream1>>>(dA);
kernel2<<<1,1,0,0>>>(dB); <==== default stream(zero)
cudaMemcpyAsync(dC ,C,size,cudaMemcpyHostToDevice ,stream2);
:

(Q3)
In the following code, can kernel1 and cudaMemcpyAsync No. (2) run concurrently?
I think they can't, because No. (2) is not the first cudaMemcpyAsync issued.

:
kernel1<<<1,1,0,stream1>>>(dA); <==== it has stream1
cudaMemcpyAsync(dB ,B,size,cudaMemcpyHostToDevice ,stream1); (1) <==== it is in stream1, so it can't overlap with kernel1
cudaMemcpyAsync(dC ,C,size,cudaMemcpyHostToDevice ,stream2); (2)
:
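
One way to check cases like these empirically is to time them. The following is only a minimal sketch and has not been run on a compute capability 1.3 device. The kernel name busyKernel, the helper timeMs, the iteration count, and the 16 MiB buffer size are all made up for illustration. It assumes the host buffer is page-locked (allocated with cudaMallocHost), since cudaMemcpyAsync only behaves asynchronously with pinned host memory, and it assumes the legacy default-stream behaviour, where work in stream 0 waits for earlier work in the other streams. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel1: one thread spinning for `iters` steps.
__global__ void busyKernel(float *d, int iters) {
    float v = d[0];
    for (int i = 0; i < iters; ++i) v = v * 1.0001f + 0.5f;
    d[0] = v;  // store the result so the loop is not optimized away
}

// Times one kernel launch plus one host-to-device copy.
// The kernel goes to kernelStream, the copy to copyStream.
static float timeMs(cudaStream_t kernelStream, cudaStream_t copyStream,
                    float *dA, float *dC, const float *C, size_t size) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaDeviceSynchronize();    // start from an idle device
    cudaEventRecord(start, 0);
    busyKernel<<<1, 1, 0, kernelStream>>>(dA, 1 << 20);
    cudaMemcpyAsync(dC, C, size, cudaMemcpyHostToDevice, copyStream);
    cudaEventRecord(stop, 0);   // assumption: legacy stream 0 waits for both
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    // Non-zero means the device can overlap a copy with kernel execution.
    // (Older toolkits report the same capability as prop.deviceOverlap.)
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("asyncEngineCount = %d\n", prop.asyncEngineCount);

    const size_t size = 16u << 20;  // 16 MiB, arbitrary
    float *C = NULL, *dA = NULL, *dC = NULL;
    cudaMallocHost((void **)&C, size);        // pinned host memory
    cudaMalloc((void **)&dA, sizeof(float));
    cudaMalloc((void **)&dC, size);
    cudaMemset(dA, 0, sizeof(float));

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // Same stream: the copy must wait for the kernel to finish.
    float sameStream = timeMs(stream1, stream1, dA, dC, C, size);
    // Different streams: the copy may overlap with the kernel.
    float twoStreams = timeMs(stream1, stream2, dA, dC, C, size);
    printf("same stream: %.3f ms, two streams: %.3f ms\n",
           sameStream, twoStreams);

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(dA);
    cudaFree(dC);
    cudaFreeHost(C);
    return 0;
}
```

If the two-stream time is clearly shorter than the same-stream time, the copy and the kernel overlapped. Adding a launch in stream 0 between the two calls inside timeMs would give a rough test of the situation in (Q2).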