Hi,
I am currently trying to figure out whether it is possible for the GPU to execute multiple kernels simultaneously.
I think the documentation is not very clear about this point, and I am really confused by the following experiment :
Have a look at this screenshot of cudaprof :
I have two kernels that are launched : (memcpy removed for clarity)
- (almost) at the same time in distinct streams
// PUT DATA 1 and 2 ...
kernelSpl2FusDevice <<< nBlocks, blockSize, 0, stream2 >>> ( b1_dev, b2_dev, b3_dev );
kernelSpl1FusDevice <<< nBlocks, blockSize, 0, stream1 >>> ( a1_dev, a2_dev, a3_dev );
// GET DATA 1 and 2 ...
cudaStreamSynchronize ( stream1 );
cudaStreamSynchronize ( stream2 );
- separately
// PUT DATA 1 ...
kernelSpl1SepDevice <<< nBlocks, blockSize, 0, stream1 >>> ( a1_dev, a2_dev, a3_dev );
// GET DATA 1 ...
cudaStreamSynchronize ( stream1 );
// PUT DATA 2 ...
kernelSpl2SepDevice <<< nBlocks, blockSize, 0, stream2 >>> ( b1_dev, b2_dev, b3_dev );
cudaStreamSynchronize ( stream1 );
// GET DATA 2 ...
Now here is the profiling output :
timestamp method gputime cputime occupancy
92809 _Z19kernelSpl2FusDevicePfS_S_ 91.584 14 0.5
92832 _Z19kernelSpl1FusDevicePfS_S_ 36.128 3 0.667
94266 _Z19kernelSpl1SepDevicePfS_S_ 35.712 4 0.667
95248 _Z19kernelSpl2SepDevicePfS_S_ 91.904 4 0.5
According to those numbers (and the plot above), you may think that kernelSpl1Fus() is running simultaneously with kernelSpl2Fus(), check timestamps and gpu timings.
But is it really the case ? Are the timestamp, gputime, cputime values reliable ?
With my own timing (using either gettimeofday() or cutStartTimer()), I got the following numbers :
fus (spl1 and spl2) : 220
sep (spl1) : 59
sep (spl2) : 115
It turns out that those timings contradict the profiling data.
We can see how spl1 and spl2 are conflicting when executed together (220 > 59+115).
So I am kind of lost here. I would like to know what really happens on the GPU.
Is it possible that the CUDA scheduler allocates some multiprocessors to kernel 1, and some others to kernel 2 ?
If no, how come the profiling tool returns that plot ? How are the timestamps, gputime values collected ?
Best Regards,
Raz.