How do I schedule different kernels among the grids?


I am using CUDA to do 3D volume data processing.
The behavior is different and independent along the X, Y, and Z directions.
So I want to have three different kernels running through the data.
Does CUDA support this?

Or, let me put it this way: I have written three different kernels: EdgeDetectX, EdgeDetectY, EdgeDetectZ.

If I call the kernel functions in this sequence,
EdgeDetectX<<< grid, threads >>>(d_A, d_B, WIDTH);
EdgeDetectY<<< grid, threads >>>(d_A, d_B, WIDTH);
EdgeDetectZ<<< grid, threads >>>(d_A, d_B, WIDTH);

does it mean they run sequentially?

Is there any way I can schedule these three kernels to run at the same time?

I am new to CUDA. I really appreciate your help. :D

No, kernels can never run at the same time on one GPU.

I’m confused. I thought that was the purpose of the optional ‘stream’ parameter that you can pass to kernels. Check out the Stream Management section of the programming guide.

It will overlap PCIe transfers and kernel executions, but it won’t overlap execution at the present time. Note that there’s nothing that says we can’t do that in the future, so don’t assume that it will always be this way.
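To illustrate the point above, here is a minimal sketch (not from the thread) of launching work in two streams so a PCIe transfer can overlap a kernel launch. The kernel body, names, and sizes are illustrative placeholders; per the reply above, the two streams overlap the copy with the kernel, but kernels themselves still execute one at a time.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; stands in for EdgeDetectX etc.
__global__ void EdgeDetectX(const float *in, float *out, int width) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < width) out[i] = in[i];  // illustrative body only
}

int main() {
    const int N = 1 << 20;
    float *h_A, *d_A, *d_B, *d_C;
    cudaMallocHost(&h_A, N * sizeof(float));  // pinned host memory, needed for async copies
    cudaMalloc(&d_A, N * sizeof(float));
    cudaMalloc(&d_B, N * sizeof(float));
    cudaMalloc(&d_C, N * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // The copy in s0 and the kernel in s1 may overlap each other;
    // two kernels launched this way would still serialize on the GPU.
    cudaMemcpyAsync(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice, s0);
    EdgeDetectX<<<(N + 255) / 256, 256, 0, s1>>>(d_B, d_C, N);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    cudaFreeHost(h_A);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    return 0;
}
```

Launches into the same stream (including the default stream, as in the original three calls) always execute in issue order.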