parallel computations with CUDA

Hi all,
I have a question and I would appreciate it if you could share your opinions: is it possible to run 2 or more grids concurrently, in parallel? E.g. I have 10 matrices which I want to multiply together; is it possible to define 5 separate grids and simultaneously multiply these matrices 2 by 2?

A card can only run one grid at a time. But why can’t you do it in one grid?

Because I want to do parallel computations: suppose multiplying each pair of matrices takes 1 p.u. of time. Doing the multiplications one after another, I will need 9 p.u. of time to reduce 10 matrices to a single product. However, if I could run the kernels in parallel, a pairwise reduction tree would need only 4 p.u.: 5 products in the first round, then 2, then 1, then 1.

Isn't there any way to change the index of threads or blocks at the beginning of a kernel?

You can calculate your matrix-specific index if you pass a parameter with size-info of your matrices to the kernel.

Like index = blockDim.x * blockIdx.x + threadIdx.x;

index = index % size_of_matrix; // CUDA C has no rem(); use the modulo operator %

In this case can I run 2 kernels in parallel and simultaneously?

No, but you can have a single kernel that calculates many multiplications in parallel. You can compute those 10 matrices in a single kernel.
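One way to express this as a single launch is to use the z dimension of the grid to select which product a block works on. This is only a sketch under assumed names (N, batch, row-major square matrices packed back to back), and the naive inner loop is for clarity rather than performance:

```cuda
// Sketch: one launch computes `batch` independent N x N products,
// C[b] = A[b] * B[b]. blockIdx.z picks which product a block handles.
__global__ void batchedMatMul(const float *A, const float *B, float *C,
                              int N, int batch)
{
    int b   = blockIdx.z;                          // which product
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= batch || row >= N || col >= N) return;

    const float *Ab = A + (size_t)b * N * N;       // b-th input pair
    const float *Bb = B + (size_t)b * N * N;
    float sum = 0.0f;
    for (int k = 0; k < N; ++k)                    // naive dot product
        sum += Ab[row * N + k] * Bb[k * N + col];
    C[(size_t)b * N * N + row * N + col] = sum;
}

// Launch: a 3D grid whose z extent is the number of independent products.
// dim3 block(16, 16);
// dim3 grid((N + 15) / 16, (N + 15) / 16, batch);
// batchedMatMul<<<grid, block>>>(dA, dB, dC, N, batch);
```

All 5 (or however many) products then run concurrently within the one grid, which is exactly the parallelism asked about, without needing multiple simultaneous grids.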

Depending on exactly how you’re multiplying them together, you could combine them into a larger matrix and run calculations on that instead (possibly by using cuBLAS). You could then separate out the original matrices from the larger “container” matrix after your calculations are completed.
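As an aside, cuBLAS also exposes a batched GEMM directly, so packing into one big container matrix may not even be necessary. A hedged host-side sketch (handle creation, device allocation, and error checking are assumed to happen elsewhere; the pointer arrays hold one device pointer per matrix):

```cuda
#include <cublas_v2.h>

// Sketch: multiply `batch` pairs of N x N matrices with one cuBLAS call.
// d_Aarray / d_Barray / d_Carray are device arrays of device pointers,
// one pointer per matrix; their setup is assumed here.
void batchedGemm(cublasHandle_t handle, int N, int batch,
                 const float **d_Aarray, const float **d_Barray,
                 float **d_Carray)
{
    const float alpha = 1.0f, beta = 0.0f;
    // One call performs every N x N product in the batch
    // (cuBLAS uses column-major storage).
    cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                       N, N, N,
                       &alpha,
                       d_Aarray, N,
                       d_Barray, N,
                       &beta,
                       d_Carray, N,
                       batch);
}
```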

Hi, I am back again

Would you please make this clearer? Do you have a specific method in mind?

For example, I have matrices A, B, C and D as inputs to a kernel. Is it possible to compute AB and CD concurrently (in parallel in time) inside this kernel?