Hello:
Is it recommended to launch multiple kernels at once? I mean, let’s suppose we have some functions like:
__global__ void cusum(int* a, int* b){
//performs some validations, then does a += b
deviceadd(a[blockIdx.x * blockDim.x + threadIdx.x],
b[blockIdx.x * blockDim.x + threadIdx.x]);
}
__global__ void cusub(int* a, int* b){
//performs some validations, then does a -= b
devicesub(...);
}
...
I know these are really simple functions; they are just meant as an example.
So, let’s suppose we want to calculate (a + b) * (c + d); we can just run two independent kernels for the additions and then multiply after syncing:
int main(){
//a += b
cusum<<<x, y>>>(a, b);
//c += d
cusum<<<x, y>>>(c, d);
//sync both kernels
cudaDeviceSynchronize();
//a *= c
cumul<<<x, y>>>(a, c);
//sync cumul kernel
cudaDeviceSynchronize();
}
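To make the first approach concrete: as far as I know, kernels launched into the same (default) stream run one after another, so for the two additions to have any chance of overlapping they would have to go into separate streams. Here is a minimal sketch of what I mean. The names addKernel, mulKernel and calcWithStreams are placeholders of mine (not the real cusum/cumul), the validations are left out, and the pointers are assumed to be device-accessible arrays of n > 0 ints:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Placeholder for cusum: a[i] += b[i], with a bounds check instead of validations.
__global__ void addKernel(int* a, const int* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += b[i];
}

// Placeholder for cumul: a[i] *= c[i].
__global__ void mulKernel(int* a, const int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= c[i];
}

// Computes a = (a + b) * (c + d); c is overwritten with c + d along the way.
void calcWithStreams(int* a, const int* b, int* c, const int* d, int n) {
    assert(n > 0);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // The two additions touch different arrays, so they may overlap
    // if the GPU has free resources; this is allowed, not guaranteed.
    addKernel<<<blocks, threads, 0, s1>>>(a, b, n);
    addKernel<<<blocks, threads, 0, s2>>>(c, d, n);
    assert(cudaGetLastError() == cudaSuccess);

    // Wait for both additions before multiplying.
    cudaDeviceSynchronize();

    mulKernel<<<blocks, threads>>>(a, c, n);
    assert(cudaGetLastError() == cudaSuccess);
    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```

Even with separate streams, I would only expect real overlap when a single kernel does not already fill the GPU.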
But it can also be done in one big kernel:
int main(){
cuCalculate<<<x, y>>>(a, b, c, d);
//sync cuCalculate
cudaDeviceSynchronize();
}
__global__ void cuCalculate(int* a, int* b, int* c, int* d){
deviceadd(...);
deviceadd(...);
devicemul(...);
}
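For comparison, this is roughly how I picture the single-kernel version once it is written out. It is only a sketch under my own assumptions: fusedCalc and calcFused are made-up names, the validations are omitted, the arrays hold n > 0 ints, and (unlike the two-kernel version) c is left unchanged because the intermediate sums stay in registers:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical fused kernel: each thread computes one element of (a + b) * (c + d).
__global__ void fusedCalc(int* a, const int* b, const int* c, const int* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int sum1 = a[i] + b[i];  // a + b, kept in a register
        int sum2 = c[i] + d[i];  // c + d, kept in a register
        a[i] = sum1 * sum2;      // single write back to global memory
    }
}

// Host helper: one launch and one synchronization for the whole expression.
void calcFused(int* a, const int* b, const int* c, const int* d, int n) {
    assert(n > 0);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fusedCalc<<<blocks, threads>>>(a, b, c, d, n);
    assert(cudaGetLastError() == cudaSuccess);
    cudaDeviceSynchronize();
}
```

Here each element is read once and written once, and there is only one kernel launch instead of three.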
So, I understand that the first method could be faster, since both additions would run in parallel, but I’m not sure whether it would cause some slowdown, as there would (maybe) be too many operations running simultaneously.
I would like to understand the main pros and cons of both approaches (may I assume that the first one needs a more powerful GPU to work well?).
Thanks.