Hi,
I have a question about using a switch-case statement inside a CUDA kernel.
I have the following kernel:
template <typename T>
__global__ void myKernel(T *output, bool b_ex,
                         T a1, T a2, T a3,
                         T s1, T s2, T s3,
                         T param1, T param2,
                         char type,
                         int n23, int n3, int n)
{
    int x = blockDim.x * blockIdx.x + threadIdx.x;
    int y = blockDim.y * blockIdx.y + threadIdx.y;
    int z = blockDim.z * blockIdx.z + threadIdx.z;
    int idx = x * n23 + y * n3 + z;   // flat index into the 3D volume
    if (idx >= n) return;
    T d1 = a2*s3 + s1;
    for (int i = 0; i < N_DEEP; ++i, ++idx)
    {
        T d3 = s1*a3 + s2*a1;
        T sqr = d1 + d3*d3;
        switch (type)
        {
        case F_TYPE:
            deviceFunction_1(output, idx, b_ex, param1, sqr);
            break;
        case Z_TYPE:
            deviceFunction_2(output, idx, param1, param2, sqr);
            break;
        case K_TYPE:
            deviceFunction_3(output, idx, param1, sqr);
            break;
        default:
            break;
        }
    }
}
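The parameter names suggest a row-major n1 x n2 x n3 volume with `n23 = n2*n3`, although the post does not say so. Under that assumption the flat index of `(x, y, z)` is `x*n23 + y*n3 + z`. The hypothetical helpers `flatIndex` and `isContiguous` below compute that index and check on the host that it covers 0 to n1*n2*n3 - 1 with no gaps or repeats.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat index of (x, y, z) in a row-major n1 x n2 x n3
// volume, assuming n23 = n2 * n3 is precomputed as in the kernel's arguments.
inline std::size_t flatIndex(std::size_t x, std::size_t y, std::size_t z,
                             std::size_t n23, std::size_t n3) {
    assert(z < n3 && y < n23 / n3);  // z and y must lie inside their dimensions
    return x * n23 + y * n3 + z;
}

// Walks the volume with z varying fastest and checks that flatIndex visits
// 0, 1, 2, ... in order, i.e. the mapping has no gaps and no collisions.
inline bool isContiguous(std::size_t n1, std::size_t n2, std::size_t n3) {
    std::size_t expected = 0;
    for (std::size_t x = 0; x < n1; ++x)
        for (std::size_t y = 0; y < n2; ++y)
            for (std::size_t z = 0; z < n3; ++z) {
                if (flatIndex(x, y, z, n2 * n3, n3) != expected) return false;
                ++expected;
            }
    return true;
}
```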
My question is about the performance of my kernel. In my sequential code I have a single function whose variants differ only in a few operations. My first CUDA implementation of this function is the kernel shown above: I split the three type-specific calculations into three device functions, and inside the kernel I check the type and execute the corresponding operations. This implementation works correctly.
My doubt is whether using a switch-case statement (or, equally, an if statement) will affect the kernel's performance.
Will performance be greatly affected?
Would it be better to use three different implementations of the kernel?
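One way to get three specialized kernels without maintaining three copies is to make `type` a template parameter and do the switch once on the host. A caveat that applies either way: because `type` is a kernel argument, every thread in a launch takes the same case, so the switch itself should not cause warp divergence; the remaining costs are mainly the per-iteration branch and, possibly, higher register use from compiling all three paths into one kernel. The sketch below is plain C++17 so it runs without a GPU. The loop over `idx` stands in for the threads, the tag values and the three formulas are hypothetical placeholders for `F_TYPE`/`Z_TYPE`/`K_TYPE` and `deviceFunction_1` to `deviceFunction_3`, and `kernelBody`/`launch` are illustrative names, not part of the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag values: the post does not show how F_TYPE, Z_TYPE, K_TYPE are defined.
constexpr char F_TYPE = 'F';
constexpr char Z_TYPE = 'Z';
constexpr char K_TYPE = 'K';

// Stand-in for the kernel body. Type is a compile-time parameter, so
// if constexpr keeps exactly one branch in each instantiation and no runtime
// switch remains. The formulas are placeholders for deviceFunction_1/2/3.
template <typename T, char Type>
void kernelBody(std::vector<T>& output, bool b_ex, T param1, T param2, T sqr) {
    for (std::size_t idx = 0; idx < output.size(); ++idx) {  // one iteration per "thread"
        if constexpr (Type == F_TYPE) {
            output[idx] = b_ex ? param1 * sqr : sqr;
        } else if constexpr (Type == Z_TYPE) {
            output[idx] = param1 * sqr + param2;
        } else {  // K_TYPE (only ever instantiated with the three tags above)
            output[idx] = param1 + sqr;
        }
    }
}

// Host-side dispatch: the switch runs once per launch instead of once per
// element. On the GPU each case would instead launch a separate
// instantiation, e.g. a hypothetical myKernel<T, F_TYPE><<<grid, block>>>(...).
template <typename T>
void launch(char type, std::vector<T>& output, bool b_ex, T param1, T param2, T sqr) {
    switch (type) {
    case F_TYPE: kernelBody<T, F_TYPE>(output, b_ex, param1, param2, sqr); break;
    case Z_TYPE: kernelBody<T, Z_TYPE>(output, b_ex, param1, param2, sqr); break;
    case K_TYPE: kernelBody<T, K_TYPE>(output, b_ex, param1, param2, sqr); break;
    default: assert(false && "unknown type"); break;
    }
}
```

The same `if constexpr` pattern should carry over to a `__global__` template when compiling with C++17 enabled; the trade-off is one compiled kernel per type instead of one kernel with a uniform branch.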
Thank you!