Hello, NVIDIA experts,
Performance can be very poor when an if-else diverges inside a warp, that is, when threads of the same warp take different branches, like this:
val= global[lane_idx];
if(val >= 16){
......
......
}else{
......
......
}
Now the if-else no longer diverges inside a warp. Instead, it branches between warps, so every thread of a given warp takes the same path, like this:
int warp_idx = threadIdx.x / 32;
if(warp_idx < 2){
......
// code path for warps 0 and 1
......
}else{
......
// code path for warps 2 and 3
......
......
}
I am not sure what effect the code above has on performance.
In my application I did not see any slowdown; the only difference I found is that the code size becomes larger.
I'm also not sure how it behaves on other CUDA architectures (mine is compute capability 8.6, Ampere).
So, how should I evaluate this kind of code?
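
To make the question concrete, here is a minimal sketch of how the two patterns could be timed against each other with CUDA events. Everything in it is a placeholder I made up for illustration, not my real application code: the kernel names, the `path_a`/`path_b` device functions standing in for the elided branch bodies, the 128-thread block (4 warps), and the input pattern `i % 32`, which is chosen so that half the lanes of every warp take each branch in the first kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the two elided branch bodies.
__device__ float path_a(float x) {
    for (int i = 0; i < 64; ++i) x = x * 1.0001f + 0.5f;
    return x;
}
__device__ float path_b(float x) {
    for (int i = 0; i < 64; ++i) x = x * 0.9999f - 0.5f;
    return x;
}

// Branch on a per-thread value: lanes of one warp can take different paths.
__global__ void branch_within_warp(const int* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int val = in[i];
    if (val >= 16) out[i] = path_a((float)val);
    else           out[i] = path_b((float)val);
}

// Branch on the warp index: all lanes of a warp take the same path.
__global__ void branch_between_warps(const int* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int warp_idx = threadIdx.x / 32;
    float x = (float)in[i];
    if (warp_idx < 2) out[i] = path_a(x);   // warps 0 and 1
    else              out[i] = path_b(x);   // warps 2 and 3
}

typedef void (*Kernel)(const int*, float*, int);

// Average time per launch in milliseconds, measured with CUDA events.
float time_ms(Kernel k, const int* in, float* out, int n) {
    const int threads = 128, reps = 100;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    k<<<blocks, threads>>>(in, out, n);      // warm-up launch, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) k<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int n = 1 << 20;
    int* in = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = i % 32;  // lanes 16..31 take the first branch

    printf("branch within warp  : %.4f ms\n", time_ms(branch_within_warp, in, out, n));
    printf("branch between warps: %.4f ms\n", time_ms(branch_between_warps, in, out, n));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Building this with `nvcc` for several `-arch` values and running it on each GPU would give a rough per-architecture comparison. With branch bodies this small the compiler may also turn the branches into predicated instructions, so the numbers are only a hint, and a profiler such as Nsight Compute is probably needed for a reliable answer.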