Hello,
Using NVIDIA's technical web pages, I was able to write a nice “segmented non-regular reduction”. To finish my code, however, I need a somewhat complex shuffle instruction. To simplify, suppose a warp has 8 lanes (32 lanes in reality). I would like to perform the following transformation within a warp:
A B C D E F G H
A C E G B D F H
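To make the target ordering concrete, here is a minimal host-side sketch of the permutation in plain C++, with no shuffle involved. The helper name deinterleave_reference is my own (it is not a CUDA API), and it assumes an even number of lanes.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for the permutation I want (hypothetical helper,
// not part of CUDA). Values from even-indexed lanes fill the first half
// of the output, values from odd-indexed lanes fill the second half.
// Assumption: in.size() is even (8 in the toy example, 32 for a real warp).
template <typename T>
std::vector<T> deinterleave_reference(const std::vector<T>& in) {
    const std::size_t half = in.size() / 2;
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < half; ++i) {
        out[i] = in[2 * i];             // even source lanes: 0, 2, 4, ...
        out[half + i] = in[2 * i + 1];  // odd source lanes: 1, 3, 5, ...
    }
    return out;
}
```

Calling it on the eight values A to H should reproduce the second row above, and I would like the warp-level version to match this reference for 32 lanes.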
I think shfl.idx should do the job, but I am not able to determine the constants for the instruction:
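__forceinline__ __device__ float shuffle(float var) {
    float ret;
    int srcLane = ???  // lane to read from (this is what I cannot determine)
    int c = ???        // packed clamp/segment-mask operand of shfl.idx
    // shfl.idx.b32 d, a, b, c: d receives the value of a held by lane b
    asm volatile ("shfl.idx.b32 %0, %1, %2, %3;" : "=f"(ret) : "f"(var), "r"(srcLane), "r"(c));
    return ret;
}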
I also posted this on StackOverflow, where there is a nicer figure. I do not know whether such a transformation is possible. Finally, note that I am currently stuck on CUDA 8.
https://stackoverflow.com/questions/49197970/warp-shuffling-for-cuda-8-0
Best,
Timocafe
[PS]: problem fixed