shuffling warp


Using NVIDIA's technical web pages, I was able to write a nice “segmented non-regular reduction”. To finish my code, however, I need a somewhat complex shuffle instruction. As a simplified example, take a warp with 8 lanes (32 lanes in reality). I would like to perform the following transformation within the warp:

 A B C D E F G H 

 A C E G B D F H

I think shfl.idx should do the job, but I am not able to determine the constants for the instruction:

__forceinline__ __device__ float shuffle(float var){
   float ret;
   int srcLane = ???  // unknown: lane to read from
   int c = ???        // unknown: packed clamp/segment-mask operand
   asm volatile ("shfl.idx.b32 %0, %1, %2, %3;" : "=f"(ret) : "f"(var), "r"(srcLane), "r"(c));
   return ret;
}

I also posted this on StackOverflow, where there is a nicer figure. I do not know whether such a transformation is possible. Finally, note that I am currently stuck on CUDA 8.
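
For reference, here is a minimal sketch of one way the permutation above could be expressed. It is not taken from the StackOverflow answer. It assumes the lane count is a power of two and that each lane should read from the lane whose index is its own index rotated left by one bit (so lanes in the first half read the even lanes and lanes in the second half read the odd lanes). The names deinterleaveSrcLane, shuffleDeinterleave and demo are hypothetical. The sketch uses the __shfl intrinsic available in CUDA 8 and switches to __shfl_sync on CUDA 9 and later. If the inline PTX form is preferred, the same srcLane with c = 0x1f should correspond to a full 32-lane shfl.idx, but please check that against the PTX manual.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: source lane for the "evens first, then odds" permutation.
// 'lane' is the lane index within the segment, 'width' must be a power of two.
// For width = 8 this yields 0 2 4 6 1 3 5 7.
__host__ __device__ inline int deinterleaveSrcLane(int lane, int width) {
    int doubled = (lane << 1) & (width - 1);  // 2 * lane, wrapped into [0, width)
    int isUpperHalf = lane / (width / 2);     // 0 for the first half, 1 for the second
    return doubled | isUpperHalf;
}

// Hypothetical wrapper: every lane fetches 'var' from its computed source lane.
__device__ __forceinline__ float shuffleDeinterleave(float var, int width = 32) {
    int lane = threadIdx.x & (width - 1);
    int src = deinterleaveSrcLane(lane, width);
#if CUDART_VERSION >= 9000
    return __shfl_sync(0xffffffffu, var, src, width);  // CUDA 9 and later
#else
    return __shfl(var, src, width);                    // CUDA 8 and earlier
#endif
}

// Each lane starts with its own index as the value, then is shuffled.
__global__ void demo(float* out) {
    float v = static_cast<float>(threadIdx.x);
    out[threadIdx.x] = shuffleDeinterleave(v);
}

int main() {
    float host[32];
    float* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    demo<<<1, 32>>>(dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 32; ++i) printf("%g ", host[i]);
    printf("\n");
    cudaFree(dev);
    return 0;
}
```

With one full warp, lanes 0 to 15 should end up holding the values of the even lanes and lanes 16 to 31 those of the odd lanes. Passing a smaller power-of-two width (for example 8) applies the same pattern independently inside each group of that many lanes.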



[PS]: problem fixed

Please see my answer at StackOverflow:

Thank you! All the information I found online was about reductions, not “pure” shuffling.

AlexSh, your answer there is incorrect.