Hi,
I am new to GPU programming and am currently trying to optimize an existing program.
I tried to write the CUDA C statement
y = __shfl_sync(0xFFFFFFFF, Var[j].x, ipp, NUMBER_THREADS);
in PTX assembly language, but I did not succeed.
What I tried is:
shfl.sync.idx.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.btfly.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.up.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.down.b32 %0, %3, %2, %1, 0xFFFFFFFF;
where the operand list of the inline asm statement is
: "=r"(y)
: "r"(Var[j].x), "r"(ipp), "r"(NUMBER_THREADS));
None of these versions works. What is my mistake?
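
For comparison, here is a minimal sketch of how the call could be written, based on the PTX ISA description of shfl.sync (d, a, b, c, membermask): a is the value to exchange, b is the source lane, and c is not the plain width but a packed value with the segment mask in bits 12:8 and the clamp value in bits 4:0. With the operand list above, that would mean the order %0, %1, %2, %3 rather than %0, %3, %2, %1. The PTX spelling of the butterfly mode is .bfly. The names shfl_idx_clamp, shfl_idx_ptx and compare_kernel are hypothetical helpers introduced only for this illustration, and the sketch assumes width is a power of two no larger than 32 and that all 32 lanes of the warp take part.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: pack the "c" operand for shfl.sync.idx.
// Bits 12:8 hold the segment mask (32 - width), bits 4:0 the clamp value (31).
// Assumes width is a power of two in [1, 32].
__host__ __device__ constexpr unsigned shfl_idx_clamp(int width) {
    return ((32u - static_cast<unsigned>(width)) << 8) | 0x1fu;
}

// Hypothetical wrapper: inline-PTX counterpart of
// __shfl_sync(0xFFFFFFFF, var, srcLane, width).
__device__ int shfl_idx_ptx(int var, int srcLane, int width) {
    int ret;
    unsigned c = shfl_idx_clamp(width);
    // Operand order: destination, value, source lane, packed c, member mask.
    asm volatile("shfl.sync.idx.b32 %0, %1, %2, %3, 0xffffffff;"
                 : "=r"(ret)
                 : "r"(var), "r"(srcLane), "r"(c));
    return ret;
}

// Hypothetical check kernel: counts lanes where the intrinsic and the
// inline-PTX version disagree. Intended for a single full warp of 32 threads.
__global__ void compare_kernel(const int *in, int *mismatch,
                               int srcLane, int width) {
    int v = in[threadIdx.x];
    int a = __shfl_sync(0xffffffffu, v, srcLane, width);
    int b = shfl_idx_ptx(v, srcLane, width);
    if (a != b) atomicAdd(mismatch, 1);
}
```

Launching compare_kernel<<<1, 32>>> with a zero-initialized mismatch counter is one way to check the wrapper against the intrinsic on a given GPU. The up mode uses a different packing for c (no 0x1f clamp), so this helper covers the idx case only.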
Thanks in advance
Norbert