__shfl_sync instruction

Hi,

I am new to GPU programming and am currently trying to optimize an existing program.

I tried to write the CUDA C call

y = __shfl_sync(0xFFFFFFFF, Var[j].x, ipp, NUMBER_THREADS);

in PTX assembly language, but I did not succeed.

What I tried is:

shfl.sync.idx.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.btfly.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.up.b32 %0, %3, %2, %1, 0xFFFFFFFF;
shfl.sync.down.b32 %0, %3, %2, %1, 0xFFFFFFFF;

where the operand list is

: "=r"(y)
: "r"(Var[j].x), "r"(ipp), "r"(NUMBER_THREADS));

None of these versions works. What is my mistake?
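
For comparison, the PTX ISA documentation gives the form shfl.sync.idx.b32 d, a, b, c, membermask, where a is the value to exchange, b is the source lane, and c is a packed clamp/segment-mask value rather than the width itself. Below is a minimal host-side sketch of that reading. The helper names shfl_idx_c and emulate_shfl_idx are made up for illustration, and the packing formula ((32 - width) << 8) | 0x1F is my assumption about what the CUDA headers do for __shfl_sync, so please treat it as unconfirmed.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not a CUDA API): packs the 'c' operand for
// shfl.sync.idx from a __shfl_sync-style width (power of two, 1..32).
// Assumption: bits 12:8 hold the segment mask, bits 4:0 the clamp value.
constexpr std::uint32_t shfl_idx_c(std::uint32_t width) {
    return ((32u - width) << 8) | 0x1Fu;
}

// Hypothetical host-side emulation of shfl.sync.idx.b32 for one full warp,
// following the pseudocode in the PTX ISA description of shfl.sync.
// a[lane] is each lane's input value, b[lane] the source lane it requests.
std::array<std::uint32_t, 32> emulate_shfl_idx(
        const std::array<std::uint32_t, 32>& a,
        const std::array<std::uint32_t, 32>& b,
        std::uint32_t c) {
    std::array<std::uint32_t, 32> d{};
    const std::uint32_t cval    = c & 0x1Fu;         // clamp value, c[4:0]
    const std::uint32_t segmask = (c >> 8) & 0x1Fu;  // segment mask, c[12:8]
    for (std::uint32_t lane = 0; lane < 32; ++lane) {
        const std::uint32_t bval    = b[lane] & 0x1Fu;
        const std::uint32_t minLane = lane & segmask;
        const std::uint32_t maxLane = minLane | (cval & ~segmask);
        std::uint32_t j = minLane | (bval & ~segmask);
        if (j > maxLane) {
            j = lane;  // out of range: the lane keeps its own value
        }
        d[lane] = a[j];
    }
    return d;
}
```

With width 8 and source lane 3 in every lane, this sketch makes lane 10 read from lane 11, i.e. lane 3 of its own 8-lane segment. Under the same assumptions, the inline assembly would use the operand order "shfl.sync.idx.b32 %0, %1, %2, %3, 0xFFFFFFFF;" with the third input being shfl_idx_c(NUMBER_THREADS) instead of NUMBER_THREADS.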

Thanks in advance

Norbert