How to apply "uniform datapath instruction"

I found a feature in NVIDIA’s documentation: “uniform datapath instructions” can be processed by a special unit, which can take some of the workload off the FMA units.

I know of only one case where this feature applies:

int warp_idx = __shfl_sync(0xFFFFFFFF, threadIdx.x / 32, 0, 32);

It broadcasts the warp index from lane 0 to all threads of the warp.
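
For anyone who wants to try the broadcast, here is a minimal sketch of it inside a complete kernel. The kernel name and the host helper `warp_indices` are hypothetical names made up for this example, and the block size is assumed to be a multiple of 32 so the full mask is valid. It only demonstrates that the value is identical across a warp; whether the compiler then uses uniform datapath instructions for it is not something the source code can guarantee.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread stores the warp index it sees after the broadcast.
__global__ void warp_index_kernel(int* out) {
    // Lane 0's value of threadIdx.x / 32 is broadcast to the whole warp,
    // so the result is the same in all 32 lanes (warp-uniform).
    int warp_idx = __shfl_sync(0xFFFFFFFF, threadIdx.x / 32, 0, 32);
    out[threadIdx.x] = warp_idx;
}

// Hypothetical host helper: launches one 1-D block and returns the
// per-thread results. Assumes `threads` is a multiple of 32 (full warps).
// Returns an empty vector if a CUDA call fails.
std::vector<int> warp_indices(int threads) {
    std::vector<int> host(threads, -1);
    int* dev = nullptr;
    if (cudaMalloc(&dev, threads * sizeof(int)) != cudaSuccess) return {};
    warp_index_kernel<<<1, threads>>>(dev);
    cudaError_t err = cudaMemcpy(host.data(), dev, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling `warp_indices(64)` from a host `main` should give 0 for threads 0 to 31 and 1 for threads 32 to 63.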

Are there any other cases? Could anyone tell me?

At the same time, I think nvcc applies this feature implicitly.
So is there a way for me to request this instruction explicitly?
For example:

if(0 == warp_idx){

I found “bra.uni” in PTX. Is “bra.uni” a “uniform datapath instruction”? I’m not sure.
If it is, how can I specify “bra.uni” explicitly in my CUDA C code?


The uniform datapath and associated instructions are only directly evident at the machine (SASS) level. There is no direct control to issue these instructions at a higher level (i.e. they are not selectable via PTX or CUDA C++), and CUDA doesn’t formally provide methods to write SASS code directly. You’re dependent on the compiler’s decisions about instruction selection.

“bra.uni” is not a uniform datapath instruction. Instead, the “.uni” there refers to the PTX usage/terminology, which indicates that all threads in a warp perform the operation uniformly.

In processors in general, branches along with call and return instructions are control-path (also: control-flow) instructions. Data-path instructions are instructions that perform arithmetic or logic operations.


thank you

On Turing, the compiler has the option to push these integer operations
onto the separate uniform datapath, out of the way of the main datapath. To
do so, the compiler must emit uniform datapath instructions

This paper says that “the compiler has the option to push these integer operations onto the separate uniform datapath”, but I didn’t find such an option in nvcc’s help.
Could you explain how to use it?

I am pretty sure this means exactly what it says: the compiler has the option, not you as the user.

In other words, the compiler (in particular ptxas) figures out during one or several of its optimization phases which instructions should be pushed into which datapath. If so, that process is driven by undisclosed (and likely frequently changing) heuristics, just like other optimizations.
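
Since the choice is made inside ptxas, the practical way to see it is to look at the generated SASS. Below is a rough sketch of a kernel with some warp-uniform integer arithmetic, i.e. the kind of work the quoted paper describes. The kernel, the host helper `block_offsets`, and the `block_stride` parameter are all hypothetical names made up for this illustration. As far as I know, uniform datapath instructions show up in the disassembly with a `U` prefix (for example `UMOV` or `ULDC`) and `UR` register operands, but whether any appear for this kernel depends on the compiler version and target architecture.

```cuda
#include <cuda_runtime.h>
#include <vector>

// `block_stride` is a hypothetical parameter used only for illustration.
__global__ void block_offset_kernel(int* out, int block_stride) {
    // blockIdx.x and block_stride are the same for every thread in a warp,
    // so this multiply is warp-uniform integer work. The compiler MAY place
    // it on the uniform datapath; nothing in the source can force that.
    int base = blockIdx.x * block_stride;
    // threadIdx.x differs per lane, so this add is ordinary per-thread work.
    out[blockIdx.x * blockDim.x + threadIdx.x] = base + threadIdx.x;
}

// Hypothetical host helper: runs the kernel and copies the results back.
// Returns an empty vector if a CUDA call fails.
std::vector<int> block_offsets(int blocks, int threads, int block_stride) {
    const int n = blocks * threads;
    std::vector<int> host(n, -1);
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return {};
    block_offset_kernel<<<blocks, threads>>>(dev, block_stride);
    cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}

// To see what the compiler chose (file names are placeholders):
//   nvcc -arch=sm_75 -cubin -o kernel.cubin kernel.cu
//   cuobjdump -sass kernel.cubin
```

Comparing the SASS across optimization levels or toolkit versions is a reasonable way to observe the heuristics at work, even though they cannot be controlled directly.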