What's the difference between BRA.DIV and BRA instructions?

Does “div” stand for divergency? What is the difference between BRA.DIV and BRA instructions during execution? BRA.DIV has an additional parameter compared to the BRA instruction. What does this extra first parameter represent?

I found BRA.DIV in sass code: BRA.DIV ~URZ, 0x7f1287298740

Your question contains two different sass instructions. BAR and BRA . Did you mix them up?

Thank you for your reply. I made a typo. It should be BRA.DIV and BRA SASS instructions.

Based on the single example given, the additional parameter is likely a mask, all-ones in our example. Presumably a mask for threads, so covering the entire warp in this example. DIV likely refers to divergence. To heap speculation upon speculation, the semantics might be “if there is divergence (= difference of PC values) across the threads covered by the mask, then branch to the target address indicated, else fall through” (=> at that point control flow uniform across the threads covered by the mask).

This is still a guess and nothing more.

Thank you for your reply.
I conducted some experiments and am confident that your speculation is correct.

Edit: The following guess was wrong:

It is like a hint.
A PTX instructions for it is bra.uni vs bra: PTX ISA 8.4

The compiler can specify, whether the branch is definitely divergent, definitely non-divergent or it does not know.

For combining the results from each thread, one needs an additional vote, match or reduction instruction first.

I conducted the following experiment. The kernel source code is as follows:

void test_bra(uint32_t *out, uint32_t count){
    __shared__ uint32_t data[32];
    if (threadIdx.x < 32) {
        asm("barrier.cta.sync 0, 64;\n\t");
        data[threadIdx.x] = threadIdx.x;
        asm("barrier.cta.arrive 1, 64;\n\t");
    } else {
        asm("barrier.cta.arrive 0, 64;\n\t");
        asm("barrier.cta.sync 1, 64;\n\t");
        out[threadIdx.x - 32] = data[threadIdx.x - 32];

The compiled SASS code is as follows:

1	00007fef 5f25d200	      IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] 
2	00007fef 5f25d210	      S2R R0, SR_TID.X 
3	00007fef 5f25d220	      ISETP.GE.U32.AND P0, PT, R0, 0x20, PT 
4	00007fef 5f25d230	@!P0  BRA 0x7fef5f25d330 
5	00007fef 5f25d240	      BRA.CONV ~URZ, 0x7fef5f25d2a0 
6	00007fef 5f25d250	      MOV R2, 0x70 
7	00007fef 5f25d260	      CALL.REL.NOINC 0x7fef5f25d3f0 
8	00007fef 5f25d270	      MOV R2, 0x90 
9	00007fef 5f25d280	      CALL.REL.NOINC 0x7fef5f25d4b0 
10	00007fef 5f25d290	      BRA 0x7fef5f25d2c0 
11	00007fef 5f25d2a0	      BAR.ARV 0x0, 0x40 
12	00007fef 5f25d2b0	      BAR.SYNC 0x1, 0x40 
13	00007fef 5f25d2c0	      LDS R5, [R0.X4+-0x80] 
14	00007fef 5f25d2d0	      IADD3 R2, R0, -0x20, RZ 
15	00007fef 5f25d2e0	      IMAD.MOV.U32 R3, RZ, RZ, 0x4 
16	00007fef 5f25d2f0	      ULDC.64 UR4, c[0x0][0x118] 
17	00007fef 5f25d300	      IMAD.WIDE.U32 R2, R2, R3, c[0x0][0x160] 
18	00007fef 5f25d310	      STG.E [R2.64], R5 
19	00007fef 5f25d320	      EXIT 
20	00007fef 5f25d330	      BRA.CONV ~URZ, 0x7fef5f25d370 
21	00007fef 5f25d340	      MOV R2, 0x160 
22	00007fef 5f25d350	      CALL.REL.NOINC 0x7fef5f25d470 
23	00007fef 5f25d360	      BRA 0x7fef5f25d380 
24	00007fef 5f25d370	      BAR.SYNC 0x0, 0x40 
25	00007fef 5f25d380	      STS [R0.X4], R0 
26	00007fef 5f25d390	      BRA.CONV ~URZ, 0x7fef5f25d3d0 
27	00007fef 5f25d3a0	      MOV R2, 0x1c0 
28	00007fef 5f25d3b0	      CALL.REL.NOINC 0x7fef5f25d430 
29	00007fef 5f25d3c0	      EXIT 
30	00007fef 5f25d3d0	      BAR.ARV 0x1, 0x40 
31	00007fef 5f25d3e0	      EXIT 
32	00007fef 5f25d3f0	      IMAD.MOV.U32 R3, RZ, RZ, 0x0 
33	00007fef 5f25d400	      WARPSYNC 0xffffffff 
34	00007fef 5f25d410	      BAR.ARV 0x0, 0x40 
35	00007fef 5f25d420	      RET.REL.NODEC R2 0x7fef5f25d200 
36	00007fef 5f25d430	      MOV R3, 0x0 
37	00007fef 5f25d440	      WARPSYNC 0xffffffff 
38	00007fef 5f25d450	      BAR.ARV 0x1, 0x40 
39	00007fef 5f25d460	      RET.REL.NODEC R2 0x7fef5f25d200 
40	00007fef 5f25d470	      IMAD.MOV.U32 R3, RZ, RZ, 0x0 
41	00007fef 5f25d480	      WARPSYNC 0xffffffff 
42	00007fef 5f25d490	      BAR.SYNC 0x0, 0x40 
43	00007fef 5f25d4a0	      RET.REL.NODEC R2 0x7fef5f25d200 
44	00007fef 5f25d4b0	      MOV R3, 0x0 
45	00007fef 5f25d4c0	      WARPSYNC 0xffffffff 
46	00007fef 5f25d4d0	      BAR.SYNC 0x1, 0x40 
47	00007fef 5f25d4e0	      RET.REL.NODEC R2 0x7fef5f25d200 
48	00007fef 5f25d4f0	      BRA 0x7fef5f25d4f0

Lines 5 to 18 of the SASS code correspond to the code within the else block in the source code. A careful analysis of this portion of SASS code reveals that the execution logic of BRA.DIV should be consistent with what njuffa mentioned.

However, I’m curious why the compiler generates lines 5 to 10 of the SASS code. The SASS code from lines 5 to 10 is generated for the case of intra-warp divergence, but according to the source code, the if-else statement does not result in intra-warp divergence. Why does the compiler generate SASS instructions in this way?

Cuda architecture: sm_86
Compiler version:Cuda compilation tools, release 12.3, V12.3.107