I conducted the following experiment. The kernel source code is as follows:
__global__
void test_bra(uint32_t *out, uint32_t count){
__shared__ uint32_t data[32];
if (threadIdx.x < 32) {
asm("barrier.cta.sync 0, 64;\n\t");
data[threadIdx.x] = threadIdx.x;
asm("barrier.cta.arrive 1, 64;\n\t");
} else {
asm("barrier.cta.arrive 0, 64;\n\t");
asm("barrier.cta.sync 1, 64;\n\t");
out[threadIdx.x - 32] = data[threadIdx.x - 32];
}
}
The compiled SASS code is as follows:
1 00007fef 5f25d200 IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
2 00007fef 5f25d210 S2R R0, SR_TID.X
3 00007fef 5f25d220 ISETP.GE.U32.AND P0, PT, R0, 0x20, PT
4 00007fef 5f25d230 @!P0 BRA 0x7fef5f25d330
5 00007fef 5f25d240 BRA.CONV ~URZ, 0x7fef5f25d2a0
6 00007fef 5f25d250 MOV R2, 0x70
7 00007fef 5f25d260 CALL.REL.NOINC 0x7fef5f25d3f0
8 00007fef 5f25d270 MOV R2, 0x90
9 00007fef 5f25d280 CALL.REL.NOINC 0x7fef5f25d4b0
10 00007fef 5f25d290 BRA 0x7fef5f25d2c0
11 00007fef 5f25d2a0 BAR.ARV 0x0, 0x40
12 00007fef 5f25d2b0 BAR.SYNC 0x1, 0x40
13 00007fef 5f25d2c0 LDS R5, [R0.X4+-0x80]
14 00007fef 5f25d2d0 IADD3 R2, R0, -0x20, RZ
15 00007fef 5f25d2e0 IMAD.MOV.U32 R3, RZ, RZ, 0x4
16 00007fef 5f25d2f0 ULDC.64 UR4, c[0x0][0x118]
17 00007fef 5f25d300 IMAD.WIDE.U32 R2, R2, R3, c[0x0][0x160]
18 00007fef 5f25d310 STG.E [R2.64], R5
19 00007fef 5f25d320 EXIT
20 00007fef 5f25d330 BRA.CONV ~URZ, 0x7fef5f25d370
21 00007fef 5f25d340 MOV R2, 0x160
22 00007fef 5f25d350 CALL.REL.NOINC 0x7fef5f25d470
23 00007fef 5f25d360 BRA 0x7fef5f25d380
24 00007fef 5f25d370 BAR.SYNC 0x0, 0x40
25 00007fef 5f25d380 STS [R0.X4], R0
26 00007fef 5f25d390 BRA.CONV ~URZ, 0x7fef5f25d3d0
27 00007fef 5f25d3a0 MOV R2, 0x1c0
28 00007fef 5f25d3b0 CALL.REL.NOINC 0x7fef5f25d430
29 00007fef 5f25d3c0 EXIT
30 00007fef 5f25d3d0 BAR.ARV 0x1, 0x40
31 00007fef 5f25d3e0 EXIT
32 00007fef 5f25d3f0 IMAD.MOV.U32 R3, RZ, RZ, 0x0
33 00007fef 5f25d400 WARPSYNC 0xffffffff
34 00007fef 5f25d410 BAR.ARV 0x0, 0x40
35 00007fef 5f25d420 RET.REL.NODEC R2 0x7fef5f25d200
36 00007fef 5f25d430 MOV R3, 0x0
37 00007fef 5f25d440 WARPSYNC 0xffffffff
38 00007fef 5f25d450 BAR.ARV 0x1, 0x40
39 00007fef 5f25d460 RET.REL.NODEC R2 0x7fef5f25d200
40 00007fef 5f25d470 IMAD.MOV.U32 R3, RZ, RZ, 0x0
41 00007fef 5f25d480 WARPSYNC 0xffffffff
42 00007fef 5f25d490 BAR.SYNC 0x0, 0x40
43 00007fef 5f25d4a0 RET.REL.NODEC R2 0x7fef5f25d200
44 00007fef 5f25d4b0 MOV R3, 0x0
45 00007fef 5f25d4c0 WARPSYNC 0xffffffff
46 00007fef 5f25d4d0 BAR.SYNC 0x1, 0x40
47 00007fef 5f25d4e0 RET.REL.NODEC R2 0x7fef5f25d200
48 00007fef 5f25d4f0 BRA 0x7fef5f25d4f0
Lines 5 to 18 of the SASS code correspond to the code within the else block in the source code. A careful analysis of this portion of SASS code reveals that the execution logic of BRA.DIV should be consistent with what njuffa mentioned.