The Jetson Nano has compute capability 5.3. This section of the PTX ISA 8.4 documentation states:
For .target sm_6x or below,
barrier{.cta} instruction without .aligned modifier is equivalent to .aligned variant and has the same restrictions as of .aligned variant.
Does that mean that if I use
barrier.sync 0,64;
it behaves like __syncthreads(), causing all threads in a block to synchronize? If so, is there any way to synchronize only a subset of the warps in a block on the Jetson Nano?
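For reference, here is a minimal sketch of how a barrier limited to two warps could be issued from CUDA C++ through inline PTX. It assumes that the .aligned restriction only requires all threads of a warp to execute the same barrier together, rather than every thread in the block; that reading is an assumption and is not confirmed here for the Jetson Nano. The names named_barrier_sync, pair_sync_kernel and run_pair_sync_demo are made up for illustration, and barrier 1 is used instead of 0 to stay clear of the barrier that __syncthreads() is generally compiled to.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: wait on named barrier `id` until `nthreads` threads
// have arrived. `nthreads` must be a multiple of the warp size (32).
__device__ __forceinline__ void named_barrier_sync(unsigned id, unsigned nthreads)
{
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

// One block of 4 warps. Warps 0 and 1 exchange data through shared memory
// and meet at barrier 1; warps 2 and 3 never touch that barrier.
__global__ void pair_sync_kernel(int *out)
{
    __shared__ int buf[64];
    const unsigned tid = threadIdx.x;

    if (tid < 64) {
        buf[tid] = (int)tid;
        named_barrier_sync(1, 64);   // only these 64 threads participate
        out[tid] = buf[63 - tid];    // value written by the other warp
    } else {
        out[tid] = -1;               // warps 2 and 3 skip the barrier
    }
}

// Host-side check: returns the number of wrong results, or -1 on a CUDA error.
int run_pair_sync_demo()
{
    const int n = 128;
    int h_out[n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    pair_sync_kernel<<<1, n>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        const int expected = (i < 64) ? 63 - i : -1;
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

If the assumption holds, run_pair_sync_demo() should return 0 when built with something like nvcc -arch=sm_53; a hang or a nonzero result would suggest the barrier does not behave this way on the device.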
Hi,
No, I’m not asking about intra-warp synchronization, but about inter-warp synchronization. What I need is to synchronize two warps in a block without synchronizing all of them, the way __syncthreads() would.
I’m currently experimenting with:
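__shared__ volatile uint32_t lock;
lock = 0; // note: this reset must complete before either warp reaches the atomicXor below
if (lane == 0) // lane: index of the thread within its warp
{
    // The first warp to arrive flips lock to 1, the second flips it back to 0.
    atomicXor((uint32_t *)&lock, 1U);
    // Spin until both warps have arrived.
    while (lock != 0)
        ;
}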
This seems to work as long as exactly two warps reach that code. It also relies on the fact that on compute capability 5.3 all threads in a warp are implicitly synchronized, so if one thread is held in the while loop, the whole warp is held with it.
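To make the idea concrete, here is a self-contained sketch of the same spin barrier inside a complete kernel. It is not a vetted implementation: the kernel name spin_pair_kernel, the host helper run_spin_pair_demo and the buf exchange buffer are invented for illustration. The lock is reset once by a single thread behind a __syncthreads(), so that a late warp cannot clear it after the other warp has already arrived, and the __syncwarp() calls (CUDA 9 or later) are there only so the sketch does not depend on implicit warp lockstep.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// One block of 4 warps. Warps 0 and 1 meet at a spin barrier built on a
// shared-memory flag; warps 2 and 3 never take part in it.
__global__ void spin_pair_kernel(int *out)
{
    __shared__ volatile uint32_t lock;
    __shared__ volatile int buf[64];
    const unsigned tid  = threadIdx.x;
    const unsigned lane = tid % 32;   // index of the thread within its warp

    // One-time reset, made visible to the whole block before any use.
    if (tid == 0) lock = 0;
    __syncthreads();

    if (tid < 64) {
        buf[tid] = (int)tid;
        __threadfence_block();        // publish the write before signalling
        __syncwarp();                 // every lane has written before lane 0 signals

        if (lane == 0) {
            // First warp to arrive flips lock to 1, the second flips it to 0.
            atomicXor((uint32_t *)&lock, 1U);
            while (lock != 0)
                ;                     // spin until both warps have arrived
        }
        __syncwarp();                 // remaining lanes wait for lane 0

        out[tid] = buf[63 - tid];     // value written by the other warp
    } else {
        out[tid] = -1;
    }
}

// Host-side check: returns the number of wrong results, or -1 on a CUDA error.
int run_spin_pair_demo()
{
    const int n = 128;
    int h_out[n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    spin_pair_kernel<<<1, n>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        const int expected = (i < 64) ? 63 - i : -1;
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Note that the flag is only safe for a single use per reset in this form: if one warp came back around and toggled it again before the other warp had observed the 0, both warps could end up spinning.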
But I wonder if a more elegant solution exists, without using shared memory and atomics.
Thanks