Inter-warp synchronization with Jetson Nano

ChessVisio · May 1, 2024, 9:14am

Jetson Nano has compute capability 5.3. This section of the PTX ISA 8.4 documentation states:

For .target sm_6x or below,

barrier{.cta} instruction without .aligned modifier is equivalent to .aligned variant and has the same restrictions as of .aligned variant.

Does that mean that if I use

barrier.sync 0,64;

it behaves like __syncthreads(), causing all threads in the block to synchronize? If so, is there any way to synchronize only a subset of the warps in a block on Jetson Nano?
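One possible reading of the quoted sentence is that it only restricts how the threads of a warp reach the barrier (all of them must execute the same barrier instruction), not how many threads the barrier waits for. Under that reading, the second operand would still limit the barrier to 64 threads, i.e. two warps. The following is a minimal sketch under that assumption, not verified on Jetson Nano hardware. The names `named_barrier_sync`, `pair_sync_kernel` and `run_pair_sync_demo` are hypothetical, and barrier 1 is used instead of 0 because __syncthreads() itself uses barrier 0.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: wait until `nthreads` threads have arrived at named
// barrier `id`. `nthreads` must be a multiple of the warp size (32).
__device__ __forceinline__ void named_barrier_sync(unsigned id, unsigned nthreads)
{
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

__global__ void pair_sync_kernel(int *out)
{
    __shared__ int slot[2];
    const unsigned warp = threadIdx.x / 32;
    const unsigned lane = threadIdx.x % 32;

    if (warp < 2) {
        // Warp-uniform branch: every thread of warps 0 and 1 executes the
        // same barrier instruction, as the .aligned restriction requires.
        if (lane == 0) slot[warp] = 100 + (int)warp;
        named_barrier_sync(1, 64);                  // only warps 0 and 1 take part
        if (lane == 0) out[warp] = slot[warp ^ 1];  // read the partner warp's value
    } else {
        // Warps 2 and 3 never touch the barrier.
        if (lane == 0) out[warp] = -1;
    }
}

// Hypothetical host-side check: launches one block of four warps and
// returns true if warps 0 and 1 exchanged their values.
bool run_pair_sync_demo()
{
    int *d_out = nullptr;
    int h_out[4] = {0, 0, 0, 0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    pair_sync_kernel<<<1, 128>>>(d_out);
    cudaError_t launch = cudaGetLastError();
    cudaError_t copy = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return launch == cudaSuccess && copy == cudaSuccess &&
           h_out[0] == 101 && h_out[1] == 100 &&
           h_out[2] == -1 && h_out[3] == -1;
}
```

If the barrier really did wait for the whole block, this kernel would hang, because warps 2 and 3 never arrive at it.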

AastaLLL · May 2, 2024, 7:13am

Hi,

Are you asking about the Independent Thread Scheduling feature?
If so, it is only supported on devices with compute capability 7.x or higher:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#independent-thread-scheduling-7-x

Thanks.

ChessVisio · May 2, 2024, 4:33pm

Hi,
No, I’m not asking about intra-warp synchronization but about inter-warp synchronization. I need to synchronize two warps in a block without synchronizing all of them, as __syncthreads() would.
I’m currently experimenting with:

__shared__ volatile uint32_t lock;

lock = 0;
if (lane == 0)  // index of the thread within the warp
{
    atomicXor((uint32_t *)&lock, 1U);
    while (lock != 0)
        ;
}

This seems to work as long as only two warps are allowed to reach that code. It also relies on the fact that on compute capability 5.3 all threads in a warp are implicitly synchronized, so if one thread is held in the while loop, the whole warp is held.

But I wonder if a more elegant solution exists, without using shared memory and atomics.
Thanks
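One thing to watch in the snippet above: every thread writes `lock = 0` right before the atomic, so a warp that arrives late could reset the flag after the first warp has already toggled it. A minimal sketch of the same shared-memory idea with an arrival counter that is initialized once at kernel start is shown here. It assumes exactly two participating warps and a one-shot synchronization; `counter_sync_kernel` and `run_counter_sync_demo` are hypothetical names.

```cuda
#include <cuda_runtime.h>

__global__ void counter_sync_kernel(int *out)
{
    __shared__ unsigned int arrived;     // number of warps that reached the sync point
    __shared__ volatile int slot[2];     // data exchanged between the two warps
    const unsigned warp = threadIdx.x / 32;
    const unsigned lane = threadIdx.x % 32;

    // One-time initialization, made visible to the whole block
    // before any warp can arrive at the sync point.
    if (threadIdx.x == 0) arrived = 0;
    __syncthreads();

    if (warp < 2) {
        if (lane == 0) {
            slot[warp] = 100 + (int)warp;
            __threadfence_block();                 // publish the data before signalling
            atomicAdd(&arrived, 1u);               // signal arrival
            while (*(volatile unsigned int *)&arrived < 2u) {
                // spin until the partner warp has arrived too
            }
            __threadfence_block();
            out[warp] = slot[warp ^ 1];            // read the partner warp's value
        }
        // No-op in practice on compute capability 5.3, but needed on 7.x and
        // later, where the other lanes are not implicitly held back.
        __syncwarp();
    } else if (lane == 0) {
        out[warp] = -1;                            // warps 2 and 3 do not take part
    }
}

// Hypothetical host-side check: launches one block of four warps and
// returns true if warps 0 and 1 exchanged their values.
bool run_counter_sync_demo()
{
    int *d_out = nullptr;
    int h_out[4] = {0, 0, 0, 0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    counter_sync_kernel<<<1, 128>>>(d_out);
    cudaError_t launch = cudaGetLastError();
    cudaError_t copy = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return launch == cudaSuccess && copy == cudaSuccess &&
           h_out[0] == 101 && h_out[1] == 100 &&
           h_out[2] == -1 && h_out[3] == -1;
}
```

The counter only counts up, so it would have to be reset (or compared against a growing target) before the same two warps could synchronize a second time.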

AastaLLL · May 13, 2024, 6:30am

Hi,

We don’t have such an API.
But you can implement it in software, as you described.

Thanks
