Is there any way to synchronize a warp’s threads instead of all the block’s threads?
Something like a syncwarp() to force all the SIMD elements to reach the same point.
AFAIK a warp is always implicitly synchronized.
Yep, but imagine they follow different IF branches… I need a way to sync them. I could use a __syncthreads(), but that would sync the complete thread block… and I only want to sync the warp’s results.
Er, what? If some threads in a warp execute one basic block and some threads execute another, the entire warp executes both basic blocks; the outputs are simply disabled for the threads that didn’t take each path.
Let me clarify. I need this to perform coherent packing. I have 32 results and I need to make a decision based on the results of all 32 threads. That’s why I need to sync a warp… and I need to do it manually.
A practical example… Imagine I use 192 threads per block. I have, then, 6 groups of 32 threads.
unsigned int warpID = threadIdx.x / WARP_SIZE; // WARP_SIZE is 32 currently
bool hit = testHit(...);
if ( 0 == warpID )
{
  __syncthreads();
  // reduce the hit result of threads [0-31] and take a decision
}
else if ( 1 == warpID )
{
  __syncthreads();
  // reduce the hit result of threads [32-63] and take a decision
}
else if ( 2 == warpID )
{
  __syncthreads();
  // reduce the hit result of threads [64-95] and take a decision
}
// etc etc
I was wondering if there is a more efficient way to do this, skipping all those if ( xxx == warpID ) tests…
Of course, I could use 32 threads per block so that __syncthreads() would be the syncwarp() I want… but 32 threads per block is far from optimal.
thx
Are the warp voting functions added in Compute 1.2 not useful? They might not provide enough granularity, but shared-memory atomics could be an option (unless Compute 1.2 isn’t an option in the first place).
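For reference, on compute capability 1.2+ the vote intrinsics collapse the 32 per-thread predicates of a warp into a single result with no explicit synchronization at all. A minimal sketch, where the input array and the "decision" written per warp are placeholders standing in for the testHit(...) logic from the question:

```cuda
#define WARP_SIZE 32

__global__ void warpVoteKernel(const float *in, int *warpDecision)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Per-thread predicate; stand-in for the user's testHit(...).
    bool hit = in[tid] > 0.0f;

    // __any() is nonzero for every thread of the warp if at least one
    // thread in the warp passed a true predicate (__all() requires
    // all 32). No __syncthreads() is needed: the vote operates on the
    // warp as a unit. Requires compute capability >= 1.2.
    int anyHit = __any(hit);

    // Let one lane per warp record the warp-wide decision.
    if (threadIdx.x % WARP_SIZE == 0)
        warpDecision[tid / WARP_SIZE] = anyHit ? 1 : 0;
}
```

On the 1.0 hardware mentioned below this intrinsic is unavailable, which is why the shared-memory reduction route comes up instead.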
Oh sorry, I forgot to mention it… I’m restricted to compute capability 1.0. The code must run on an old 8800 GTX/Tesla.
Sorry, but I completely fail to understand your problem; are you sure you do? With only 32 threads, __syncthreads() is a no-op: it has no effect whatsoever (except for taking 4 cycles to execute).
I also can’t see the point of your if-then-else construction. Firstly, a “switch (warpID)” would probably be much less ugly, but beyond that I can’t imagine what your “reduce the hit result” step does that you couldn’t just use, e.g., 32*warpID as an offset when accessing some shared array and have no branches at all.
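To sketch that “32*warpID as an offset” idea: on compute 1.0 hardware each warp can reduce its own 32 hit flags in shared memory with no __syncthreads() and no branching on warpID, relying on the implicit lockstep of a warp. This is only a sketch under the thread’s assumptions (192 threads per block, testHit replaced by a placeholder comparison):

```cuda
#define WARP_SIZE         32
#define THREADS_PER_BLOCK 192   // 6 warps, as in the example

__global__ void warpReduceHits(const float *in, int *hitCount)
{
    // One shared slot per thread; each warp only touches its own
    // 32-element window, so warps never interfere with one another.
    __shared__ int hits[THREADS_PER_BLOCK];

    unsigned int tid    = threadIdx.x;
    unsigned int warpID = tid / WARP_SIZE;
    unsigned int lane   = tid % WARP_SIZE;
    int *myWarp = hits + warpID * WARP_SIZE;   // 32*warpID offset

    bool hit = in[blockIdx.x * blockDim.x + tid] > 0.0f; // stand-in for testHit(...)
    myWarp[lane] = hit ? 1 : 0;

    // Tree reduction within the warp. On compute 1.0 the 32 threads of
    // a warp run in lockstep, so no barrier is needed between steps.
    // (On modern GPUs this pattern additionally needs volatile shared
    // memory or __syncwarp() to be correct.)
    for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2)
        if (lane < offset)
            myWarp[lane] += myWarp[lane + offset];

    // Lane 0 of each warp now holds the warp's hit count and can
    // take the per-warp decision.
    if (lane == 0)
        hitCount[blockIdx.x * (blockDim.x / WARP_SIZE) + warpID] = myWarp[0];
}
```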
This code will hang straightaway… threads in warp 0 will be waiting on one __syncthreads() statement while warp 1’s threads wait on another, and so on, resulting in a kernel hang.
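The usual fix is to keep __syncthreads() out of the divergent branches entirely: every thread in the block reaches one and the same barrier, and only afterwards do you branch on warpID. A hedged sketch of the repaired control flow (testHit is again a placeholder for the question’s per-thread test):

```cuda
#define WARP_SIZE         32
#define THREADS_PER_BLOCK 192

__device__ bool testHit(const float *in, unsigned int i)  // stand-in
{
    return in[i] > 0.0f;
}

__global__ void safeWarpDecisions(const float *in, int *hitCount)
{
    __shared__ int hits[THREADS_PER_BLOCK];

    unsigned int tid    = threadIdx.x;
    unsigned int warpID = tid / WARP_SIZE;

    hits[tid] = testHit(in, blockIdx.x * blockDim.x + tid) ? 1 : 0;

    // ONE barrier, executed unconditionally by every thread in the
    // block. Placing __syncthreads() inside an if(warpID == ...)
    // branch makes different warps wait at different barriers, which
    // is exactly the deadlock described above.
    __syncthreads();

    // Branching on warpID is safe *after* the barrier: each warp's
    // lane 0 reduces that warp's 32 flags and takes its decision.
    if (tid % WARP_SIZE == 0)
    {
        int sum = 0;
        for (int i = 0; i < WARP_SIZE; ++i)
            sum += hits[warpID * WARP_SIZE + i];
        hitCount[blockIdx.x * (blockDim.x / WARP_SIZE) + warpID] = sum;
    }
}
```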