Global thread barrier

Lev · April 15, 2010, 4:25pm

peastman, andradx

I think I found a good and fast solution to your iteration problem.

global blocknum=0;
global completedblockquantity=0;

At the start of each block:

shared int current_block_num;

shared int currentiteration, blockindex;

if (threadIdx.x==0)
{
current_block_num=atomic_inc(blocknum);

currentiteration=current_block_num/blocks_per_iteration;
blockindex=current_block_num%blocks_per_iteration;

Then some code that waits for all blocks of the prior iterations to complete,
like this:

wait while (completedblockquantity<currentiteration*blocks_per_iteration)
}

__syncthreads();

data=datarray[blockindex*blocksize+threadIdx.x]

and so on

at the end of the block

__threadfence();

atomic_inc(completedblockquantity);

At kernel launch you need to pass a number of blocks that covers (data size)*(iteration quantity), i.e. blocks_per_iteration times the number of iterations.
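Here is a minimal sketch of how the scheme above might look as compilable CUDA. It is only an illustration under several assumptions: atomicAdd stands in for the atomic_inc pseudocode, the two ping-pong buffers, the toy update out[i] = in[i] + in[(i+1) % n], and the host helper run_iterations are all hypothetical, and input reads go through a volatile pointer so values written by other blocks are not served from a stale cache. It also relies on blocks that already hold a ticket continuing to make progress, which holds in practice but is not something the programming model promises.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Global counters from the post; they must be zeroed before every launch.
__device__ unsigned int blocknum = 0;                // next ticket to hand out
__device__ unsigned int completedblockquantity = 0;  // blocks that have finished

__global__ void iterate(float* bufA, float* bufB, unsigned int n,
                        unsigned int blocks_per_iteration)
{
    __shared__ unsigned int currentiteration, blockindex;

    if (threadIdx.x == 0) {
        // Take a ticket in the order blocks actually start running, so every
        // block with a lower ticket is already resident or already finished.
        unsigned int current_block_num = atomicAdd(&blocknum, 1u);
        currentiteration = current_block_num / blocks_per_iteration;
        blockindex = current_block_num % blocks_per_iteration;

        // Spin until every block of the earlier iterations has completed.
        volatile unsigned int* done = &completedblockquantity;
        while (*done < currentiteration * blocks_per_iteration) { }
        __threadfence();
    }
    __syncthreads();

    // Even iterations read A and write B, odd iterations do the reverse.
    // (Toy workload, assumed for illustration only.)
    const volatile float* in = (currentiteration % 2 == 0) ? bufA : bufB;
    float* out = (currentiteration % 2 == 0) ? bufB : bufA;
    unsigned int i = blockindex * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + in[(i + 1) % n];

    // Publish this block's writes, then let one thread count the block as done.
    __threadfence();
    __syncthreads();
    if (threadIdx.x == 0) atomicAdd(&completedblockquantity, 1u);
}

// Hypothetical host helper: runs `iterations` dependent passes in one launch.
std::vector<float> run_iterations(unsigned int n, unsigned int iterations,
                                  unsigned int blocksize)
{
    unsigned int blocks_per_iteration = (n + blocksize - 1) / blocksize;
    std::vector<float> host(n, 1.0f);
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemcpy(a, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    unsigned int zero = 0;
    cudaMemcpyToSymbol(blocknum, &zero, sizeof(zero));
    cudaMemcpyToSymbol(completedblockquantity, &zero, sizeof(zero));

    // One block per (iteration, data tile) pair, as described in the post.
    iterate<<<blocks_per_iteration * iterations, blocksize>>>(
        a, b, n, blocks_per_iteration);

    // The last iteration wrote to A if the iteration count is even, else to B.
    cudaMemcpy(host.data(), (iterations % 2 == 0) ? a : b,
               n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    return host;
}
```

With this toy update every element doubles per iteration when the input is all ones, so a call like run_iterations(1024, 4, 128) should come back with every element equal to 16 if the ordering between iterations really holds.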
