Need synchronization between blocks?

enemyben88 · September 16, 2009, 4:07pm

Hey all,

I was just wondering if anyone has encountered some kind of programming application or algorithm where they really wanted/needed to have some synchronization between blocks/multiprocessors? In my current application, I basically have a single block per multiprocessor on my device. However, sometimes I need the outputs of one of these blocks to provide the inputs to another block. In my application, I am basically running until all the blocks converge to a stable state before exiting the kernel, but this means there is an undetermined number of times my blocks will evaluate their inputs.

Since the dependency between blocks, initially I had just used multiple kernel calls (First the lowest set of inputs execute on the device and the kernel exits. The next highest tier then uses the outputs as inputs, etc). However, since I am running to convergence, this may mean a lot of overhead of kernel-calls. So instead, I have basically created a work-queue like structure. When Block A finishes its output, it schedules B, who uses Block-A’s outputs as its inputs. This works using atomic primitives, so long as you dont have more blocks than multiprocessors.

I guess I am interested if anyone else can think of some simple applications/algorithms that need to run to some undetermined number of times to convergence? Basically I would like to see if this kind of scheduling-queue using atomic primitives can be equally benefitial to some application other than my own. Any ideas would be greatly appreciate! Thanks!

mickey_mouse · September 16, 2009, 4:51pm

In my opinion you are looking at the problem from wrong side. In the chapter 4 of programming guide you can read that the architecture of CUDA-capable devices is basically SIMD (single instruction multiple data). This means that for optimum performance all the threads launched must execute about the same code, but with different data.

So try thinking to split your problem in another way: try to “split” in some way your input data, with the objective of processing only one part of your data with one thread. You can place a loop in the kernel code that re-iterate the processing until you can reach the convergence you want. In this manner you can launch one thread per each block of input data and that thread operate on that data until it reaches the convergence.

I hope you have understand me, I’m not english and also I’m a CUDA newbie.

JeremiahPalmer · September 16, 2009, 5:00pm

In my opinion you are looking at the problem from wrong side. In the chapter 4 of programming guide you can read that the architecture of CUDA-capable devices is basically SIMD (single instruction multiple data). This means that for optimum performance all the threads launched must execute about the same code, but with different data.

So try thinking to split your problem in another way: try to “split” in some way your input data, with the objective of processing only one part of your data with one thread. You can place a loop in the kernel code that re-iterate the processing until you can reach the convergence you want. In this manner you can launch one thread per each block of input data and that thread operate on that data until it reaches the convergence.

I hope you have understand me, I’m not english and also I’m a CUDA newbie.

Correct, Mickey.

The only parallel work done simultaneously is within/between threads in a thread block. Thread block work is done in any order, parallel or series.

tmurray · September 16, 2009, 5:18pm

Oh, there are absolutely cases where synchronization between blocks is very useful.

Topic		Replies	Views
question about __syncthreads(); CUDA Programming and Performance	9	8617	March 17, 2008
synchronisation between blocks CUDA Programming and Performance	2	1476	June 11, 2009
synchronization between blocks CUDA Programming and Performance	2	747	December 5, 2014
cuda block synchronization CUDA Programming and Performance	4	8393	June 20, 2011
cuda block synchronization CUDA Programming and Performance	1	982	June 19, 2011
Synchronize all blocks in CUDA CUDA Programming and Performance	12	45153	October 25, 2013
sync over blocks age old question CUDA Programming and Performance	2	2877	September 9, 2008
Synchronizing Blocks CUDA Programming and Performance	3	2434	January 10, 2018
Synchronization across all threads CUDA Programming and Performance	9	6596	August 22, 2008
Inter-Block Dependency CUDA Programming and Performance	13	12035	January 9, 2011

Need synchronization between blocks?

Related topics