Inter-thread communication using CUDA

SURAJITSAIKIA · September 1, 2015, 8:51am

Hi guys, I have a big sequential for loop, and the only dependency is in the bottom part of the loop body.
For example:
for ( )
{
    …
    independent computation here
    …
    a[i] = a[i-1] op a[i];  // dependency
}

Before the dependency, all steps in an iteration are independent.
My question: if I implement this on a GPU, I would like all the independent steps to execute in parallel. As soon as the first iteration is finished, I want to pass the a[i-1] value on to the threads handling the successive iterations. I don't know whether this is a good idea, and if it is, what is the best way to do it?
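One way to picture the split described above, as a rough sketch only: run the independent part in a kernel with one thread per iteration, then resolve the chain a[i] = a[i-1] op a[i] in a second step. If op is associative, that chain is an inclusive prefix scan, which Thrust provides as thrust::inclusive_scan. The sketch assumes op is integer addition; independent_part and run_loop are hypothetical names, and the real independent computation is not known from the post.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <vector>

// Hypothetical stand-in for the independent work of iteration i.
__device__ int independent_part(int i) { return i + 1; }

// Phase 1: one thread per iteration; threads do not talk to each other.
__global__ void independent_kernel(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = independent_part(i);
}

// Runs both phases and returns the final array on the host.
std::vector<int> run_loop(int n) {
    if (n <= 0) return {};
    thrust::device_vector<int> a(n);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    independent_kernel<<<blocks, threads>>>(thrust::raw_pointer_cast(a.data()), n);

    // Phase 2: a[i] = a[i-1] + a[i] for every i, done as an in-place
    // inclusive scan (assumes op is addition, which is associative).
    thrust::inclusive_scan(a.begin(), a.end(), a.begin());

    std::vector<int> host(n);
    thrust::copy(a.begin(), a.end(), host.begin());
    return host;
}
```

With this stand-in, run_loop(4) yields 1, 3, 6, 10. If op is not associative, the second phase cannot be expressed as a scan and the chain has to be walked in order.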

little_jimmy · September 1, 2015, 10:11am

How many iterations does the for loop run?

What is the ratio/weight of the independent portion to the dependent portion?

How deep is a?

SURAJITSAIKIA · September 1, 2015, 2:33pm

The number of iterations depends on the size of the image. For the image I am working with now, there are more than 200 iterations.

SURAJITSAIKIA · September 1, 2015, 2:35pm

The ratio will be around 2:7.

little_jimmy · September 2, 2015, 5:02am

You could likely still parallelize the problem.

Given the weak ratio, one might simply forget about the independent section.

You could still assign a thread to each array element a[i].
The thread would then simply calculate both a[i] and a[i - 1] itself, given how cheap that is compared to the rest of the computation.
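A minimal sketch of that suggestion, under a strong assumption: the dependency is only one level deep, meaning a[i] needs the independent result of iteration i-1 and not the already-updated running value. In that case each thread can recompute its neighbour's value and no thread ever reads another thread's output. The names independent_work and run_depth_one are hypothetical, and op is assumed to be integer addition.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the cheap independent work of iteration i.
__device__ int independent_work(int i) { return i + 1; }

// One thread per element. Thread i recomputes the value for i-1 instead
// of waiting for thread i-1, so no inter-thread communication is needed.
__global__ void recompute_neighbor_kernel(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int cur = independent_work(i);
    // Element 0 has no predecessor; op is assumed to be addition.
    a[i] = (i == 0) ? cur : independent_work(i - 1) + cur;
}

// Launches the kernel and copies the result back to the host.
std::vector<int> run_depth_one(int n) {
    if (n <= 0) return {};
    int* d_a = nullptr;
    cudaMalloc(&d_a, n * sizeof(int));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    recompute_neighbor_kernel<<<blocks, threads>>>(d_a, n);

    std::vector<int> host(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return host;
}
```

With this stand-in, run_depth_one(4) yields 1, 3, 5, 7. If a[i-1] is meant to be the updated value from the previous iteration, as in a strictly sequential loop, the redundant work grows with i and this approach no longer matches the original loop.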
