Parallelism

OK, I think this is impossible, but here goes.

I have the following code

for (x = 0; x < 8; x++)
{
    ...
    for (z = 0; z < 1000; z++)
    {
        ...
    }
    ...
}

Is it possible for me to parallelise both the inner and the outer loops here? At the moment I am calling the GPU to run the inner loop and just using the CPU for the outer loop, but is it possible to use the GPU for both? With these figures I’m guessing my way is quicker, but if it were x < 1000 and z < 1000 that probably wouldn’t be the case.

Cheers,

Chris

I think it really depends on what processing you’re doing in the loops. I’ve just implemented some functions that have similar access patterns when done in pure C. I basically converted the entire loop structure to a single kernel with a halfway-educated memory access pattern, and it’s super fast. There wasn’t really any dependency at all on previous calculations on the data, though.

If the actions of one iteration of the loop depend a lot on previous iterations, it gets tougher to do that.

SrJsignal

If all iterations of the loops are independent, then you can use a 2D thread block to run this, no problem. If there are dependencies, as stated by SrJsignal, then things get trickier, but not impossible, depending on the nature of the dependencies.
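
For the independent case, here’s a minimal sketch of that 2D mapping. It assumes the 8 × 1000 iterations each update one element of a float array; the kernel name, the data layout, and the doubling are just placeholders for the real work:

__global__ void bothLoops(float *data, int nx, int nz)
{
    int x = blockIdx.y * blockDim.y + threadIdx.y;   /* outer-loop index */
    int z = blockIdx.x * blockDim.x + threadIdx.x;   /* inner-loop index */
    if (x < nx && z < nz)
        data[x * nz + z] *= 2.0f;                    /* stand-in for the real work */
}

/* in host code, with d_data already allocated via cudaMalloc: */
dim3 block(256, 1);
dim3 grid((1000 + block.x - 1) / block.x, 8);        /* one thread per (x, z) pair */
bothLoops<<<grid, block>>>(d_data, 8, 1000);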

Mark

I’ve pretty much solved this now.

I’ve managed to unroll both of the loops and remove the dependencies, giving me:

for (i = 0; i < (8 * 1000); i++)
{
    /* some work */
}
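
In CUDA terms that flattened loop maps straight onto a 1D grid. A minimal sketch, assuming one float per iteration, with the increment standing in for “some work”:

__global__ void flatLoop(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;   /* stand-in for "some work" */
}

/* in host code, with d_data already on the device: */
int n = 8 * 1000;
int threads = 256;
int blocks = (n + threads - 1) / threads;
flatLoop<<<blocks, threads>>>(d_data, n);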

I’m now working with three loops of over 1,000,000 iterations each (instead of three sets of two nested loops of 1000), which lets me call three separate CUDA kernels with really high levels of parallelism.

I’m hoping for a huge performance advantage over the CPU, but I’m just waiting to run out of memory…
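
For what it’s worth, you can watch how close you are to the limit with the standard cudaMemGetInfo runtime call and check each allocation as you go. A quick sketch (n and d_data are just illustrative; needs <stdio.h> and the CUDA runtime headers):

size_t freeBytes, totalBytes;
cudaMemGetInfo(&freeBytes, &totalBytes);
printf("device memory: %zu MB free of %zu MB\n", freeBytes >> 20, totalBytes >> 20);

float *d_data;
if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess)
    fprintf(stderr, "cudaMalloc failed - probably out of memory\n");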

Chris