Parallelizing for loops using CUDA


I have a for loop that takes around 16 ms to execute, and it runs conditionally inside another for loop, about 500 times in total.

Serial code format is like this:

// outer for loop
for (i = 0; i < 500; i++) {
    // some conditions
    // some function calls
    // some nested function calls

    // inner for loop (~16 ms)
    for (j = 0; some condition; j++) {
        // ...
    }
}

I want to parallelize the inner for loop.
Is it possible, using CUDA, to reduce the time required to execute the inner for loop by 40%, and hence the total time required to run the serial code?

Please help.



You need to give more details about the inner loop. First, what is “some condition”? Second, is the instruction at iteration j independent of the instruction at iteration j’? Third, what happens before and after the inner loop? How often will data need to be copied from the CPU to the GPU and back?

Yup, I’ll find that out. But my basic question is: for parallelizing a loop that takes ~16 ms to execute, and thereby reducing the overall time required to execute the outer for loop, is CUDA a good solution?

The answer is a clear “maybe”.

It all depends on what is happening in the inner loop.
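If the inner iterations turn out to be independent, the usual mapping is one CUDA thread per value of j. A minimal sketch, assuming the inner loop just computes one output element per j from one input element (the names and the per-element work are placeholders, not your actual code):

```cuda
// Each thread handles one value of j from the former inner loop.
__global__ void innerLoopKernel(const float *in, float *out, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n)                     // guard: the grid may be larger than n
        out[j] = 2.0f * in[j];     // placeholder for the real per-j work
}

// Host side, called once per outer-loop iteration:
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   innerLoopKernel<<<blocks, threads>>>(d_in, d_out, n);
```

One caveat relevant to the 40% goal: keep `d_in`/`d_out` resident on the GPU across all 500 outer iterations if you can. A `cudaMemcpy` in each direction on every iteration can easily eat more than the 16 ms you are trying to save.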