Recursive Algorithm

Is there a way to exploit the GPU for simple recursive code such as:

//CPU CODE

//some inputs: a, b, c; output: R[0..size-1]
void cpu_recurrence(float a, float b, float c, float *R, int size) {
    float ex_r   = a;   // running result
    float exd_r  = b;   // per-step factor
    float exdd_r = c;   // growth of the per-step factor
    float tr;

    //Initial values
    tr = exd_r * exdd_r;
    exd_r = tr;

    tr = exdd_r * exdd_r;
    exdd_r = tr;

    for (int i = 0; i < size; i++) {
        R[i] = ex_r;

        tr = ex_r * exd_r;    // next result = result * factor
        ex_r = tr;

        tr = exd_r * exdd_r;  // the factor itself grows geometrically
        exd_r = tr;
    }
}

i.e., each subsequent result depends on the previous result. I am pretty new to CUDA, so apologies if this has been hashed out already.

Thanks!

Edit: I've cleaned the code up so it uses only one thread.

What you're looking for is called a scan, or parallel prefix sum; your loop is the multiplicative variant, a prefix product. The CUDA SDK includes samples that cover this, and libraries like CUDPP provide ready-made implementations.
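To see why a scan applies here: after the initialization, your loop multiplies ex_r at step j by the factor f[j] = (b*c) * (c*c)^j, so R[i] = a * f[0] * f[1] * ... * f[i-1]. Each f[j] has a closed form that does not depend on earlier steps, so the whole array of factors can be computed in parallel, and the remaining running product is exactly a prefix product: a scan with multiplication in place of addition. Below is a minimal sketch of this using Thrust (bundled with the CUDA toolkit since 4.0); the names step_factor, exd0, exdd and the sample input values are mine, just for illustration, not from your post.

//GPU CODE (Thrust sketch: the same recurrence as a prefix product)

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/scan.h>
#include <thrust/functional.h>
#include <math.h>

// Per-step multiplier f[j] = exd0 * exdd^j. Closed form: no dependence
// on earlier elements, so every factor can be computed independently.
struct step_factor {
    float exd0, exdd;
    __host__ __device__
    float operator()(float j) const { return exd0 * powf(exdd, j); }
};

int main() {
    const int size = 1024;   // example size (assumption)
    const float a = 1.0f;    // example input (assumption)
    // In terms of the original inputs these would be exd0 = b*c and
    // exdd = c*c, i.e. the values after the initialization step.
    step_factor f = { 0.99f, 1.0001f };

    // 1) Fill factors[j] with j, then map j -> f[j], all in parallel.
    thrust::device_vector<float> factors(size);
    thrust::sequence(factors.begin(), factors.end());   // 0, 1, 2, ...
    thrust::transform(factors.begin(), factors.end(), factors.begin(), f);

    // 2) Prefix product via an exclusive scan with multiplication:
    //    R[0] = a, R[i] = a * factors[0] * ... * factors[i-1],
    //    matching what the sequential CPU loop stores in R.
    thrust::device_vector<float> R(size);
    thrust::exclusive_scan(factors.begin(), factors.end(), R.begin(),
                           a, thrust::multiplies<float>());
    return 0;
}

One caveat: a parallel scan reassociates the multiplications, so in floating point the results can differ slightly from the serial loop, and long products can underflow or overflow. A common workaround is to scan the logarithms of the factors with addition and exponentiate the result afterward.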