CUDA for task parallelism?

Hello, I’m a CUDA newbie trying to decide whether switching from a cluster to CUDA would work for my application. I used MPI to parallelize a Fortran 90 code to run on a Beowulf cluster by taking advantage of task parallelism. Each task assigned to a core numerically integrates a function over the surface of an element in 3D space, then increments the Gauss rule, evaluates the integral again, and checks the change against a tolerance. The evaluation of the functions being integrated is itself an adaptive procedure with no closed-form expression, which leads to loop-carried dependencies. Would this program be a good candidate for a Tesla machine with the Portland Group’s CUDA Fortran compiler?

By the way, this is for a boundary element code that requires double-precision complex arithmetic.
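
To make the structure concrete, here is a stripped-down sketch of what one task does. Everything in it is a stand-in I made up for illustration (a midpoint rule plays the role of the Gauss rule, and the integrand here is just a formula, whereas my real integrand is itself an adaptive procedure); only the outer refine-until-the-change-is-below-tolerance loop mirrors the actual code.

```fortran
module adaptive_quad
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Stand-in for the real integrand, which in my code is itself adaptive
  ! and has no closed form.
  complex(dp) function integrand(x)
    real(dp), intent(in) :: x
    integrand = cmplx(cos(x), sin(x), kind=dp)
  end function integrand

  ! Stand-in for an n-point rule over one element (here just a midpoint
  ! rule on [a, b] instead of a Gauss rule on a 3D surface).
  complex(dp) function quad_rule(a, b, n)
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    real(dp) :: h, x
    integer :: i
    h = (b - a) / n
    quad_rule = (0.0_dp, 0.0_dp)
    do i = 1, n
       x = a + (i - 0.5_dp) * h
       quad_rule = quad_rule + integrand(x) * h
    end do
  end function quad_rule

  ! One task: refine the rule until successive results agree to tol.
  complex(dp) function integrate_element(a, b, tol)
    real(dp), intent(in) :: a, b, tol
    complex(dp) :: prev, cur
    integer :: n
    n = 2
    prev = quad_rule(a, b, n)
    do
       n = n * 2                          ! refine (my real code increments the Gauss order)
       cur = quad_rule(a, b, n)
       if (abs(cur - prev) < tol) exit    ! converged
       prev = cur
    end do
    integrate_element = cur
  end function integrate_element

end module adaptive_quad
```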

Thanks for your help.

I’ve never seen an integrand with no known closed form. I suppose an example would be integrating the Fibonacci function (even though Fibonacci actually does have a closed form).

If that really is the case, then your parallelism will be quite limited.

If the surface is fixed, you could compute the integrand once and reuse it, which would allow massive parallelism.
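
For instance, you could evaluate the integrand at every quadrature point with one thread per point, store the values, and reuse them for every integral afterwards. A rough CUDA Fortran sketch of that idea, with a made-up integrand and array names since I haven’t seen your code:

```fortran
module precompute_kernels
  use cudafor
  implicit none
contains
  ! One thread per quadrature point: evaluate the (expensive) integrand
  ! once, store it, and reuse the stored values for every later integral.
  attributes(global) subroutine eval_integrand(npts, pts, fvals)
    integer, value :: npts
    real(8), device :: pts(npts)
    complex(8), device :: fvals(npts)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= npts) then
       fvals(i) = cmplx(cos(pts(i)), sin(pts(i)), kind=8)  ! stand-in integrand
    end if
  end subroutine eval_integrand
end module precompute_kernels
```

Launched with something like call eval_integrand<<<(npts + 255)/256, 256>>>(npts, d_pts, d_fvals), where d_pts and d_fvals are device arrays, every quadrature point is evaluated concurrently.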

If you really are limited to evaluating one surface per thread, you would need thousands of surfaces to keep the GPU busy.
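
Something along these lines, again just a sketch with invented names and a stand-in rule in place of your real quadrature:

```fortran
module element_kernels
  use cudafor
  implicit none
contains
  ! One thread per element: each thread runs its own adaptive loop.
  ! Threads that need more refinement stall the rest of their warp
  ! (divergence), so throughput depends on having many elements.
  attributes(global) subroutine integrate_all(n, centers, tol, results)
    integer, value :: n
    real(8), device :: centers(n)
    real(8), value :: tol
    complex(8), device :: results(n)
    integer :: i, m
    complex(8) :: prev, cur
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i > n) return
    m = 2
    prev = rule(centers(i), m)
    do
       m = m * 2                          ! refine the rule
       cur = rule(centers(i), m)
       if (abs(cur - prev) < tol) exit    ! converged to tolerance
       prev = cur
    end do
    results(i) = cur
  end subroutine integrate_all

  ! Stand-in for an m-point rule over one element.
  attributes(device) complex(8) function rule(c, m)
    real(8), intent(in) :: c
    integer, intent(in) :: m
    integer :: k
    rule = (0.0d0, 0.0d0)
    do k = 1, m
       rule = rule + cmplx(cos(c * k), sin(c * k), kind=8) / m
    end do
  end function rule
end module element_kernels
```

You’d launch it with something like call integrate_all<<<(n + 255)/256, 256>>>(n, d_centers, tol, d_results). With only a few hundred elements most of the GPU sits idle; with tens of thousands it has a fighting chance.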

Without seeing exactly what you’re computing, I can’t really suggest a solution.