Hello, I’m a CUDA newbie trying to decide whether moving from a cluster to CUDA would work for my application. I used MPI to parallelize a Fortran 90 code on a Beowulf cluster by exploiting task parallelism. Each task assigned to a core numerically integrates a function over the surface of an element in 3D space, increases the order of the Gauss rule, evaluates the integral again, and checks the change against a tolerance. Evaluating the integrand is itself an adaptive procedure that cannot be expressed in closed form, which leads to loop-carried dependencies. Does this make the program a poor candidate for a Tesla supercomputer with the Portland Group’s CUDA Fortran compiler?
By the way, this is for a boundary element code that requires double-precision complex arithmetic.
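To make the per-task work concrete, here is a rough sketch of the loop in Python (not my actual Fortran 90 code). The flat square element, the source point, the wavenumber, the kernel, the tolerance, and all function names are stand-ins made up for illustration. In the real code the integrand comes out of an adaptive procedure, not a one-line formula.

```python
import cmath
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess for root i
        for _ in range(100):  # Newton iteration on P_n(x) = 0
            p0, p1 = 1.0, x  # P_0 and P_1, advanced by the three-term recurrence
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)  # derivative P_n'(x)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def tensor_gauss(f, n):
    """n-by-n tensor-product Gauss rule over the reference square [-1, 1]^2."""
    x, w = gauss_legendre(n)
    total = 0j
    for xi, wi in zip(x, w):
        for xj, wj in zip(x, w):
            total += wi * wj * f(xi, xj)
    return total


def integrate_element(f, tol=1e-10, n_start=2, n_max=40):
    """Raise the Gauss order until two successive estimates agree to tol."""
    prev = tensor_gauss(f, n_start)
    for n in range(n_start + 1, n_max + 1):
        cur = tensor_gauss(f, n)
        if abs(cur - prev) <= tol * abs(cur):
            return cur, n
        prev = cur  # each pass depends on the result of the previous one
    raise RuntimeError("no convergence up to order %d" % n_max)


def kernel(u, v):
    """Hypothetical complex-valued integrand: exp(i*k*r) / (4*pi*r)."""
    k = 2.0  # assumed wavenumber
    r = math.sqrt((u - 0.3) ** 2 + (v - 0.2) ** 2 + 1.0)  # assumed source at height 1
    return cmath.exp(1j * k * r) / (4.0 * math.pi * r)


if __name__ == "__main__":
    value, order = integrate_element(kernel)
    print(order, value)
```

Each element may stop at a different order, so the amount of work per task is not known in advance, and each pass of the refinement loop depends on the previous estimate.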
Thanks for your help.