My colleagues and I have a simulation program that requires a lot of processing power and must be run once for each unit of the whole simulation. A complete simulation has about a million units, i.e., the program runs about a million times.
Recently we thought about moving this project to the GPU, starting with CUDA. But since the program is really big, we were wondering whether (not as a final solution, of course) it is possible to send the code to the GPU as-is and have it execute on every core. The code is in Fortran, and each run takes about 2 hours on an average CPU.
That said, I would appreciate pointers to any references that discuss whether this is possible (or impossible).
Your best option is probably PGI’s Accelerator Fortran, which brings Fortran code to the GPU by means of compiler directives that you insert into the existing code.
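To give a rough idea of what the directive-based approach looks like, here is a minimal sketch in C using OpenACC-style pragmas (OpenACC is a directive standard that PGI's compilers support; the Fortran form uses `!$acc` comment lines instead). The names `simulate_unit` and `run_all_units` are hypothetical, and the arithmetic is only a placeholder for the real model. The sketch assumes each unit is independent of the others.

```c
#include <assert.h>

/* Hypothetical stand-in for one simulation unit. The real model would go
 * here; this placeholder just iterates a simple recurrence.
 * "routine seq" asks an OpenACC compiler to also build a device version. */
#pragma acc routine seq
static double simulate_unit(double input)
{
    double state = input;
    for (int step = 0; step < 100; ++step) {
        state = 0.5 * state + 1.0;  /* placeholder arithmetic only */
    }
    return state;
}

/* Run all units. The directive asks the compiler to spread the loop
 * iterations over the GPU and to copy the arrays to and from the device.
 * This only works if the units do not depend on each other. */
void run_all_units(const double *inputs, double *results, int n_units)
{
    assert(n_units >= 0);
    #pragma acc parallel loop copyin(inputs[0:n_units]) copyout(results[0:n_units])
    for (int i = 0; i < n_units; ++i) {
        results[i] = simulate_unit(inputs[i]);
    }
}
```

A compiler without OpenACC enabled will typically ignore the pragmas and build this as ordinary serial C, so the same source can be checked on the CPU first. Whether a large existing program can be annotated this easily depends on how much of it the compiler is able to offload.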
PGI also offers a CUDA Fortran compiler which is analogous to Nvidia’s CUDA C compiler (and made Nvidia rename CUDA to CUDA C).
In principle you could also use a Fortran-to-C converter like f2c and then rewrite the result in CUDA C. But this is tedious, requires knowledge of Fortran, C, f2c, and CUDA, and produces ugly, hard-to-read code.
I had heard of the PGI compiler, and I will look into it further.
Nevertheless, I think I should rephrase my question as a more conceptual one: is it possible to take a program (written in C or Fortran), run it on every core of the GPU, and then collect its results, all without having to alter it?