I would like to speed up my code using a Tesla card. So far I use OpenMP in Fortran to parallelize my code across N CPU cores, and I would now like to add an extra layer of parallelism using one Tesla GPU. Specifically, I would like to offload one function to the GPU from within the loop that is parallelized across the CPU cores. Does that sound feasible?

My understanding is that I can parallelize on the CPU and, after those computations finish, call one kernel that parallelizes on the GPU. So I am not sure whether, with multiple CPU cores and one Tesla card, I can run code on the CPU and the GPU concurrently (rather than sequentially), or whether the compiler automatically spreads the computations across CPU and GPU cores. I would also be very grateful if you could point me to an example (if any).
Thank you very much.