Making Function Calls within Accelerator Code Blocks

I have read that function calls cannot be made inside code blocks marked with accelerator pragmas. Is there a way around this besides in-lining all of the function calls? Or is this something that will be available in future releases? I know that OpenMP allows function calls within parallel regions, and I would like to do the same with PGI Accelerator.

Is there a way around this besides in-lining all of the function calls?

No, though the compiler can perform automatic inlining (see -Minline/-Mextract, -Mautoinline, and -Mipa=inline). It doesn't work in all cases, but it is worth trying before inlining the routines by hand.

I should note that this is not a PGI limitation but a general limitation of NVIDIA devices. CUDA C and CUDA Fortran appear to allow calls, but in reality all device calls get inlined.
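As a rough illustration of what hand inlining means here, the sketch below shows a loop that calls a small helper and the same loop with the helper's body pasted in. All function names (saxpy_elem, saxpy_with_call, saxpy_inlined, saxpy_versions_agree) are hypothetical, and the `#pragma acc region` line assumes the PGI Accelerator C directive syntax. Compilers that don't recognize the pragma should ignore it (possibly with a warning), so the file also builds as plain C99.

```c
#include <assert.h>

#define MAX_N 1024

/* Hypothetical helper: the kind of small routine one might want to
   call from inside an accelerated loop. */
static float saxpy_elem(float a, float x, float y)
{
    return a * x + y;
}

/* Original form: the loop body is a function call. Per the discussion
   above, this call has to be inlined (by the compiler or by hand)
   before the loop can be offloaded. */
static void saxpy_with_call(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = saxpy_elem(a, x[i], y[i]);
}

/* Hand-inlined form: the helper's body is written directly in the loop.
   Assumption: "#pragma acc region" is the PGI Accelerator C directive;
   other compilers ignore the unknown pragma and run the loop on the host.
   restrict tells the compiler x and y do not alias. */
static void saxpy_inlined(int n, float a, const float *restrict x,
                          float *restrict y)
{
#pragma acc region
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}

/* Runs both versions on the same input and returns 1 if they agree
   element for element. Inputs are small integers, so every result is
   exactly representable in float and exact comparison is safe. */
int saxpy_versions_agree(int n)
{
    static float x[MAX_N], y_call[MAX_N], y_inl[MAX_N];
    assert(n > 0 && n <= MAX_N);

    for (int i = 0; i < n; ++i) {
        x[i] = (float)i;
        y_call[i] = y_inl[i] = (float)(2 * i);
    }

    saxpy_with_call(n, 3.0f, x, y_call);
    saxpy_inlined(n, 3.0f, x, y_inl);

    for (int i = 0; i < n; ++i)
        if (y_call[i] != y_inl[i])
            return 0;
    return 1;
}
```

To try automatic inlining instead, you could put the same pragma around the loop in saxpy_with_call, build with something like `pgcc -ta=nvidia -Minfo=accel -Minline`, and check the -Minfo=accel messages to see whether the region was offloaded. Treat that flag combination as a starting point rather than a recipe.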

Or is this something that will be available with future releases?

Possibly, but a number of technical challenges must be overcome first. The first is the lack of a linker for device code: without a linker, there is no way to resolve symbols for routines compiled separately. The second is the lack of support for context switches and a software stack at runtime, though NVIDIA has added better support for this. Third, we need a way to ensure that the function being called has a device version. There are most likely more, but these are the ones that come to mind.

We definitely want to allow function calls within acc compute regions. The lack of them is one of the major limitations of the model, and support for them is one of the most requested features.

Thanks for your interest,
Mat