Compiler failed to translate accelerator region

Hello,

I am getting the following error: PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure

I don't quite understand why this is happening. Apart from the Fortran allocate and deallocate statements, I do not call anything that should cause problems, since the other functions I use work fine in the previous routines.

I am using the fortran allocate to allocate a derived type which looks like this:

  TYPE(tri_diag_mat), DEVICE, ALLOCATABLE, DIMENSION(:) :: s

where TYPE(tri_diag_mat) is

  TYPE tri_diag_mat
     REAL(KIND=8), POINTER, DIMENSION(:) :: a, b, c !==Lower, diagonal, upper
  END TYPE tri_diag_mat

So I am allocating s, of type tri_diag_mat, with a dimension of 3, and also allocating its components a, b, and c in device code.

Is this allowed? I am using PGI 16.10-0 for Linux x86-64.

Thanks!

Hi prattvn,

Can you post a reproducing example?

It’s possible that the “Unsupported procedure” error comes from a compiler-generated call to a runtime routine, such as one that creates a temp array, but I’ll need a more complete example to know for sure.

- Mat

Hello,

I have sent the source code to trs@pgroup.com, and Dave from PGI has confirmed that it was received. Could you please check the code and let me know what causes the error? It would be difficult for me to post a reproducing example here.

Thanks!

Pratt

Hi Prattvn,

I took a look at the source you sent and was able to recreate the error. Unfortunately, we don’t have the complete Fortran runtime over on the device, and in this case the missing routine is “pgf90_calloc04”. I’ve added a problem report (TPR#23339) and sent it to engineering.

Note that while allocation from within device code is permitted, I highly encourage users to avoid it. While allocating, the threads are serialized, which causes a severe performance impact. Also, the default device heap size is only 8MB. While you can increase this to 32MB, it’s still very small.
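For reference, here is a minimal sketch of how the device heap limit could be queried and raised from the host. It assumes the CUDA Fortran bindings cudaDeviceGetLimit and cudaDeviceSetLimit from the cudafor module, and assumes that device-side allocate draws from the CUDA malloc heap. The 32MB value is just the figure mentioned above, and the program name is made up for illustration. The limit would need to be set before launching any kernel that allocates.

```fortran
program heap_limit_sketch
  use cudafor
  implicit none
  integer :: istat
  integer(kind=cuda_count_kind) :: heap_bytes

  ! Query the current device malloc heap size (assumed to back device-side allocate).
  istat = cudaDeviceGetLimit(heap_bytes, cudaLimitMallocHeapSize)
  if (istat /= cudaSuccess) print *, 'get limit failed: ', cudaGetErrorString(istat)
  print *, 'current device heap (bytes): ', heap_bytes

  ! Request a 32MB heap; this must happen before any kernel that allocates runs.
  heap_bytes = int(32, cuda_count_kind) * 1024 * 1024
  istat = cudaDeviceSetLimit(cudaLimitMallocHeapSize, heap_bytes)
  if (istat /= cudaSuccess) print *, 'set limit failed: ', cudaGetErrorString(istat)

  ! Read the limit back to see what the runtime actually granted.
  istat = cudaDeviceGetLimit(heap_bytes, cudaLimitMallocHeapSize)
  print *, 'device heap after request (bytes): ', heap_bytes
end program heap_limit_sketch
```

Even with a larger heap, the serialization cost of allocating inside the kernel remains, so this only addresses the size limit.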

To work around this error and to help the performance of your code, I’d recommend adding an extra dimension to “s”, with one element per thread, performing the allocation from the host, and then passing the structure as a kernel argument.
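A rough sketch of that idea follows. Note that it is a simplification and not the original code: instead of a derived type with pointer components, it flattens a, b, and c into plain 2-D device arrays with one column per thread. The module, kernel name, sizes (n, nthreads, tpb), and fill values are all hypothetical. The arrays are allocated on the host and passed to the kernel as arguments, so no allocate runs in device code.

```fortran
module tridiag_workspace
  use cudafor
  implicit none
contains
  ! Each thread works only on its own column (tid) of the preallocated arrays.
  attributes(global) subroutine fill_tridiag(a, b, c, n, nthreads)
    integer, value :: n, nthreads
    real(kind=8) :: a(n, nthreads), b(n, nthreads), c(n, nthreads)
    integer :: i, tid

    tid = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (tid > nthreads) return

    do i = 1, n
      a(i, tid) = -1.0d0   ! lower diagonal
      b(i, tid) =  2.0d0   ! main diagonal
      c(i, tid) = -1.0d0   ! upper diagonal
    end do
  end subroutine fill_tridiag
end module tridiag_workspace

program host_alloc_sketch
  use cudafor
  use tridiag_workspace
  implicit none
  integer, parameter :: n = 64, nthreads = 256, tpb = 128   ! hypothetical sizes
  real(kind=8), device, allocatable :: a_d(:,:), b_d(:,:), c_d(:,:)
  real(kind=8), allocatable :: b_h(:,:)
  integer :: istat

  ! Device memory is allocated once, from the host, with the extra per-thread dimension.
  allocate(a_d(n, nthreads), b_d(n, nthreads), c_d(n, nthreads))

  call fill_tridiag<<<(nthreads + tpb - 1) / tpb, tpb>>>(a_d, b_d, c_d, n, nthreads)
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel failed: ', cudaGetErrorString(istat)

  ! Copy one array back to the host to check the result.
  allocate(b_h(n, nthreads))
  b_h = b_d
  print *, 'b(1,1) = ', b_h(1, 1), '  b(n,nthreads) = ', b_h(n, nthreads)

  deallocate(a_d, b_d, c_d, b_h)
end program host_alloc_sketch
```

Putting the element index first and the thread index second is one possible layout choice; which ordering gives better memory access depends on how the kernel walks the arrays.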

Best Regards,
Mat