Copying arrays of structures to the GPU causes a Lowering error

Hello,

First, when using CUDA Fortran, it seems it is not possible to copy an array of structures from the CPU to the GPU with a plain assignment. Why is that? At least in this case the compiler message is fairly explicit:

PGF90-S-0099-Illegal use of derived type (tr_ly_utils.cuf: 93)

So I tried copying field by field, something like:

cell_gpu(:,:,:)%luminosity=cell(:,:,:)%luminosity

compiles (too early to check whether it runs correctly).
But if I try:

cell_gpu(:,:,:)%vel(1)=cell(:,:,:)%vel(1)

then I get:

Lowering Error: bad ast optype in expression [ast=1664,asttype=12,datatype=0]
Lowering Error: bad ast optype in expression [ast=1664,asttype=12,datatype=0]
PGF90-F-0000-Internal compiler error. Errors in Lowering 2 (tr_ly_utils.cuf: 257)
PGF90/x86-64 Linux 11.1-0: compilation aborted

So the question, beyond the unhelpful error message: is there a way to do this copy without writing an explicit loop over the indices of the array of structures?

Benoit.

Hi Benoit,

Lowering errors are always problems with the compiler. Can you please send a reproducing example to PGI Customer Service (trs@pgroup.com)?

Note that copying a single member will most likely give poor copy performance: since the data isn't contiguous, it can't be copied in large blocks.

Thanks,
Mat

Here it is. Quite short:

PROGRAM test
USE CUDAFOR

TYPE :: CELL_STRUCT
  REAL(KIND=4),dimension(3) :: vel
  REAL(KIND=4) :: HI_number_density
  REAL(KIND=4) :: luminosity
  REAL(KIND=4) :: T4
END TYPE


TYPE (CELL_STRUCT), dimension(256,256,256) :: CELL
TYPE (CELL_STRUCT), device, dimension(256,256,256) :: CELL_dev


cell_dev(:,:,:)%vel(1)=cell(:,:,:)%vel(1)


END PROGRAM

I compile it with:

/opt/pgi/linux86/11.1/bin/pgf90 -Mcuda=cc13 test.cuf

And get the following message:

Lowering Error: bad ast optype in expression [ast=696,asttype=12,datatype=0]
Lowering Error: bad ast optype in expression [ast=696,asttype=12,datatype=0]
PGF90-F-0000-Internal compiler error. Errors in Lowering 2 (test.cuf: 19)
PGF90/x86 Linux 11.1-0: compilation aborted

And if I replace:

cell_dev(:,:,:)%vel(1)=cell(:,:,:)%vel(1)

with

cell_dev%vel(1)=cell%vel(1)    (which may not be proper Fortran)

I also get an unhelpful message:

/tmp/pgf90MJobYoJgNRsb.s: Assembler messages:
/tmp/pgf90MJobYoJgNRsb.s:104: Error: suffix or operands invalid for `movss'

I’m sending a copy of this to the email address right now.

Do you have any suggestions on how to efficiently move arrays of structures from the CPU to the GPU?

Benoit.

Hi Benoit,

Thanks for the example. I have sent a report to our engineers (TPR#17637) and hopefully they can have it fixed soon.

Do you have any suggestions on how to efficiently move arrays of structures from the CPU to the GPU?

The most efficient way to copy data to or from a GPU is in large contiguous blocks, so copying the entire array of structures with "cell_dev=cell" would be the most efficient approach.
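To put rough numbers on this for the CELL_STRUCT in the test case, assuming the compiler stores its six REAL(KIND=4) values back to back with no padding (typical, though Fortran does not guarantee a layout without a SEQUENCE statement):

$$
\begin{aligned}
\text{cells} &= 256^3 = 16\,777\,216 \\
\text{bytes per cell} &= 6 \times 4 = 24 \\
\text{whole array} &= 24 \times 16\,777\,216 = 402\,653\,184\ \text{bytes} = 384\ \text{MiB} \\
\text{vel(1) only} &= 4 \times 16\,777\,216 = 67\,108\,864\ \text{bytes} = 64\ \text{MiB}
\end{aligned}
$$

Copying just vel(1) moves one sixth of the bytes, but as 4-byte values spaced 24 bytes apart. Unless they are first gathered into a contiguous buffer, they cannot go over as one large block, whereas the whole array can.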

If you only use a small portion of the CELL data, it may be better to gather that data into host-side temporary arrays and copy those over. For example, if you only ever use the first element of 'vel', copy it into a host array of reals and then copy that array to the device.

Something like:

PROGRAM test
USE CUDAFOR

TYPE :: CELL_STRUCT
  REAL(KIND=4),dimension(3) :: vel
  REAL(KIND=4) :: HI_number_density
  REAL(KIND=4) :: luminosity
  REAL(KIND=4) :: T4
END TYPE

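TYPE (CELL_STRUCT), dimension(256,256,256) :: CELL
! host-side temporary holding only vel(1) of every cell
REAL(KIND=4), dimension(256,256,256) :: CELL_vel
! device copy of that single member
REAL(KIND=4), device, dimension(256,256,256) :: CELL_dev

! gather vel(1) into a contiguous host array (strided copy done on the CPU)
CELL_vel(:,:,:) = CELL(:,:,:)%vel(1)
! one contiguous host-to-device transfer
cell_dev=cell_vel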


END PROGRAM

- Mat

Hi Mat,

Actually, what I want to do is simply copy the whole array to the GPU. However, when I first tried, I got the message:

PGF90-S-0099-Illegal use of derived type

So I assumed it was not possible. But since you implied that it is possible, I investigated a bit further. The problem was that I had defined two different structure types, one in the CPU variable module and one in the GPU variable module, which were identical except for their names. Fortran treats two separately defined derived types as different types, so assigning one to the other was illegal. (When compiling for compute capability 1.3, there are some restrictions on using arbitrary modules in a kernel...)

I solved the problem by defining the structure type once, in a module that contains nothing else, and USEing that module in both the CPU variable module and the GPU variable module.
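A minimal sketch of that arrangement, to be compiled as CUDA Fortran (for example a .cuf file built with -Mcuda, as in the test case above). The module names cell_types, cpu_vars, gpu_vars and cell_transfer, the NCELL constant, and the upload_cells/download_cells routines are hypothetical and not taken from the original code; the only point is that CELL_STRUCT is declared once and USEd everywhere, so the host and device arrays have the same type and a whole-array assignment is legal.

```fortran
! Sketch only: all module, constant and routine names are hypothetical.
! CELL_STRUCT is defined exactly once; every other module USEs it.
MODULE cell_types
  IMPLICIT NONE
  INTEGER, PARAMETER :: NCELL = 256
  TYPE :: CELL_STRUCT
    REAL(KIND=4), DIMENSION(3) :: vel
    REAL(KIND=4) :: HI_number_density
    REAL(KIND=4) :: luminosity
    REAL(KIND=4) :: T4
  END TYPE CELL_STRUCT
END MODULE cell_types

! Host-side data: USEs the shared type instead of redeclaring it.
MODULE cpu_vars
  USE cell_types
  IMPLICIT NONE
  TYPE(CELL_STRUCT), DIMENSION(NCELL,NCELL,NCELL) :: cell
END MODULE cpu_vars

! Device-side data: same type, so it is assignment-compatible with cell.
MODULE gpu_vars
  USE cudafor
  USE cell_types
  IMPLICIT NONE
  TYPE(CELL_STRUCT), DEVICE, DIMENSION(NCELL,NCELL,NCELL) :: cell_dev
END MODULE gpu_vars

! Host routines that move the whole array with intrinsic assignment.
MODULE cell_transfer
  USE cpu_vars
  USE gpu_vars
  IMPLICIT NONE
CONTAINS
  SUBROUTINE upload_cells()
    cell_dev = cell        ! host -> device, whole array
  END SUBROUTINE upload_cells

  SUBROUTINE download_cells()
    cell = cell_dev        ! device -> host, whole array
  END SUBROUTINE download_cells
END MODULE cell_transfer
```

In the main program, USE cell_transfer, fill cell on the host, CALL upload_cells() before launching kernels, and CALL download_cells() to bring the results back.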

Benoit,

TPR 17637 is reported as fixed in the current 11.4 release.

Thanks again for the report.

dave