How to use arrays of arrays in structures on the device?

With CUDA Fortran, I’m trying to allocate arrays inside a structure, because I want to use this structure in my kernels. Here is a short code (stored in a file called struct.cuf) that describes my problem. I’m compiling with PGI 16.10, using the following options: -O3 -Mcuda=cc5x struct.cuf -o struct_out

module structure

type mytype
 integer :: alpha,beta,gamma
 real,dimension(:),pointer :: a
end type mytype

type mytypeDevice
 integer :: alpha,beta,gamma
 real,dimension(:),pointer,device :: a
end type mytypeDevice

end module structure

program main
 use cudafor
 use structure

 type(mytype) :: T(3)
 type(mytypeDevice) :: T_Device(3)

 ! For the host
 do i=1,3
 end do
 T(1)%a=1; T(2)%a=2; T(3)%a=3

 ! For the device
 do i=1,3
 end do
 do i=1,3
 end do

end program main

This code works, but I can’t use the structure T_Device in my kernels, because the kernel's dummy arguments need the device attribute in their declaration. I get this error:

PGF90-S-0528-Argument number 1 to mykernel: device attribute mismatch

So is there an efficient way to use structures on the device? I would like to access the a arrays declared in mytype from my kernels.

Thanks !

Once I added this line to the top of your code:

module structure

and compiled with

pgfortran -o struct struct.f90 -Mcuda=cc5x -O3

the code compiled without incident.

I did this with 16.10 and 17.5.


User-defined types (UDTs) are really hard to share between host and device unless you declare everything as managed. Is that an option?

jtull > Oops, yes, I copied and pasted my code and forgot this line. Edited, thanks. It compiles, but I really need T_Device(:)%a(:) in my kernels.

brentl > Is it really that difficult? Unfortunately, I can’t use the managed attribute since my GPU has compute capability 5.2. It would be great to have an example of how to deal with these arrays of arrays.

Here’s a link to an article I wrote a couple of years back:

The way you have it now, the top half of your derived types resides on the host, and the bottom half resides on the device. So, you are limited in what you can do in either host or device code. That’s why managed is such a help in this case.
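As an illustration of the managed approach, here is a minimal, untested sketch. The names mytypeManaged, addone and n are hypothetical and do not come from the original post; the sketch assumes a compiler and driver with managed memory support. Both the array of structures and the a component are declared managed, so the same T can be touched from host code and passed whole to a kernel.

```fortran
module structure_managed
  use cudafor
  implicit none

  ! Hypothetical managed variant of mytype from the first post
  type mytypeManaged
    integer :: alpha, beta, gamma
    real, dimension(:), allocatable, managed :: a
  end type mytypeManaged

contains

  ! Hypothetical kernel: one block per structure, one thread per element
  attributes(global) subroutine addone(T, nT, n)
    integer, value :: nT, n
    type(mytypeManaged) :: T(nT)
    integer :: i, k
    k = blockIdx%x
    i = threadIdx%x
    if (k <= nT .and. i <= n) T(k)%a(i) = T(k)%a(i) + 1.0
  end subroutine addone

end module structure_managed

program main
  use cudafor
  use structure_managed
  implicit none
  integer, parameter :: n = 8          ! assumed array length
  type(mytypeManaged), managed, allocatable :: T(:)
  integer :: i, istat

  allocate(T(3))
  do i = 1, 3
    allocate(T(i)%a(n))
    T(i)%a = real(i)                   ! initialized from host code
  end do

  call addone<<<3, n>>>(T, 3, n)
  istat = cudaDeviceSynchronize()      ! wait before reading on the host

  print *, T(1)%a(1), T(2)%a(1), T(3)%a(1)
end program main
```

The call to cudaDeviceSynchronize before the print matters: the host should not read managed data while the kernel may still be running.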

If you can limit the usage of the data on either host (just memcpy type operations) or device (pass the underlying arrays, not the top level structure) you can probably get it to work.
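The second suggestion (pass the underlying arrays rather than the top-level structure) can be sketched as follows. This is an untested sketch, not the code from the article: the kernel scale, the size n and the host buffer h are assumptions made for illustration. The structure itself stays on the host; only its device component a is handed to the kernel, so there is no attribute mismatch on the argument.

```fortran
module structure
  use cudafor
  implicit none

  type mytypeDevice
    integer :: alpha, beta, gamma
    real, dimension(:), pointer, device :: a
  end type mytypeDevice

contains

  ! Hypothetical kernel: receives a plain device array, not the structure
  attributes(global) subroutine scale(a, n, factor)
    integer, value :: n
    real, value :: factor
    real, device :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = factor * a(i)
  end subroutine scale

end module structure

program main
  use cudafor
  use structure
  implicit none
  integer, parameter :: n = 8        ! assumed array length
  type(mytypeDevice) :: T_Device(3)
  real :: h(n)                       ! host staging buffer
  integer :: i, istat

  do i = 1, 3
    allocate(T_Device(i)%a(n))       ! device allocation
    h = real(i)
    T_Device(i)%a = h                ! host-to-device copy
    ! Pass the component, not T_Device itself
    call scale<<<1, n>>>(T_Device(i)%a, n, 2.0)
  end do
  istat = cudaDeviceSynchronize()

  do i = 1, 3
    h = T_Device(i)%a                ! device-to-host copy
    print *, i, h(1)
    deallocate(T_Device(i)%a)
  end do
end program main
```

The host side is limited to allocation and copy operations on T_Device(i)%a, which matches the "just memcpy type operations" restriction described above.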

Well, that’s a really useful article, thanks!
Since I don’t have a card with compute capability 6.0 or higher, I’m using plain arrays to store the structure data on the host. It still works, but it’s very cumbersome. It seems I have to upgrade to a Tesla P100 to enjoy these managed features.

You need CUDA 6.0 (the toolkit release), not a card with compute capability 6.0. I have a Kepler on my machine here, CC35, and it supports managed memory. If you run pgaccelinfo, you might see this line:

Managed Memory: Yes

In my version, it appears two lines from the bottom of the output.

Ahhhh, you’re right, it works with my CC5x card! I misunderstood, thank you very much!

I have one more question: I have a code that works without managed memory on both a Tesla P100 and a Quadro M2000.

I have now modified this code to use managed memory for my structures (it is very effective, by the way; it cleans up the code greatly). On the Tesla P100, compiled with pgfortran, it works perfectly well. But on the Quadro M2000, where I compiled with pgf90, it stops at run time and gives me this error:

 Bus error (core dumped)

Why can’t my code run on the Quadro M2000? It seems to be a problem with managed memory, because without it my code works on the Quadro M2000.
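One possible cause, not confirmed in this thread: on GPUs with compute capability below 6.0 (the Quadro M2000 is CC 5.2), the host must not access managed data while a kernel is still running, whereas the Tesla P100 (CC 6.0) allows concurrent access. If the code reads or writes a managed array right after a kernel launch, a cudaDeviceSynchronize in between may be missing. The minimal sketch below shows the pattern; the kernel fill and the array a are hypothetical names, and the sketch is untested.

```fortran
module fill_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: sets every element of a to v
  attributes(global) subroutine fill(a, n, v)
    integer, value :: n
    real, value :: v
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = v
  end subroutine fill
end module fill_mod

program main
  use cudafor
  use fill_mod
  implicit none
  integer, parameter :: n = 256      ! assumed array length
  real, managed, allocatable :: a(:)
  integer :: istat

  allocate(a(n))
  call fill<<<1, n>>>(a, n, 3.0)     ! kernel launch is asynchronous

  ! Without this call, the host read below can race with the kernel,
  ! which pre-Pascal devices do not tolerate for managed data.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  print *, a(1)                      ! host access only after the sync
  deallocate(a)
end program main
```

Checking every host access to managed structures that follows a kernel launch against this pattern is a cheap first step before looking for other causes.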