Derived Type Array Problem

msgross42 · March 6, 2013, 6:34pm

I am trying to allocate an array of a derived type before passing it to my GPU kernel. I have a variable A of the derived type defined on the host and a variable B of the same type on the device, with dimension 2. If I use B(:)=A, my code compiles without problem, but when I run it, it crashes before executing the first line of code, and a window pops up saying the program has stopped working. If B is just a “scalar” of the type, it works fine. I have also tried declaring a variable C, also of dimension 2, assigning C(:)=A, and then B=C; this gives me the same result. Is there a way to set up the array B so that all of its elements are the same as A? If not, is there a way to pass B as a local variable into each thread so each thread can use it without interfering with the other threads?

MatColgrove · March 6, 2013, 7:14pm

Hi msgross42,

I’ll need a reproducing example because “B(:)=A” should work. If it’s too big to post, please send to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.

Thanks,
Mat

msgross42 · March 6, 2013, 7:47pm

Here is a test code that I made which is exhibiting the same problem.

```fortran
Module Gpumodule
  use cudafor
  implicit none

  real*8, constant :: dt
  real*8, constant :: tfinal
  real*8, constant :: a
  real*8, constant :: b

  type InitialConditions
    real*8 :: x1(10) = 0.0
    real*8 :: x2(10) = 0.0
    real*8 :: x3(10) = 0.0
  end type InitialConditions

  type Body
    real*8 :: State(3) = 0.0
    real*8 :: Statedot(3) = 0.0
    type(InitialConditions) IC
  end type Body

contains

  attributes(global) subroutine simulation(BD, statefinal)
    integer i, j, k, e, indx, npts
    real*8 sum, nominaltime, nominalstate(3), rkalfa(4), krkbody(3,4), statefinal(10,*), time
    type(Body) :: BD(:)
    indx = (blockidx%x-1) * blockdim%x + threadidx%x

    ! Define Constants
    rkalfa(1) = 1.0; rkalfa(2) = 2.0; rkalfa(3) = 2.0; rkalfa(4) = 1.0

    ! Initial State Vector
    BD(indx)%State(1) = BD(indx)%IC%x1(indx)
    BD(indx)%State(2) = BD(indx)%IC%x2(indx)
    BD(indx)%State(3) = BD(indx)%IC%x3(indx)
    time = 0

    ! Integrate Equations of Motion
    npts = nint(tfinal/dt)

    do i = 1, npts

      ! Store Nominal State Values
      nominaltime = time
      nominalstate = BD(indx)%State

      ! Numerical Integration of Equations of Motion
      do j = 1, 4

        ! State Values to Evaluate Derivatives
        if (j .ne. 1) then
          time = nominaltime + dt/rkalfa(j)
          do k = 1, 3
            BD(indx)%State(k) = nominalstate(k) + krkbody(k,j-1)/rkalfa(j)
          end do
        end if

        ! Compute Derivatives
        call deriv(BD(indx))
        do k = 1, 3
          krkbody(k,j) = dt*BD(indx)%Statedot(k)
        end do

      end do

      ! Step Time
      time = nominaltime + dt

      ! Step States
      do j = 1, 3
        sum = 0.0
        do k = 1, 4
          sum = sum + rkalfa(k)*krkbody(j,k)
        end do
        BD(indx)%State(j) = nominalstate(j) + sum/6.0
      end do
    end do

    statefinal(indx,1) = BD(indx)%State(1)
    statefinal(indx,2) = BD(indx)%State(2)
    statefinal(indx,3) = BD(indx)%State(3)
  end subroutine simulation

  attributes(device) subroutine deriv(BD)
    type(Body) BD
    BD%Statedot(1) = BD%State(2)
    BD%Statedot(2) = BD%State(3)
    BD%Statedot(3) = BD%State(1)*a + BD%State(2)*b
    return
  end subroutine deriv
end Module GPUmodule

Program Main
  use cudafor
  use GPUmodule
  implicit none
  type(Body) :: BDH
  type(Body), device :: BDD(10)
  real*8 statefinalh(10,3)
  real*8, device :: statefinald(10,3)
  integer i, istat

  write(*,*) 'start'
  a = 2
  b = 5
  dt = 0.01
  tfinal = 5

  do i = 1, 10
    BDH%IC%x1(i) = i
    BDH%IC%x2(i) = i
    BDH%IC%x3(i) = i
  end do

  ! Host scalar -> device array of the derived type: this is the line that crashes at run time
  BDD(:) = BDH

  call simulation<<<1,10>>>(BDD, statefinald)
  istat = cudaDeviceSynchronize()
  statefinalh = statefinald
  do i = 1, 10
    write(*,*) statefinalh(i,:)
  end do
end Program Main
```

MatColgrove · March 6, 2013, 8:30pm

OK, I thought you meant “B” was just a two-element array and “A” was a scalar.

BDD(:)=BDH

A UDT-to-UDT assignment requires a deep copy. The error occurs because BDD’s address needs to be dereferenced on the host, and since it’s a device pointer, a segv occurs.

Though, looking at your code, I’m wondering whether these UDTs are necessary. It seems you just need to pass in the initial conditions and then use local scalars to hold each thread’s state. Using local scalars will help performance, since these can be held in the register file rather than being fetched from global memory.
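A minimal sketch of that suggestion, assuming the same 3-state system and RK4 scheme as the test code above: the kernel reads its initial conditions once, keeps the working state in thread-local variables, and writes back to global memory once at the end. The module, kernel, and array names (`rk4local`, `simulation_local`, `derivs`, `x0`) are hypothetical. Small fixed-size local arrays are used for brevity; whether they stay in registers or spill to local memory is up to the compiler. The `time` variable is dropped because `deriv` in the test code never uses it. This has not been compiled or run here.

```fortran
! Hypothetical sketch: per-thread state held in local variables
! instead of in an array of the Body derived type.
module rk4local
  use cudafor
  implicit none
  integer, parameter :: n = 10              ! number of threads/bodies (assumed)
  real(8), constant :: dt, tfinal, a, b
contains
  ! Same right-hand side as deriv() in the test code
  attributes(device) subroutine derivs(s, sdot)
    real(8), intent(in)  :: s(3)
    real(8), intent(out) :: sdot(3)
    sdot(1) = s(2)
    sdot(2) = s(3)
    sdot(3) = s(1)*a + s(2)*b
  end subroutine derivs

  attributes(global) subroutine simulation_local(x0, statefinal)
    real(8) :: x0(n,3), statefinal(n,3)     ! kernel dummy arrays are device arrays
    real(8) :: s(3), s0(3), sdot(3), k(3,4), rkalfa(4)
    integer :: i, j, m, indx, npts
    indx = (blockidx%x-1)*blockdim%x + threadidx%x
    if (indx > n) return
    rkalfa(1) = 1.0d0; rkalfa(2) = 2.0d0; rkalfa(3) = 2.0d0; rkalfa(4) = 1.0d0
    do m = 1, 3
      s(m) = x0(indx,m)                     ! read initial conditions once
    end do
    npts = nint(tfinal/dt)
    do i = 1, npts
      do m = 1, 3
        s0(m) = s(m)                        ! nominal state for this step
      end do
      do j = 1, 4
        if (j /= 1) then
          do m = 1, 3
            s(m) = s0(m) + k(m,j-1)/rkalfa(j)
          end do
        end if
        call derivs(s, sdot)
        do m = 1, 3
          k(m,j) = dt*sdot(m)
        end do
      end do
      do m = 1, 3                           ! RK4 update: (k1 + 2k2 + 2k3 + k4)/6
        s(m) = s0(m) + (k(m,1) + 2.0d0*k(m,2) + 2.0d0*k(m,3) + k(m,4))/6.0d0
      end do
    end do
    do m = 1, 3
      statefinal(indx,m) = s(m)             ! single write back to global memory
    end do
  end subroutine simulation_local
end module rk4local

program main_local
  use cudafor
  use rk4local
  implicit none
  real(8) :: x0h(n,3), sfh(n,3)
  real(8), device :: x0d(n,3), sfd(n,3)
  integer :: i, istat
  a = 2.0d0; b = 5.0d0; dt = 0.01d0; tfinal = 5.0d0   ! host writes to constant memory
  do i = 1, n
    x0h(i,:) = dble(i)                      ! same initial conditions as the test code
  end do
  x0d = x0h                                 ! plain host-to-device copy of a real(8) array
  call simulation_local<<<1,n>>>(x0d, sfd)
  istat = cudaDeviceSynchronize()
  sfh = sfd
  do i = 1, n
    write(*,*) sfh(i,:)
  end do
end program main_local
```

Only plain `real(8)` arrays cross the host/device boundary here, so no derived-type deep copy is involved.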

Mat

msgross42 · March 6, 2013, 8:38pm

The reason it is set up like this is to mimic how my actual code is set up. I am trying to adapt an existing code to run on a GPU while limiting the number of significant modifications. Is there a way to define a derived-type variable within the GPU kernel that exists only in local memory, as opposed to continually referencing the array of the type that is passed in global memory in the kernel call?

MatColgrove · March 8, 2013, 11:07pm

Only UDTs whose components are all fundamental (scalar) data types are supported as local variables in device code. So you could change the arrays into multiple scalars, but at that point you might as well just use scalars in your kernel.
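As an illustration of "change the arrays to be multiple scalars", here is a hypothetical flattened version of `Body` (`BodyS`, with `State(3)` and `Statedot(3)` split into scalar components) used as a per-thread local variable. It is a sketch under the assumption that such a scalar-only type is accepted as a kernel local; the type, module, and kernel names are illustrative, and it evaluates the derivatives once rather than running the full integration. It has not been compiled or run here.

```fortran
! Hypothetical sketch: a Body-like type with scalar components only,
! used as a thread-local variable inside the kernel.
module bodyscalar
  use cudafor
  implicit none
  integer, parameter :: n = 10
  real(8), constant :: a, b
  type BodyS
    real(8) :: s1, s2, s3          ! was State(3)
    real(8) :: sd1, sd2, sd3       ! was Statedot(3)
  end type BodyS
contains
  attributes(device) subroutine derivs(bd)
    type(BodyS) :: bd
    bd%sd1 = bd%s2
    bd%sd2 = bd%s3
    bd%sd3 = bd%s1*a + bd%s2*b
  end subroutine derivs

  attributes(global) subroutine eval_derivs(x, xdot)
    real(8) :: x(n,3), xdot(n,3)   ! device arrays (kernel default)
    type(BodyS) :: bl              ! per-thread local copy, not in global memory
    integer :: indx
    indx = (blockidx%x-1)*blockdim%x + threadidx%x
    if (indx > n) return
    bl%s1 = x(indx,1); bl%s2 = x(indx,2); bl%s3 = x(indx,3)
    call derivs(bl)
    xdot(indx,1) = bl%sd1; xdot(indx,2) = bl%sd2; xdot(indx,3) = bl%sd3
  end subroutine eval_derivs
end module bodyscalar

program main_scalar
  use cudafor
  use bodyscalar
  implicit none
  real(8) :: xh(n,3), xdoth(n,3)
  real(8), device :: xd(n,3), xdotd(n,3)
  integer :: i, istat
  a = 2.0d0; b = 5.0d0
  do i = 1, n
    xh(i,:) = dble(i)
  end do
  xd = xh
  call eval_derivs<<<1,n>>>(xd, xdotd)
  istat = cudaDeviceSynchronize()
  xdoth = xdotd
  do i = 1, n
    write(*,*) xdoth(i,:)
  end do
end program main_scalar
```

As the reply notes, once every array has been flattened like this, the type adds little over plain local scalars.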

Mat