I am trying to initialize an array of a derived type before passing it to my GPU kernel. I have a variable A of the derived type declared on the host and a variable B of the same type declared as a device array of dimension 2. If I use B(:)=A, my code compiles without problems, but when I run it, it crashes before executing the first line of code: a window pops up saying the program has stopped working. If B is just a "scalar" of the type, it works fine. I have also tried first declaring another array C, also of dimension 2, assigning C(:)=A, and then B=C. This gives the same result. Is there a way to initialize B so that all of its elements are the same as A? If not, is there a way to give each thread its own local copy of the type so that each thread can work with it without interfering with the other threads?
Hi msgross42,
I’ll need a reproducing example because “B(:)=A” should work. If it’s too big to post, please send to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.
Thanks,
Mat
Here is a test code I made that exhibits the same problem.
Module Gpumodule
  use cudafor
  implicit none
  real*8, constant :: dt
  real*8, constant :: tfinal
  real*8, constant :: a
  real*8, constant :: b
  type InitialConditions
    real*8 :: x1(10) = 0.0
    real*8 :: x2(10) = 0.0
    real*8 :: x3(10) = 0.0
  end type InitialConditions
  type Body
    real*8 :: State(3) = 0.0
    real*8 :: Statedot(3) = 0.0
    type(InitialConditions) IC
  end type Body
contains
  attributes(global) subroutine simulation(BD,statefinal)
    integer i,j,k,e,indx,npts
    real*8 sum,nominaltime,nominalstate(3),rkalfa(4),krkbody(3,4),statefinal(10,*),time
    type(Body)::BD(:)
    indx=(blockidx%x-1) * blockdim%x + threadidx%x
    ! Define Constants
    rkalfa(1) = 1.0; rkalfa(2) = 2.0; rkalfa(3) = 2.0; rkalfa(4) = 1.0
    ! Initial State Vector
    BD(indx)%State(1)=BD(indx)%IC%x1(indx)
    BD(indx)%State(2)=BD(indx)%IC%x2(indx)
    BD(indx)%State(3)=BD(indx)%IC%x3(indx)
    time=0
    ! Integrate Equations of Motion
    npts = nint(tfinal/dt)
    do i=1,npts
      ! Store Nominal State Values
      nominaltime = time
      nominalstate = BD(indx)%State
      ! Numerical Integration of Equations of Motion
      do j=1,4
        ! State Values to Evaluate Derivatives
        if (j .ne. 1) then
          time = nominaltime + dt/rkalfa(j)
          do k=1,3
            BD(indx)%State(k) = nominalstate(k) + krkbody(k,j-1)/rkalfa(j)
          end do
        end if
        ! Compute Derivatives
        call deriv(BD(indx))
        do k=1,3
          krkbody(k,j) = dt*BD(indx)%Statedot(k)
        end do
      end do
      ! Step Time
      time = nominaltime + dt
      ! Step States
      do j=1,3
        sum = 0.0
        do k=1,4
          sum = sum + rkalfa(k)*krkbody(j,k)
        end do
        BD(indx)%State(j) = nominalstate(j) + sum/6.0
      end do
    end do
    statefinal(indx,1)=BD(indx)%State(1)
    statefinal(indx,2)=BD(indx)%State(2)
    statefinal(indx,3)=BD(indx)%State(3)
  end subroutine simulation
  attributes(device) subroutine deriv(BD)
    type(Body) BD
    BD%Statedot(1)=BD%State(2)
    BD%Statedot(2)=BD%State(3)
    BD%Statedot(3)=BD%State(1)*a+BD%State(2)*b
    return
  end subroutine deriv
end Module GPUmodule
Program Main
  use cudafor
  use GPUmodule
  implicit none
  type(Body)::BDH
  type(Body),device::BDD(10)
  real*8 statefinalh(10,3)
  real*8,device :: statefinald(10,3)
  integer i,istat
  write(*,*) 'start'
  a=2
  b=5
  dt=0.01
  tfinal=5
  do i=1,10
    BDH%IC%x1(i)=i
    BDH%IC%x2(i)=i
    BDH%IC%x3(i)=i
  end do
  BDD(:)=BDH
  call simulation<<<1,10>>>(BDD,statefinald)
  istat=cudaDeviceSynchronize()
  statefinalh=statefinald
  do i=1,10
    write(*,*) statefinalh(i,:)
  end do
end Program Main
OK, I thought you meant “B” was just a two-element array and “A” was a scalar.
BDD(:)=BDH
A UDT-to-UDT assignment requires a deep copy. The error occurs because BDD's address needs to be dereferenced on the host, and since it's a device pointer, a segv occurs.
Though, looking at your code, I'm wondering whether these UDTs are necessary. It seems you just need to pass in the initial conditions and then use local scalars to hold each thread's state. Using local scalars will also help performance, since they can be held in the register file rather than being fetched from global memory.
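A minimal sketch of that suggestion, in the same CUDA Fortran style as the test code: the kernel receives only the initial conditions as plain real(8) device arrays and keeps each thread's state and RK4 stage increments in local variables. The module, kernel, and variable names (ScalarGpuModule, simulation_scalar, x1d, and so on) are hypothetical, the equations are copied from deriv(), and this is an untested sketch rather than a drop-in replacement.

```fortran
! Hypothetical rewrite: state lives in thread-local scalars, not in a UDT
module ScalarGpuModule
  use cudafor
  implicit none
  real(8), constant :: dt, tfinal, a, b
contains
  attributes(global) subroutine simulation_scalar(x1, x2, x3, statefinal, n)
    integer, value :: n
    real(8) :: x1(n), x2(n), x3(n)     ! initial conditions (device memory)
    real(8) :: statefinal(n,3)         ! final state, one row per thread
    real(8) :: s1, s2, s3              ! current state (local)
    real(8) :: n1, n2, n3              ! state at the start of the step
    real(8) :: k1(4), k2(4), k3(4)     ! RK4 increments for each state component
    real(8) :: alfa(4)
    integer :: indx, i, j, npts

    indx = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (indx > n) return
    alfa(1) = 1.0d0; alfa(2) = 2.0d0; alfa(3) = 2.0d0; alfa(4) = 1.0d0

    ! Read this thread's initial conditions once from global memory
    s1 = x1(indx); s2 = x2(indx); s3 = x3(indx)
    npts = nint(tfinal/dt)
    do i = 1, npts
      n1 = s1; n2 = s2; n3 = s3
      do j = 1, 4
        if (j /= 1) then
          s1 = n1 + k1(j-1)/alfa(j)
          s2 = n2 + k2(j-1)/alfa(j)
          s3 = n3 + k3(j-1)/alfa(j)
        end if
        ! Same derivatives as deriv(), scaled by dt
        k1(j) = dt*s2
        k2(j) = dt*s3
        k3(j) = dt*(s1*a + s2*b)
      end do
      ! Classical RK4 weights 1,2,2,1 divided by 6
      s1 = n1 + (k1(1) + 2.0d0*k1(2) + 2.0d0*k1(3) + k1(4))/6.0d0
      s2 = n2 + (k2(1) + 2.0d0*k2(2) + 2.0d0*k2(3) + k2(4))/6.0d0
      s3 = n3 + (k3(1) + 2.0d0*k3(2) + 2.0d0*k3(3) + k3(4))/6.0d0
    end do

    statefinal(indx,1) = s1
    statefinal(indx,2) = s2
    statefinal(indx,3) = s3
  end subroutine simulation_scalar
end module ScalarGpuModule

program MainScalar
  use cudafor
  use ScalarGpuModule
  implicit none
  integer, parameter :: n = 10
  real(8) :: x1h(n), x2h(n), x3h(n), statefinalh(n,3)
  real(8), device :: x1d(n), x2d(n), x3d(n), statefinald(n,3)
  integer :: i, istat

  a = 2; b = 5; dt = 0.01d0; tfinal = 5
  do i = 1, n
    x1h(i) = i; x2h(i) = i; x3h(i) = i
  end do
  ! Plain real(8) arrays are copied host-to-device by simple assignment
  x1d = x1h; x2d = x2h; x3d = x3h

  call simulation_scalar<<<1,n>>>(x1d, x2d, x3d, statefinald, n)
  istat = cudaDeviceSynchronize()
  statefinalh = statefinald
  do i = 1, n
    write(*,*) statefinalh(i,:)
  end do
end program MainScalar
```

Because no derived type crosses the host/device boundary here, the host-side setup is ordinary array assignment, which sidesteps the BDD(:)=BDH copy entirely.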
- Mat
The reason it is set up like this is to mimic how my actual code is structured. I am trying to adapt an existing code to run on a GPU while limiting the number of significant modifications. Is there a way to define another derived-type variable inside the GPU kernel that exists only in local memory, rather than continually referencing the array that is passed in global memory in the kernel call?
Only UDTs whose components are all scalars of fundamental data types are supported. So you could change the arrays into multiple scalars, but at that point you might as well just use scalars in your kernel.
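If keeping a derived type (and the call deriv(...) structure) matters for the port, a sketch of what Mat describes would replace the array components with scalars so the type can be declared as a thread-local variable in device code. BodyLocal, deriv_local, and one_derivative are hypothetical names, whether such a local UDT compiles may depend on the compiler release, and the kernel only evaluates the derivative once per thread to show the pattern.

```fortran
! Hypothetical all-scalar version of Body, usable as a thread-local variable
module LocalTypeModule
  use cudafor
  implicit none
  real(8), constant :: a, b

  type BodyLocal
    real(8) :: s1, s2, s3              ! replaces State(3)
    real(8) :: s1dot, s2dot, s3dot     ! replaces Statedot(3)
  end type BodyLocal
contains
  ! Same equations as deriv(), applied to the scalar components
  attributes(device) subroutine deriv_local(BL)
    type(BodyLocal) :: BL
    BL%s1dot = BL%s2
    BL%s2dot = BL%s3
    BL%s3dot = BL%s1*a + BL%s2*b
  end subroutine deriv_local

  ! Each thread copies its initial state into a private BodyLocal,
  ! evaluates the derivative once, and writes the result back
  attributes(global) subroutine one_derivative(x1, x2, x3, dots, n)
    integer, value :: n
    real(8) :: x1(n), x2(n), x3(n), dots(n,3)
    type(BodyLocal) :: BL              ! one private copy per thread
    integer :: indx
    indx = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (indx > n) return
    BL%s1 = x1(indx); BL%s2 = x2(indx); BL%s3 = x3(indx)
    call deriv_local(BL)
    dots(indx,1) = BL%s1dot
    dots(indx,2) = BL%s2dot
    dots(indx,3) = BL%s3dot
  end subroutine one_derivative
end module LocalTypeModule

program MainLocal
  use cudafor
  use LocalTypeModule
  implicit none
  integer, parameter :: n = 10
  real(8) :: xh(n), dotsh(n,3)
  real(8), device :: xd(n), dotsd(n,3)
  integer :: i, istat

  a = 2; b = 5
  do i = 1, n
    xh(i) = i
  end do
  xd = xh
  ! Same initial value for all three components, as in the test code
  call one_derivative<<<1,n>>>(xd, xd, xd, dotsd, n)
  istat = cudaDeviceSynchronize()
  dotsh = dotsd
  do i = 1, n
    write(*,*) dotsh(i,:)
  end do
end program MainLocal
```

With a = 2 and b = 5, row i should contain i, i and 7i, which is an easy way to check that each thread's private BodyLocal was filled and used independently.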
- Mat