I am trying to allocate an array of a derived type before passing it to my GPU kernel. I have type A defined as a host variable and type B as a device variable of dimension 2. If I use B(:)=A, my code compiles without problems, but when I run it, it crashes before executing the first line of code: a window pops up saying the code has stopped working. If I run the code with type B as just a “scalar” type, it works fine. I have also tried first making a host type C, also of dimension 2, setting it with C(:)=A, and then assigning B=C. This gives the same result. Is there a way to allocate an array of type B so that all elements are the same as type A? If not, is there a way to pass type B as a local variable into each thread so that each thread can execute using the type without interfering with the other threads?
I’ll need a reproducing example because “B(:)=A” should work. If it’s too big to post, please send to PGI Customer Service (firstname.lastname@example.org) and ask them to forward it to me.
Here is a test code that I made which is exhibiting the same problem.
real*8,constant :: dt
real*8,constant :: tfinal
real*8,constant :: a
real*8,constant :: b
real*8 :: x1(10) = 0.0
real*8 :: x2(10) = 0.0
real*8 :: x3(10) = 0.0
end type InitialConditions
real*8 :: State(3) = 0.0
real*8 :: Statedot(3) = 0.0
end type Body
attributes(global) subroutine simulation(BD,statefinal)
indx=(blockidx%x-1) * blockdim%x + threadidx%x
! Define Constants
rkalfa(1) = 1.0; rkalfa(2) = 2.0; rkalfa(3) = 2.0; rkalfa(4) = 1.0
! Initial State Vector
! Integrate Equations of Motion
npts = nint(tfinal/dt)
! Store Nominal State Values
nominaltime = time
nominalstate = BD(indx)%State
! Numerical Integration of Equations of Motion
! State Values to Evaluate Derivatives
if (j .ne. 1) then
time = nominaltime + dt/rkalfa(j)
BD(indx)%State(k) = nominalstate(k) + krkbody(k,j-1)/rkalfa(j)
! Compute Derivatives
krkbody(k,j) = dt*BD(indx)%Statedot(k)
! Step Time
time = nominaltime + dt
! Step States
sum = 0.0
sum = sum + rkalfa(k)*krkbody(j,k)
BD(indx)%State(j) = nominalstate(j) + sum/6.0
end subroutine simulation
attributes(device) subroutine deriv(BD)
end subroutine deriv
end Module GPUmodule
real*8,device :: statefinald(10,3)
end Program Main
Ok, I thought you meant “B” was just a two-element array and “A” was a scalar.
A UDT-to-UDT assignment requires a deep copy. The error occurs because BDD’s address needs to be dereferenced, and since it’s a device pointer, a segv occurs.
Though in looking at your code, I’m wondering if these UDTs are necessary. It seems you just need to pass in the initial conditions and then use local scalars to hold each thread’s states. Using local scalars will help performance, since they can be held in the register file rather than being fetched from global memory.
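A minimal sketch of the local-scalar approach, under several assumptions: the kernel name `simulate_local`, the initial-condition array `x0`, and passing `n`, `dt`, and `npts` by value are all hypothetical. The dynamics are a placeholder (simple decay stepped with forward Euler), not the Runge-Kutta scheme or the `deriv` routine from the posted code.

```fortran
module local_scalar_sketch
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: each thread keeps its three states in local scalars
  ! and touches global memory only to read x0 and to write statefinal.
  attributes(global) subroutine simulate_local(x0, statefinal, n, dt, npts)
    integer, value :: n, npts
    real(8), value :: dt
    real(8) :: x0(n,3)          ! initial conditions, one row per thread
    real(8) :: statefinal(n,3)  ! final states, one row per thread
    real(8) :: s1, s2, s3       ! thread-private states
    integer :: indx, i

    indx = (blockidx%x-1) * blockdim%x + threadidx%x
    if (indx > n) return

    ! Load this thread's initial state once
    s1 = x0(indx,1)
    s2 = x0(indx,2)
    s3 = x0(indx,3)

    ! Placeholder dynamics (assumption): ds/dt = -s, forward Euler
    do i = 1, npts
      s1 = s1 - dt*s1
      s2 = s2 - dt*s2
      s3 = s3 - dt*s3
    end do

    ! Write the result back once
    statefinal(indx,1) = s1
    statefinal(indx,2) = s2
    statefinal(indx,3) = s3
  end subroutine simulate_local
end module local_scalar_sketch

program main
  use cudafor
  use local_scalar_sketch
  implicit none
  real(8) :: x0(10,3), statefinal(10,3)
  real(8), device :: x0d(10,3), statefinald(10,3)

  x0 = 1.0d0
  x0d = x0                                  ! host-to-device copy
  call simulate_local<<<1,10>>>(x0d, statefinald, 10, 0.01d0, 100)
  statefinal = statefinald                  ! device-to-host copy
  print *, statefinal(1,:)
end program main
```

Only plain arrays of intrinsic type cross the host/device boundary here, so no derived-type assignment is involved.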
The reason it is set up like this is to try and mimic how my actual code is set up. I am trying to adapt an existing code to be used on a GPU and I am trying to limit the number of significant modifications. Is there a way to define another derived type within the GPU kernel that exists only in local memory as opposed to continually referencing the type that is passed into global memory in the function call?
Only UDTs whose members are all fundamental data types are supported. So you could change the arrays to multiple scalars, but at that point you might as well just use scalars in your kernel.
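For illustration, a sketch of what splitting the array members into scalars could look like. The type name `BodyScalars`, its member names, and the kernel `advance` are hypothetical, and the single update step is a placeholder rather than the integration from the posted code. It assumes the compiler accepts host/device assignment for arrays of a type containing only scalar members, as the reply above suggests.

```fortran
module scalar_udt_sketch
  use cudafor
  implicit none

  ! Hypothetical replacement for a type holding State(3) and Statedot(3):
  ! every member is a scalar of fundamental type, with no default initialization.
  type BodyScalars
    real(8) :: state1, state2, state3
    real(8) :: statedot1, statedot2, statedot3
  end type BodyScalars

contains
  ! Hypothetical kernel: each thread updates its own element of bd
  attributes(global) subroutine advance(bd, n, dt)
    integer, value :: n
    real(8), value :: dt
    type(BodyScalars) :: bd(n)
    integer :: indx

    indx = (blockidx%x-1) * blockdim%x + threadidx%x
    if (indx > n) return

    ! Placeholder update (assumption): one forward Euler step
    bd(indx)%state1 = bd(indx)%state1 + dt*bd(indx)%statedot1
    bd(indx)%state2 = bd(indx)%state2 + dt*bd(indx)%statedot2
    bd(indx)%state3 = bd(indx)%state3 + dt*bd(indx)%statedot3
  end subroutine advance
end module scalar_udt_sketch

program main
  use cudafor
  use scalar_udt_sketch
  implicit none
  type(BodyScalars) :: bd(2)
  type(BodyScalars), device :: bdd(2)

  ! Give every host element the same values, member by member
  bd%state1 = 0.0d0; bd%state2 = 0.0d0; bd%state3 = 0.0d0
  bd%statedot1 = 1.0d0; bd%statedot2 = 2.0d0; bd%statedot3 = 3.0d0

  bdd = bd                          ! host-to-device copy of the whole array
  call advance<<<1,2>>>(bdd, 2, 0.1d0)
  bd = bdd                          ! device-to-host copy
  print *, bd(1)%state1, bd(2)%state3
end program main
```

As the reply notes, once the members are scalars the type adds little over passing scalars or plain arrays directly, so this mainly helps when keeping the existing code's structure matters.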