Variable Attributes

I am trying to implement an algorithm whose pseudo code looks like the one below. What options are there for the attributes of the variables constvars, derivedvals, furtherderivedvals, and args, and which would work best?

I tried pinned, but apparently pinned variables MUST be of allocatable type.

module const
integer :: constvars
!(could as well be real/complex type)
!.
!.
end module const


module derivednum
use const

integer :: derivedvals
!(could as well be real/complex type)
!.
!.
end module derivednum

module gpuparsection
use const
use derivednum

attributes(global) subroutine dev_kernel( args )
! calculations depend on:
!   constvars
!   derivedvals
!   furtherderivedvals

end subroutine dev_kernel

subroutine callingroutine
integer :: furtherderivedvals, args
!(could as well be real/complex type)
!.
!.
call dev_kernel <<<  >>> ( args )
end subroutine callingroutine

end module gpuparsection

program main
use const
use derivednum
use gpuparsection

call callingroutine

end program main


Hi adityaks,

Constant module variables would have the “constant” attribute, while device variables would have the “device” attribute. The main difference is that “constant” variables are read/write from the host but read-only on the device, while “device” variables are read/write from both. Also, “constant” variables are stored in a fast-access memory area.
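
For illustration, here is a minimal sketch of the two attributes, using names from your pseudo code (the exact declarations and the array size are made up for the example):

module const
  ! read/write from the host, read-only in kernels, cached in constant memory
  integer, constant :: constvars
end module const

module derivednum
  ! read/write from both the host and kernels, stored in global device memory
  integer, device :: derivedarr(1024)
end module derivednum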

Note that I would recommend avoiding module device scalars. If they are read-only, put them in constant memory. If you do need to write to them from the device, then you’ll need to worry about synchronization.
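
For example, if multiple threads update a module device scalar, a plain assignment is a race condition; here is a sketch of the kind of synchronization I mean, using the atomicadd intrinsic (the module and kernel names are made up):

module counters
  integer, device :: hitcount   ! module device scalar, writable from kernels
contains
  attributes(global) subroutine count_hits()
    integer :: old
    ! "hitcount = hitcount + 1" would race across threads;
    ! atomicadd performs the read-modify-write atomically and returns the old value
    old = atomicadd(hitcount, 1)
  end subroutine count_hits
end module counters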

By “furtherderivedvals” I’m assuming you mean local device variables? In that case you can add the “device” attribute, but it’s implied for all variables declared in device code, so it’s unnecessary.

On second glance, I see that you have some “furtherderivedvals” in the host calling routine and want to access them in the device code. This won’t work due to scoping. They need to be passed as arguments or be put in a module.

For args, you can only pass in “device” variables or host scalars by value (i.e. add the “value” attribute to the declaration of the scalar dummy argument inside the kernel).
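
A sketch of what I mean, with made-up argument names:

attributes(global) subroutine dev_kernel( arr, n )
  integer, device :: arr(*)   ! the actual argument must be a device array
  integer, value  :: n        ! host scalar passed by value at launch
  integer :: i
  i = (blockidx%x-1)*blockdim%x + threadidx%x
  if (i <= n) arr(i) = arr(i) + 1
end subroutine dev_kernel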

The “pinned” attribute is for host allocatable arrays. You still need to create a corresponding device array and copy the host array to the device array before it can be used in device code. The “pinned” attribute simply requests that the host memory be pinned in physical memory rather than paged out to virtual memory. This saves an extra host-to-host staging copy during transfers.
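
A minimal sketch of that pattern (n is assumed to be set elsewhere):

real, allocatable, pinned :: a_host(:)   ! page-locked host memory (a request, not a guarantee)
real, allocatable, device :: a_dev(:)    ! corresponding device array

allocate( a_host(n), a_dev(n) )
a_host = 1.0
a_dev  = a_host   ! host-to-device copy; faster when the host buffer is pinned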

Hope this helps,
Mat

Hi Mat,

Thanks for the explanations.

I went through your answer and prepared a 2nd iteration. I didn’t get the terminology where you say:

module device scalars. If they are read-only, put them in constant memory. If you do need to write to them from the device, then you’ll need to worry about synchronization.

Can you point out which part of the 1st iteration (or the 2nd below, whichever) relates to module device scalars? By scalars, do you mean a single-value variable such as constvalN and derivedvalN (in contrast to arrays and derived data types)?

The 2nd iteration is below. Please suggest if the attributes and assignments are proper.

! This is a pseudo code which resembles the fortran implementation of an algorithm.
! This is to understand CUDA functionalities and develop a CUDA Fortran version of the code.

! val = func(args) is used in the mathematical sense. 
! Could be anything from a binary operation among args to a function/subroutine call that will set the value for val.

! N is used in the algebraic sense: N can be any positive integer, so constvarN could be constvar1, constvar34, etc.

module const
  integer,constant :: constvar1, constvar2, constvarN
  !(could as well be real/complex type in the rest of this code)
  !.
  !.
end module const

module derivednum
  use const

  integer :: derivedval1, derivedval2, derivedvalN
  integer, dimension(constvarN) :: derivedarrN
  derivedval1 = func ( .. ,constvarN,..)
  derivedarrN = func(..,constvarN,..)
  !.
  !.
end module derivednum

module gpuparsection
  use const
  use derivednum
  integer :: furtherderivedvals
  integer,device :: devderivedvalN
  integer,device,dimension(constvarN) :: devderivedarrN

  attributes(global) subroutine dev_kernel( args, devderivedvalN, devderivedarrN  )
    ! kernel has 3-6 levels of not-tightly-nested loops
    ! at different stages, calculations depend on:
    !   constvar1, constvar2..constvarN
    !   devderivedvalN
    !   devderivedarrN
    !   furtherderivedvals

  end subroutine dev_kernel

  subroutine callingroutine
    integer :: args
    !.
    !.
    derivedval2 = func (constvar1, .. constvarN)
    devderivedval2 = derivedval2

    devderivedarrN = derivedarrN
    call dev_kernel <<<  >>> ( args, devderivedvalN, devderivedarrN )
  end subroutine callingroutine

end module gpuparsection

program main
  use const
  use derivednum
  use gpuparsection

  call callingroutine

end program main

In particular, as a first issue, subroutine dev_kernel needs to make use of a number of scalars (in the way I have understood and mentioned earlier in this post), arrays, etc.: constvar1, constvar2..constvarN, devderivedvalN, devderivedarrN, and furtherderivedvals.
In fact, our code will have a couple of tens of constvar, derivedval, and derivedarr variables. Copying them to device-attributed variables and/or passing them as arguments to subroutine dev_kernel will be lengthy and, more importantly, untidy. And I suspect such discrete copies would make the overall runtime slower than the non-CUDA code!

As a second issue, the prospective subroutine dev_kernel will have up to 6 or 8 loosely nested do loops. That is, many scalars and some arrays need to be calculated before the next level of do loop starts, which will then need those values. Given this scenario, I am again worried about the speed-up of the CUDA version of the code over the original Fortran one.

So I am wondering what your/PGI’s take on these two issues is.

  • Aditya

By scalars, do you mean a single-value variable such as constvalN and derivedvalN (in contrast to arrays and derived data types)?

Correct.

Can you point out which part of the 1st iteration (or the 2nd below, whichever) relates to module device scalars?

Since “derivedvals” is a scalar integer defined in the data declaration portion of a module (i.e. a module device scalar), I don’t recommend making it a device variable.

module derivednum
use const

integer :: derivedvals
!.
!.
end module derivednum



In fact, our code will have a couple of tens of constvar, derivedval, and derivedarr variables. Copying them to device-attributed variables and/or passing them as arguments to subroutine dev_kernel will be lengthy and, more importantly, untidy. And I suspect such discrete copies would make the overall runtime slower than the non-CUDA code!

No need to pass them if they are module variables. You do need to copy the data to/from the device at some point, but that’s a separate issue, and I don’t see anything in your code indicating where you’re performing the copy.
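
For example, here is a sketch of your second iteration along those lines, with the copies done in the calling routine and nothing passed to the kernel (the fixed array size and launch configuration are made up, and the read-only scalar is placed in constant memory per the advice above):

module gpuparsection
  integer, constant :: constvar1
  integer, constant :: devderivedvalN        ! read-only in the kernel
  integer, device   :: devderivedarrN(1024)
contains
  attributes(global) subroutine dev_kernel()
    integer :: i
    i = (blockidx%x-1)*blockdim%x + threadidx%x
    ! module constant/device variables are visible here without being passed
    if (i <= 1024) devderivedarrN(i) = devderivedvalN + constvar1
  end subroutine dev_kernel

  subroutine callingroutine( derivedvalN, derivedarrN )
    integer, intent(in) :: derivedvalN, derivedarrN(1024)
    constvar1      = 3             ! host write to constant memory
    devderivedvalN = derivedvalN   ! host-to-device scalar copy
    devderivedarrN = derivedarrN   ! host-to-device array copy
    call dev_kernel<<<4,256>>>()
  end subroutine callingroutine
end module gpuparsection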


As a second issue, the prospective subroutine dev_kernel will have up to 6 or 8 loosely nested do loops. That is, many scalars and some arrays need to be calculated before the next level of do loop starts, which will then need those values. Given this scenario, I am again worried about the speed-up of the CUDA version of the code over the original Fortran one.

Having a lot of local variables will increase the number of registers used per thread, thus lowering the occupancy and therefore performance. However, as long as you have enough parallelism and minimize data movement, you can still see a speed-up over the CPU.

  • Mat