Using the value attribute in a module and automatic arrays

Hi there,

I’m trying to port a rather large code to the GPU but I’m running into some difficulties.

(1) Is it allowed to use the value attribute in module scope? By doing this, does the compiler automatically know that the variable is a device variable? I ask because the compiler throws up errors if I give a variable both the device and the value attributes.

(2) Five of the arrays I’m trying to copy over to the device are automatic arrays (their size is determined by a variable that depends on data read in by other parts of the program), which is not allowed.

Would it be possible to get around this by allocating the array size, or is this still classed as being automatic? As I’m trying to save memory on the device, I’d rather use a more efficient method than just declaring huge arrays.

Any suggestions or comments would be much appreciated.

Cheers,
Crip_crop

Hi Crip_crop,

(1) Is it allowed to use the value attribute in module scope? By doing this, does the compiler automatically know that the variable is a device variable? I ask because the compiler throws up errors if I give a variable both the device and the value attributes.

“value” indicates that a scalar argument is to be passed by value into the subroutine and used to initialize a local device scalar. “device”, when applied to an argument, means that you’re passing in by reference a pointer to a scalar in global device memory that is shared by all threads. Hence the two are not compatible.
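To illustrate the distinction, here is a minimal sketch. The module name kernels_m, the kernel scale_array and all variable names are made up for this example and do not come from your code. The scalars n and factor are host values passed by value, while a lives in device global memory:

```fortran
module kernels_m
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: multiplies the first n elements of a by factor.
  attributes(global) subroutine scale_array(a, n, factor)
    integer, value :: n        ! host scalar, copied into a per-thread local
    real, value :: factor      ! host scalar, copied into a per-thread local
    real, device :: a(n)       ! array in device global memory, seen by all threads
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = factor * a(i)
  end subroutine scale_array
end module kernels_m

program value_demo
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024
  real :: a_h(n)
  real, device :: a_d(n)

  a_h = 1.0
  a_d = a_h                                   ! host-to-device copy
  call scale_array<<<(n + 255) / 256, 256>>>(a_d, n, 2.0)
  a_h = a_d                                   ! device-to-host copy
  print *, 'max deviation from 2.0: ', maxval(abs(a_h - 2.0))
end program value_demo
```

Note that n and the literal 2.0 are ordinary host values at the call site; only the array needs a device copy.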

Would it be possible to get around this by allocating the array size, or is this still classed as being automatic?

No. Currently NVIDIA devices do not allow memory allocation from device kernels. You must use fixed-size arrays in your kernels or allocate device data from the host.

Maybe you can rework your algorithms so that your threads can share the arrays and hence make them allocatable device arrays allocated from the host before you launch your kernel.
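As a rough sketch of that approach, the array below is sized at run time on the host, allocated in device memory before the launch, and passed to the kernel. The names fill_m, fill, work_d and the value of n are placeholders for illustration only:

```fortran
module fill_m
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: each thread writes one element of the shared work array.
  attributes(global) subroutine fill(work, n)
    integer, value :: n
    real, device :: work(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) work(i) = real(i)
  end subroutine fill
end module fill_m

program alloc_demo
  use cudafor
  use fill_m
  implicit none
  integer :: n
  real, allocatable :: work_h(:)
  real, device, allocatable :: work_d(:)

  n = 1000                         ! placeholder; in practice derived from the input data
  allocate(work_h(n), work_d(n))   ! work_d is allocated in device memory, from the host
  call fill<<<(n + 127) / 128, 128>>>(work_d, n)
  work_h = work_d                  ! device-to-host copy
  print *, work_h(1), work_h(n)
  deallocate(work_h, work_d)
end program alloc_demo
```

This way the device array is only as large as the data requires, rather than a fixed worst-case size.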

Alternatively, you may be able to use assumed-size shared arrays, where the size in bytes is given as the third argument of your kernel launch configuration. You are limited by the size of the shared memory, though.
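A minimal sketch of the shared-array variant follows, assuming a single thread block and default 4-byte reals. The kernel name reverse and the size n = 64 are arbitrary choices for the example:

```fortran
module reverse_m
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: reverses d in place using a shared scratch array.
  attributes(global) subroutine reverse(d, n)
    integer, value :: n
    real :: d(n)
    real, shared :: s(*)     ! assumed-size; actual size comes from the launch configuration
    integer :: t
    t = threadIdx%x
    s(t) = d(t)
    call syncthreads()       ! make sure every element of s is written before reading
    d(t) = s(n - t + 1)
  end subroutine reverse
end module reverse_m

program shared_demo
  use cudafor
  use reverse_m
  implicit none
  integer, parameter :: n = 64
  real :: a_h(n)
  real, device :: a_d(n)
  integer :: i

  a_h = [(real(i), i = 1, n)]
  a_d = a_h
  ! Third configuration argument: shared memory size in bytes (4 bytes per default real).
  call reverse<<<1, n, 4 * n>>>(a_d, n)
  a_h = a_d
  print *, a_h(1), a_h(n)
end program shared_demo
```

The shared array only exists per thread block, so this fits cases where each block needs its own scratch space of modest size.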

Hope this helps,
Mat



That’s really helpful, thanks.

Are there any performance implications from using the value attribute? Or is it in fact quicker because the scalar variable is stored in local memory? Also, is there a limit to the number of scalars passed by value?

Cheers,
Crip_crop

Another issue with “value”…

I seem to be having a problem with giving variables the “value” attribute in module scope. Is this allowed?

Cheers,
Crip_crop

Hi Crip Crop,

I seem to be having a problem with giving variables the “value” attribute in module scope. Is this allowed?

The “value” attribute is only allowed on scalar dummy arguments.
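For data declared in module scope, the device or constant attributes are the ones to use instead. The following is only a sketch; params_m, nmax, scale_d and apply are invented names:

```fortran
module params_m
  use cudafor
  implicit none
  ! "integer, value :: nmax" would be rejected here: value is for dummy arguments only.
  integer, constant :: nmax      ! module variable in device constant memory
  real, device :: scale_d        ! module variable in device global memory
contains
  ! Hypothetical kernel that reads the module-scope device data directly.
  attributes(global) subroutine apply(a)
    real :: a(*)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= nmax) a(i) = scale_d * a(i)
  end subroutine apply
end module params_m

program module_demo
  use cudafor
  use params_m
  implicit none
  integer, parameter :: n = 512
  real :: a_h(n)
  real, device :: a_d(n)

  nmax = n          ! host assignment copies the value to constant memory
  scale_d = 3.0     ! host assignment copies the value to device memory
  a_h = 1.0
  a_d = a_h
  call apply<<<(n + 127) / 128, 128>>>(a_d)
  a_h = a_d
  print *, a_h(1), a_h(n)
end program module_demo
```

Here the kernel needs no scalar arguments at all, since it picks up nmax and scale_d from the module.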

Are there any performance implications from using the value attribute?

By default, Fortran passes arguments by reference (i.e. an address in memory). The Fortran 2003 value attribute can be used to override this default by passing the argument’s value and initializing a local variable to this value.

In CUDA Fortran, the ‘value’ attribute allows you to use host scalar variables as arguments. Without ‘value’, you would be passing in an address in host memory or need to create a variable in device memory to store the value before passing it in.

As for performance, in general it’s better to use local kernel variables rather than global variables, since they are more likely to be stored in registers. However, the number of registers is limited, so it’s best not to use too many local variables or they will ‘spill’ to local memory, which resides in the much slower off-chip device memory.

Also, is there a limit to the number of scalars passed by value?

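In CUDA Fortran, there is no set limit on the number of variables that can be passed to a kernel, though the combined size of the arguments must fit in the kernel parameter space that CUDA provides.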

Hope this helps,
Mat