I am trying to put an entire CFD simulation onto a C870 by declaring all the arrays used in the functions, which are declared as __device__, except for the kernel called from the host. In the kernel a number of arrays are declared, which are passed down and processed by a sequence of functions. When I compiled the code I got the following error message:

“entry function uses too much local memory”

I assume that when a function, either kernel or __device__, allocates memory without any location specified, the default location used is the small local memory rather than global memory, whereas if the same arrays are declared and created from the host, all that data defaults to global memory and it is then up to me to move frequently used data into shared memory.


  1. why do I get that error message?
  2. how can I create the arrays declared in the kernel and __device__ functions so that I don’t get the error message?
  3. should I simply allocate and deallocate all device memory from the host?

Yes, having large statically allocated arrays in your kernel functions is generally not a good idea (these will be put in local memory).

You should allocate the arrays using cudaMalloc in your host code, and then pass pointers to the kernel.
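That pattern looks roughly like the sketch below (the kernel, names, and sizes here are made up for illustration, not taken from the original code):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread scales one element of a global-memory array.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1024;
    float* d_data = 0;

    // Allocate in global memory from the host...
    cudaMalloc((void**)&d_data, n * sizeof(float));

    // ...and pass the device pointer to the kernel as an ordinary parameter.
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    cudaFree(d_data);
    return 0;
}
```

Each thread then indexes into global memory through the passed-in pointer instead of a per-thread local array.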

so when/can a __device__ variable be declared and used? (see new topic entitled __device__ variables)

A __device__ variable can be declared and used with the same rules as any global scope array in C: it should be declared at file scope and the size needs to be evaluated at compile time. Using cudaMalloc on the host instead allows you to dynamically allocate the correct amount of memory.

So regarding __device__ variables:

  1. they have to be declared as pointers, e.g. you cannot declare them as

type array[N];

but only as

type* array;

with cudaMalloc allocating the array from host and/or device code, with the size provided?

  2. you can declare as

type array[N];

but in order to read the contents of array, e.g. to printf it, you must use cudaGetSymbolAddress? How is that done?

  3. what advantages/disadvantages are there to using __device__ variables compared to the standard cudaMalloc and passing pointers to a kernel?

Thanks in advance.


Well, you can do either, I suppose. But with the *, you will have to cudaMalloc on the host and then copy the pointer value itself over to the device, making even more of a headache. It is easier just to allocate with cudaMalloc and pass the device pointer to the kernel as a parameter.
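To make the headache concrete, the __device__-pointer route would look something like this sketch (the names are invented for illustration; the extra cudaMemcpyToSymbol step is what the parameter-passing approach avoids):

```cuda
#include <cuda_runtime.h>

// A file-scope __device__ pointer, to be filled in from the host.
__device__ float* d_buf;

// Hypothetical kernel that uses the global pointer instead of a parameter.
__global__ void fill(void)
{
    d_buf[threadIdx.x] = 1.0f;
}

int main()
{
    float* p = 0;
    cudaMalloc((void**)&p, 256 * sizeof(float));

    // Extra step: copy the pointer value itself into the __device__ variable.
    cudaMemcpyToSymbol(d_buf, &p, sizeof(p));

    fill<<<1, 256>>>();
    cudaFree(p);
    return 0;
}
```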

Example of a __device__ array:

__device__ int d_array[2000];

__global__ void kernel()
{
    int a = d_array[threadIdx.x];

    // ...
}
The device array is in the GPU’s device memory. If you try to dereference the device pointer on the host you will segfault or run into other weird problems associated with reading random memory.

To copy the contents of the device array to the host, you have to use cudaGetSymbolAddress to get the device memory pointer and then cudaMemcpy from that device pointer. See the programming guide for the syntax (or search the forums). I’ve never done this before.
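Based on the runtime API, the copy described above might look roughly like this (untested sketch, reusing the d_array example from earlier):

```cuda
#include <cuda_runtime.h>

__device__ int d_array[2000];

int main()
{
    int h_array[2000];
    void* d_ptr = 0;

    // Get the device-memory address of the __device__ symbol...
    cudaGetSymbolAddress(&d_ptr, d_array);

    // ...then do an ordinary device-to-host copy from that pointer.
    cudaMemcpy(h_array, d_ptr, sizeof(h_array), cudaMemcpyDeviceToHost);

    return 0;
}
```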

As far as I’m concerned, there are no advantages to using a __device__ variable, and there are many disadvantages. First, they are global variables, and are therefore the root of all evil in OOP. Second, accessing them on the host requires more code and is more error prone than if you manage your own device pointers with cudaMalloc. Finally, __device__ arrays are statically sized, meaning you have to recompile your program whenever your problem size changes, like in old Fortran 77 software. Yuck!