Arrays of Structure Allocating memory for array of structures.

mayank · September 22, 2009, 9:43pm

Hey Folks,

I have run into a peculiar problem, which is as follows:

typedef struct {
    int a;
} struct1;

typedef struct {
    struct1 *struct_1st;
} struct2;

typedef struct {
    struct2 *struct_2nd;
} struct3;

int main()
{
    struct3 devStruct;
    cudaMalloc((void**)&devStruct.struct_2nd, sizeof(struct2));
    // ERROR IN THE NEXT LINE, BECAUSE I THINK I AM TRYING TO ACCESS DEVICE MEMORY ON THE HOST.
    cudaMalloc((void**)&devStruct.struct_2nd->struct_1st, sizeof(struct1));
}

Can anyone provide me with a solution to this issue? How can I allocate memory for the top-level structure in this structure tree?

Suggestions can do wonders for me.

Thanks in advance,

avidday · September 23, 2009, 4:35am

You have to use cudaMalloc to allocate each structure, including the top level structure, and then use a device kernel to assign the pointers inside each level of nested structure.

In any case, array-of-structure style data storage is generally best avoided in CUDA (and HPC generally).

mayank · September 23, 2009, 6:14am

Can you elaborate a bit? I could not understand how I would use cudaMalloc(), since using it makes me access device memory in the host code. If you could give me an example, that would be great.

I cannot avoid using the array of structures because of the application that I am trying to port to the GPU; otherwise I would have to change the whole source code. So I thought this was the better approach. Anyway, what is better suited to HPC: normal arrays or something else?

avidday · September 23, 2009, 6:59am

The key is that you don’t use device memory in the host code. You have to do all the pointer assignment in device code, so that for a structure like this

struct mystruct
{
	float *a;
	int *b;
};

you use cudaMalloc to allocate an instance of struct mystruct, and a and b on the device, and then launch an initialization kernel like this

__global__ void myinit(struct mystruct *ms, float *a, int *b)
{
  ms->a = a;
  ms->b = b;
}

to assign the pointers. You can then use the device structure in your code.
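The host side of this recipe might look like the following minimal sketch. The `fill` kernel, the array length, and the helper name `run_mystruct_example` are assumptions made for illustration; only `struct mystruct` and `myinit` come from the post above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct mystruct
{
    float *a;
    int *b;
};

// Initialization kernel from the post: it runs on the device, so writing
// through ms is legal here.
__global__ void myinit(struct mystruct *ms, float *a, int *b)
{
    ms->a = a;
    ms->b = b;
}

// Hypothetical kernel that uses the linked structure.
__global__ void fill(struct mystruct *ms, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ms->a[i] = 0.5f * i;
        ms->b[i] = i;
    }
}

// Returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_mystruct_example()
{
    const int n = 8;
    struct mystruct *d_ms = NULL;
    float *d_a = NULL;
    int *d_b = NULL;

    // Three separate device allocations: the struct and both arrays.
    if (cudaMalloc((void **)&d_ms, sizeof(struct mystruct)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_a, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_b, n * sizeof(int)) != cudaSuccess) return 1;

    myinit<<<1, 1>>>(d_ms, d_a, d_b);  // link the pointers on the device
    fill<<<1, n>>>(d_ms, n);           // same stream, so it runs after myinit

    float h_a[n];
    int h_b[n];
    // cudaMemcpy waits for the preceding kernels to finish.
    if (cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost) != cudaSuccess) return 2;
    if (cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost) != cudaSuccess) return 2;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != 0.5f * i || h_b[i] != i) bad = 1;

    cudaFree(d_b);
    cudaFree(d_a);
    cudaFree(d_ms);
    return bad ? 3 : 0;
}

int main()
{
    int rc = run_mystruct_example();
    if (rc == 0) printf("ok\n");
    else printf("failed (%d)\n", rc);
    return rc;
}
```

The host only ever stores and passes the values of `d_ms`, `d_a` and `d_b`; it never dereferences them, which is what the second cudaMalloc call in the first post effectively tried to do.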

Normal arrays are generally better. What you lose in expressiveness is more than made up for in performance and portability. Failing that, storing indices into arrays rather than pointers is preferable, because indices are portable and not tied to a given memory address space (on the GPU, or on one node of a distributed memory cluster or grid).
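As a rough illustration of the index-based layout, the sketch below links elements through an index array instead of pointers. The names `gather`, `value`, `link` and `run_index_example` are hypothetical and not taken from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each element refers to another element by index, not by pointer.
__global__ void gather(const float *value, const int *link, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = value[link[i]];
}

// Returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_index_example()
{
    const int n = 4;
    const float h_value[n] = {10.0f, 20.0f, 30.0f, 40.0f};
    const int h_link[n] = {3, 2, 1, 0};  // meaningful on host and device alike
    float h_out[n] = {0.0f};

    float *d_value = NULL, *d_out = NULL;
    int *d_link = NULL;
    if (cudaMalloc((void **)&d_value, sizeof(h_value)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_link, sizeof(h_link)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess) return 1;

    // The index array is copied verbatim; no fix-up kernel is needed.
    if (cudaMemcpy(d_value, h_value, sizeof(h_value), cudaMemcpyHostToDevice) != cudaSuccess) return 2;
    if (cudaMemcpy(d_link, h_link, sizeof(h_link), cudaMemcpyHostToDevice) != cudaSuccess) return 2;

    gather<<<1, n>>>(d_value, d_link, d_out, n);

    if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess) return 3;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_value[h_link[i]]) bad = 1;

    cudaFree(d_out);
    cudaFree(d_link);
    cudaFree(d_value);
    return bad ? 4 : 0;
}

int main()
{
    int rc = run_index_example();
    if (rc == 0) printf("ok\n");
    else printf("failed (%d)\n", rc);
    return rc;
}
```

Because `link` holds offsets rather than addresses, the same data can be copied between host and device with a plain cudaMemcpy, with no initialization kernel to patch pointers.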

mayank · September 23, 2009, 3:37pm

Thanks for this example. There is one issue, though: I was using structures so that I could pack everything into one structure and pass just one argument to the kernel. If I do what you just described, I might end up using more than 256 bytes for the kernel parameters, and that is not allowed. I will certainly try it, though. In my case there are around 25 parameters.

I am grateful for this advice.

eelsen · September 23, 2009, 9:53pm

If this becomes an issue, just create one struct with 25 different pointers inside and then just pass one struct to the kernel instead of an absolutely gigantic list of parameters.
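One reading of this suggestion is to fill a struct of device pointers on the host and pass it to the kernel by value. The sketch below assumes that reading; `KernelArgs`, `scale_kernel` and `run_args_example` are hypothetical names. Note that a struct passed by value still counts toward the kernel parameter size, so 25 pointers of 8 bytes each on a 64-bit build would occupy 200 bytes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical argument bundle: device pointers plus scalars.
struct KernelArgs
{
    const float *x;
    float *y;
    float scale;
    int n;
};

// The struct is copied into kernel parameter space at launch.
__global__ void scale_kernel(KernelArgs args)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < args.n)
        args.y[i] = args.scale * args.x[i];
}

// Returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_args_example()
{
    const int n = 4;
    const float h_x[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_y[n] = {0.0f};

    float *d_x = NULL, *d_y = NULL;
    if (cudaMalloc((void **)&d_x, sizeof(h_x)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_y, sizeof(h_y)) != cudaSuccess) return 1;
    if (cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice) != cudaSuccess) return 2;

    // The host may store device pointer values; it just must not dereference them.
    KernelArgs args;
    args.x = d_x;
    args.y = d_y;
    args.scale = 2.0f;
    args.n = n;

    scale_kernel<<<1, n>>>(args);

    if (cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost) != cudaSuccess) return 3;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_y[i] != 2.0f * h_x[i]) bad = 1;

    cudaFree(d_y);
    cudaFree(d_x);
    return bad ? 4 : 0;
}

int main()
{
    int rc = run_args_example();
    if (rc == 0) printf("ok\n");
    else printf("failed (%d)\n", rc);
    return rc;
}
```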

mayank · September 24, 2009, 12:12am

Could you please look at the first post? My problem is that I have to use some sort of inheritance (nesting) of structures, and doing that in the normal way makes me access device memory in the host code.

avidday · September 24, 2009, 5:26am

You can either use multiple initialization kernels and set up the structure in stages, or you can assign some of the pointer values to variables in constant memory and have the initialization kernel read the values from constant memory rather than pass them via an argument list.
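Applied to the structures from the first post, both ideas might be combined as in the sketch below: one pointer travels through constant memory, the other as a kernel argument, and the links are made in two stages. The names `c_struct1`, `link_level2`, `link_level3`, `read_value` and `run_nested_example`, and the test value 42, are assumptions for illustration. Unlike the original code, the top-level `struct3` here also lives in device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct { int a; } struct1;
typedef struct { struct1 *struct_1st; } struct2;
typedef struct { struct2 *struct_2nd; } struct3;

// Hypothetical constant-memory slot that carries one device pointer.
__constant__ struct1 *c_struct1;

// Stage 1: link struct2 -> struct1, reading the pointer from constant memory.
__global__ void link_level2(struct2 *s2)
{
    s2->struct_1st = c_struct1;
    s2->struct_1st->a = 42;  // arbitrary test value
}

// Stage 2: link struct3 -> struct2, pointer passed as a kernel argument.
__global__ void link_level3(struct3 *s3, struct2 *s2)
{
    s3->struct_2nd = s2;
}

// Walks the whole chain on the device and reports the innermost value.
__global__ void read_value(const struct3 *s3, int *out)
{
    *out = s3->struct_2nd->struct_1st->a;
}

// Returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_nested_example()
{
    struct1 *d_s1 = NULL;
    struct2 *d_s2 = NULL;
    struct3 *d_s3 = NULL;
    int *d_out = NULL;

    // Every level gets its own cudaMalloc, including the top-level struct.
    if (cudaMalloc((void **)&d_s1, sizeof(struct1)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_s2, sizeof(struct2)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_s3, sizeof(struct3)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return 1;

    // Copies the pointer value itself, not the struct1 it points to.
    if (cudaMemcpyToSymbol(c_struct1, &d_s1, sizeof(d_s1)) != cudaSuccess) return 2;

    link_level2<<<1, 1>>>(d_s2);
    link_level3<<<1, 1>>>(d_s3, d_s2);
    read_value<<<1, 1>>>(d_s3, d_out);

    int h_out = 0;
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 3;

    cudaFree(d_out);
    cudaFree(d_s3);
    cudaFree(d_s2);
    cudaFree(d_s1);
    return h_out == 42 ? 0 : 4;
}

int main()
{
    int rc = run_nested_example();
    if (rc == 0) printf("ok\n");
    else printf("failed (%d)\n", rc);
    return rc;
}
```

All kernels are launched on the default stream, so each stage sees the links written by the previous one without explicit synchronization.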
