Arrays of Structures: allocating memory for an array of structures

Hey Folks,

I have run into a peculiar problem, which is as follows:

#include <cuda_runtime.h>

typedef struct {
    int a;
} struct1;

typedef struct {
    struct1 *struct_1st;
} struct2;

typedef struct {
    struct2 *struct_2nd;
} struct3;

int main()
{
    struct3 devStruct;  // host variable that will hold a device pointer
    cudaMalloc((void**)&devStruct.struct_2nd, sizeof(struct2));
    // ERROR IN THIS LINE: devStruct.struct_2nd points to device memory, and I think
    // I am trying to access that device memory from the host to reach struct_1st.
    cudaMalloc((void**)&devStruct.struct_2nd->struct_1st, sizeof(struct1));
    return 0;
}

Can anyone suggest a solution to this issue? How can I allocate device memory for the top-level structure in this structure tree?

Any suggestions would help a lot.

Thanks in advance,

You have to use cudaMalloc to allocate each structure, including the top-level structure, and then use a device kernel to assign the pointers inside each level of the nested structure.
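Applied to the three structures from the first post, that approach might look like the sketch below. It assumes a CUDA toolkit and a single GPU; the kernels link_tree and set_leaf and the helper build_and_check_tree are hypothetical names, and error handling is kept minimal.

```cuda
#include <cuda_runtime.h>

typedef struct { int a; } struct1;
typedef struct { struct1 *struct_1st; } struct2;
typedef struct { struct2 *struct_2nd; } struct3;

// Device code may dereference device pointers, so the links are made here.
__global__ void link_tree(struct3 *s3, struct2 *s2, struct1 *s1)
{
    s2->struct_1st = s1;  // struct2 -> struct1
    s3->struct_2nd = s2;  // struct3 -> struct2
}

// Walks the whole chain from the top-level struct3 down to the leaf.
__global__ void set_leaf(struct3 *s3, int value)
{
    s3->struct_2nd->struct_1st->a = value;
}

// Hypothetical helper: builds the tree on the device, writes `value` through
// it, and returns the leaf's `a` as read back by the host (-1 on any error).
int build_and_check_tree(int value)
{
    struct1 *d_s1 = nullptr;
    struct2 *d_s2 = nullptr;
    struct3 *d_s3 = nullptr;
    struct1 h_s1;
    int result = -1;

    // Every level, including the top-level struct3, is allocated on the device.
    bool ok = cudaMalloc((void **)&d_s1, sizeof(struct1)) == cudaSuccess &&
              cudaMalloc((void **)&d_s2, sizeof(struct2)) == cudaSuccess &&
              cudaMalloc((void **)&d_s3, sizeof(struct3)) == cudaSuccess;

    if (ok) {
        // The host only passes pointer values; it never dereferences them.
        link_tree<<<1, 1>>>(d_s3, d_s2, d_s1);
        set_leaf<<<1, 1>>>(d_s3, value);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&h_s1, d_s1, sizeof(struct1),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok)
            result = h_s1.a;
    }

    cudaFree(d_s3);
    cudaFree(d_s2);
    cudaFree(d_s1);
    return result;
}
```

Called from main on a working GPU, build_and_check_tree(42) should return 42, which confirms the host never had to touch device memory to link the levels.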

In any case, array-of-structure style data storage is generally best avoided in CUDA (and HPC generally).
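To make the layout difference concrete, here is a minimal sketch for a hypothetical particle type (Particle, scale_aos, scale_soa and soa_scaled_sum are illustrative names, not from the thread). In the structure-of-arrays version thread i reads x[i], so neighbouring threads touch neighbouring floats and the loads can be coalesced; in the array-of-structures version the same threads read addresses sizeof(Particle) = 12 bytes apart.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Array of structures: fields of one element sit next to each other.
struct Particle { float x, y, z; };

__global__ void scale_aos(Particle *p, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i].x *= s;  // consecutive threads read addresses 12 bytes apart
}

// Structure of arrays: each field is its own plain array.
__global__ void scale_soa(float *x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;    // consecutive threads read consecutive floats
}

// Hypothetical helper: scales an all-ones x array by s with the SoA kernel
// and returns the host-side sum (-1.0f on any CUDA error).
float soa_scaled_sum(int n, float s)
{
    std::vector<float> h_x(n, 1.0f);
    float *d_x = nullptr;
    float sum = -1.0f;

    bool ok = cudaMalloc((void **)&d_x, n * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_x, h_x.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale_soa<<<(n + 255) / 256, 256>>>(d_x, s, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_x.data(), d_x, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += h_x[i];
    }

    cudaFree(d_x);
    return sum;
}
```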

Can you elaborate a bit? I don't understand how I would use cudaMalloc() here, because using it seems to force me to touch device memory in host code. An example would be great.

I have to use an array of structures because of the application I am porting to the GPU; avoiding it would mean changing the whole source code, so I thought this was the better approach. Anyway, what is better suited to HPC: plain arrays, or something else?

The key is that you don't touch device memory in host code; all of the pointer assignment happens in device code. For a structure like this

struct mystruct
{
    float *a;
    int *b;
};

you use cudaMalloc to allocate an instance of struct mystruct, plus the arrays a and b, on the device, and then launch an initialization kernel like this

__global__ void myinit(struct mystruct *ms, float *a, int *b)
{
    ms->a = a;
    ms->b = b;
}

to assign the pointers. After that, you can pass the device pointer to the structure to your other kernels and dereference it there.
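Putting the pieces of that reply together, a complete host-side sequence could look like this. It keeps struct mystruct and myinit as written above; the use_struct kernel, the computation it performs (a[i] = 0.5 * b[i]), and the helper run_mystruct_demo are hypothetical additions for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

struct mystruct
{
    float *a;
    int *b;
};

// Initialization kernel from the reply: stores the device pointers.
__global__ void myinit(struct mystruct *ms, float *a, int *b)
{
    ms->a = a;
    ms->b = b;
}

// Hypothetical consumer kernel: reaches a and b only through the struct.
__global__ void use_struct(struct mystruct *ms, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        ms->a[i] = 0.5f * ms->b[i];
}

// Hypothetical helper: fills b with 0..n-1, runs both kernels and returns
// the sum of a as copied back to the host (-1.0f on any CUDA error).
float run_mystruct_demo(int n)
{
    std::vector<int> h_b(n);
    std::vector<float> h_a(n, 0.0f);
    for (int i = 0; i < n; ++i)
        h_b[i] = i;

    struct mystruct *d_ms = nullptr;
    float *d_a = nullptr;
    int *d_b = nullptr;
    float sum = -1.0f;

    // The struct and both arrays are separate device allocations.
    bool ok = cudaMalloc((void **)&d_ms, sizeof(struct mystruct)) == cudaSuccess &&
              cudaMalloc((void **)&d_a, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc((void **)&d_b, n * sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_b, h_b.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        myinit<<<1, 1>>>(d_ms, d_a, d_b);  // one thread is enough to set two pointers
        use_struct<<<(n + 255) / 256, 256>>>(d_ms, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_a.data(), d_a, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += h_a[i];
    }

    cudaFree(d_b);
    cudaFree(d_a);
    cudaFree(d_ms);
    return sum;
}
```

With n = 4, b holds 0, 1, 2, 3, so a becomes 0, 0.5, 1, 1.5 and the returned sum is 3. Both kernels run on the default stream, so myinit finishes before use_struct starts.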

Normal arrays are generally better. What you lose in expressiveness is more than made up for in performance and portability. Failing that, storing indices into arrays rather than pointers is preferable, because indices are portable and not tied to a given memory address space (on the GPU, or on one node of a distributed-memory cluster or grid).
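As a sketch of the index-based alternative, the structures from the first post can be stored as three plain arrays in which each pointer is replaced by an int index into the next array. The *_idx type names, the resolve kernel and the resolve_demo helper are hypothetical. Because an index means the same thing in host and device memory, the arrays can be built on the host and copied over with ordinary cudaMemcpy calls, and no initialization kernel is needed.

```cuda
#include <cuda_runtime.h>

typedef struct { int a; } struct1;
typedef struct { int struct_1st; } struct2_idx;  // index into the struct1 array
typedef struct { int struct_2nd; } struct3_idx;  // index into the struct2_idx array

// Each thread follows the index chain from a struct3_idx down to a struct1.
__global__ void resolve(const struct3_idx *s3, const struct2_idx *s2,
                        const struct1 *s1, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = s1[s2[s3[i].struct_2nd].struct_1st].a;
}

// Hypothetical helper: returns the sum of the resolved values (-1 on error).
int resolve_demo(void)
{
    // Built on the host; the indices stay valid after copying to the device.
    const struct1     h_s1[3] = {{10}, {20}, {30}};
    const struct2_idx h_s2[2] = {{2}, {0}};  // s2[0] -> s1[2], s2[1] -> s1[0]
    const struct3_idx h_s3[2] = {{1}, {0}};  // s3[0] -> s2[1], s3[1] -> s2[0]
    int h_out[2] = {0, 0};

    struct1 *d_s1 = nullptr;
    struct2_idx *d_s2 = nullptr;
    struct3_idx *d_s3 = nullptr;
    int *d_out = nullptr;

    bool ok = cudaMalloc((void **)&d_s1, sizeof(h_s1)) == cudaSuccess &&
              cudaMalloc((void **)&d_s2, sizeof(h_s2)) == cudaSuccess &&
              cudaMalloc((void **)&d_s3, sizeof(h_s3)) == cudaSuccess &&
              cudaMalloc((void **)&d_out, sizeof(h_out)) == cudaSuccess &&
              cudaMemcpy(d_s1, h_s1, sizeof(h_s1), cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_s2, h_s2, sizeof(h_s2), cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_s3, h_s3, sizeof(h_s3), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        resolve<<<1, 2>>>(d_s3, d_s2, d_s1, d_out, 2);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, sizeof(h_out),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_out);
    cudaFree(d_s3);
    cudaFree(d_s2);
    cudaFree(d_s1);
    return ok ? h_out[0] + h_out[1] : -1;  // 10 + 30 when everything succeeds
}
```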

Thanks for the example. One issue, though: I was using structures so that I could pack everything into one structure and pass a single argument to the kernel. If I do what you suggest, I might end up with more than 256 bytes of kernel parameters, which is not allowed. In my case there are around 25 parameters. I will certainly try it, though.

I am grateful for the advice.

If that becomes an issue, create one struct holding the 25 pointers and pass that single struct to the kernel instead of a gigantic parameter list.
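A minimal sketch of that suggestion, with a hypothetical KernelArgs struct standing in for the real 25 fields: the struct itself stays on the host and is passed to the kernel by value, so the host only stores device pointer values in it and never dereferences device memory, and no initialization kernel is needed. With 25 pointers at 8 bytes each on a 64-bit host, such a struct is 200 bytes, under the 256-byte limit mentioned above. KernelArgs, scale_kernel and run_args_demo are illustrative names only.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical bundle of kernel arguments; the real one would hold ~25 fields.
struct KernelArgs
{
    const float *in;  // device pointer
    float *out;       // device pointer
    float scale;
    int n;
};

// The whole struct arrives as one by-value kernel parameter.
__global__ void scale_kernel(KernelArgs args)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < args.n)
        args.out[i] = args.scale * args.in[i];
}

// Hypothetical helper: scales an all-ones input by 3 and returns the sum
// of the output as copied back to the host (-1.0f on any CUDA error).
float run_args_demo(int n)
{
    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    float sum = -1.0f;

    bool ok = cudaMalloc((void **)&d_in, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc((void **)&d_out, n * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Storing device pointer values in a host struct is fine;
        // only dereferencing them on the host is not.
        KernelArgs args;
        args.in = d_in;
        args.out = d_out;
        args.scale = 3.0f;
        args.n = n;
        scale_kernel<<<(n + 255) / 256, 256>>>(args);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += h_out[i];
    }

    cudaFree(d_out);
    cudaFree(d_in);
    return sum;
}
```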

Could you please look at the first post? My problem is that I need a kind of inheritance of structures (one structure nested inside another through pointers), and setting that up the normal way makes me access device memory in host code.

You can either use multiple initialization kernels and set up the structure in stages, or you can assign some of the pointer values to variables in constant memory and have the initialization kernel read the values from constant memory rather than pass them via an argument list.
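Here is a sketch of the constant-memory variant applied to the first post's structures. The __constant__ variables c_s1 and c_s2, the kernels and the helper build_with_constant are hypothetical names. cudaMemcpyToSymbol copies the host-side pointer values (not the structs) into constant memory, and the initialization kernel then takes only the top-level pointer as an argument. Writing through c_s2 is allowed because only the pointer variable lives in constant memory; the struct2 it points to is ordinary global memory.

```cuda
#include <cuda_runtime.h>

typedef struct { int a; } struct1;
typedef struct { struct1 *struct_1st; } struct2;
typedef struct { struct2 *struct_2nd; } struct3;

// Pointer values for the lower levels, stored in constant memory.
__constant__ struct1 *c_s1;
__constant__ struct2 *c_s2;

// Reads the lower-level pointers from constant memory instead of its arguments.
__global__ void link_from_constant(struct3 *s3)
{
    c_s2->struct_1st = c_s1;
    s3->struct_2nd = c_s2;
}

// Walks the chain from the top-level struct3 down to the leaf.
__global__ void set_leaf(struct3 *s3, int value)
{
    s3->struct_2nd->struct_1st->a = value;
}

// Hypothetical helper: returns the leaf value read back on the host (-1 on error).
int build_with_constant(int value)
{
    struct1 *d_s1 = nullptr;
    struct2 *d_s2 = nullptr;
    struct3 *d_s3 = nullptr;
    struct1 h_s1;
    int result = -1;

    bool ok = cudaMalloc((void **)&d_s1, sizeof(struct1)) == cudaSuccess &&
              cudaMalloc((void **)&d_s2, sizeof(struct2)) == cudaSuccess &&
              cudaMalloc((void **)&d_s3, sizeof(struct3)) == cudaSuccess &&
              // Copy the pointer values themselves into the __constant__ variables.
              cudaMemcpyToSymbol(c_s1, &d_s1, sizeof(d_s1)) == cudaSuccess &&
              cudaMemcpyToSymbol(c_s2, &d_s2, sizeof(d_s2)) == cudaSuccess;
    if (ok) {
        link_from_constant<<<1, 1>>>(d_s3);
        set_leaf<<<1, 1>>>(d_s3, value);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&h_s1, d_s1, sizeof(struct1),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok)
            result = h_s1.a;
    }

    cudaFree(d_s3);
    cudaFree(d_s2);
    cudaFree(d_s1);
    return result;
}
```

The staged alternative from the same reply would instead split link_from_constant into one small kernel per level, each taking only the two pointers it links.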