How best to allocate a C struct on the GPU? Novice question

Hi all, I was wondering if someone could provide some guidance on how to allocate a struct similar to the following in CUDA.

typedef struct {
float a,b,d,tol1,tol2,v,w,x,xm;
int j,its,n;
float4 *g,*h,*xi,*p;
} example;

This is my line of thought on this right now.

  1. Allocate a pointer to the struct on the GPU:
    example * e;
    cudaMalloc((void**)&e, sizeof(example));

  2. Now allocate the float4 arrays:
    cudaMalloc((void**)&e->g, n*sizeof(float4)); // n is already defined to be the length of the array
    (repeat for the other arrays)

  3. Now I'm thinking that I won't have to allocate the ints and floats separately, so I just need to copy over their values.
    I was thinking the best way to accomplish this would be to pass them as arguments to the kernel and then let the kernel
    just assign them instead of doing a memcpy.
    For example:
    __global__ void k(example * e, int num)
    {
    e->its=num;
    }

  4. Lastly, a few of the float4 arrays have their data already on the GPU, so I was thinking of just assigning them as such:
    __global__ void k(example *e, float4 * arr)
    {
    e->p=arr;
    }

So would this even work (I know I should try it, but I am pretty sure it won't)? Is it an efficient way to accomplish this?
I know I could just do without the struct, but there are even more variables than I have listed inside it, so I thought using this would really
clean things up.

Thanks.

Step 2 can’t work: cudaMalloc needs a host pointer to write the new device address into, and &e->g is an address inside device memory. For a flat structure, the simplest way to do it is to allocate the individual device arrays through the pointer members of a copy of the structure in host memory, then copy the finished structure into device memory. If you need to build trees, it requires recursion and gets ugly very quickly.
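To make that concrete, here is a minimal sketch of the host-staging approach, assuming the struct from the question. The helper name make_device_example and its parameters are made up for illustration, and cleanup of partial allocations on failure is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

typedef struct {
    float a, b, d, tol1, tol2, v, w, x, xm;
    int j, its, n;
    float4 *g, *h, *xi, *p;
} example;

// Hypothetical helper (not from the original post).
// existing_p: an array that already lives on the device (step 4 of the question).
// host_copy:  receives the staged struct, so the host keeps the device pointers.
// dev_out:    receives the device address of the finished struct.
cudaError_t make_device_example(int n, int its, float4 *existing_p,
                                example *host_copy, example **dev_out)
{
    example staged = {};        // ordinary struct in host memory
    staged.n = n;               // scalars are filled in on the host, no kernel needed
    staged.its = its;
    staged.p = existing_p;      // reuse data that is already on the GPU

    // &staged.g is a host address, so cudaMalloc can store the device pointer there.
    cudaError_t err = cudaMalloc((void **)&staged.g, n * sizeof(float4));
    if (err == cudaSuccess) err = cudaMalloc((void **)&staged.h, n * sizeof(float4));
    if (err == cudaSuccess) err = cudaMalloc((void **)&staged.xi, n * sizeof(float4));

    // Allocate the struct itself on the device and copy the staged version over.
    example *dev = NULL;
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev, sizeof(example));
    if (err == cudaSuccess)
        err = cudaMemcpy(dev, &staged, sizeof(example), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;   // cleanup omitted in this sketch

    *host_copy = staged;
    *dev_out = dev;
    return cudaSuccess;
}
```

A single cudaMemcpy of the staged struct covers steps 3 and 4 of the question, so the small assignment kernels are not needed. Keeping host_copy around is what lets the host read the pointer members later without touching device memory.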

Thanks for the help. I have a follow-up question. I allocated the memory for the structure on the GPU. Now I have a kernel that does something like this (well, a little more complicated):

__global__ void func(float4 *x, float *f)
{
*f=25;
}

Now I have a hard time calling the function. I have tried things like:

func<<<1,1>>>(v->p,&(v->fp));

and even changing fp to *fp in the example struct, and I get a segmentation fault every time. Do I have to allocate memory for all the ints and floats in the struct before assigning them if I am using pointers to them?

I am confused because it works if I just do, in the kernel:
v->n=n; // where n is passed into the kernel

Any help would be greatly appreciated.

Thanks,
Jack

It is the same problem as (2) in your first question. You are trying to use a device pointer in host code, which won’t work. You should have a copy of the structure in host memory, so use that host memory copy for the kernel launches, rather than trying to dereference the device copy.
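As a rough sketch of what that launch could look like: the struct example_fp below is a cut-down, hypothetical stand-in for the real one (with the fp member from the follow-up), and launch_func is a made-up wrapper. The point is that v->p in host code reads device memory through the CPU, which is the likely source of the segmentation fault, whereas the address of a member can be computed from the device pointer without ever dereferencing it.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>   // offsetof

// Hypothetical, reduced version of the struct, with the fp member from the follow-up.
typedef struct {
    float fp;
    int n;
    float4 *p;
} example_fp;

__global__ void func(float4 *x, float *f)
{
    *f = 25.0f;   // writes into the struct member on the device
}

// host_copy: the struct as staged in host memory (its members hold device pointers).
// dev:       the copy of that struct in device memory.
cudaError_t launch_func(const example_fp *host_copy, example_fp *dev)
{
    // Reading host_copy->p is a normal host memory access.
    // The address of dev->fp is plain pointer arithmetic; dev itself is never read on the host.
    float *dev_fp = (float *)((char *)dev + offsetof(example_fp, fp));

    func<<<1, 1>>>(host_copy->p, dev_fp);

    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    return err;
}
```

No separate allocation is needed for the ints and floats: they live inside the block that was allocated for the struct, so a pointer to one of them is just an offset into that block.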
