Problem passing a structure using CUDA

I have a structure like this:

typedef struct image_s {
    int width;
    int height;
    int stride;
    float *data;
} image_t;

and I have a function which looks like this:

void solver(image_t *du, image_t *dv, const image_t *a11, const image_t *a12)
{
    …
}

and I have these pointers defined:
float *du_ptr = du->data, *dv_ptr = dv->data,
      *a11_ptr = a11->data, *a12_ptr = a12->data;

Now I want to increment my pointers and, after incrementing, execute this function (solver) on the GPU,
but I am not able to pass the structures. Can somebody help me?

there are alignment concerns, but

“but i am not able to pass the structures”

Meaning what, exactly?
Do you get an error, or what is stopping you?

Structures/classes with embedded pointers are generally more difficult to handle, and often present difficulty for beginners unfamiliar with “deep copy” in CUDA: copying the struct alone copies only the host address stored in the pointer member, not the data it points to. As little_jimmy has suggested, someone can probably help you if you give a more complete description of what you are doing - ideally a short, complete code sample, along with the actual error output.
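To illustrate the “deep copy” issue, here is a minimal sketch for the image_t struct from the question. It is not the original poster's code: the kernel scale_kernel and the helpers image_to_device, image_from_device and scale_on_gpu are hypothetical names, and the sketch assumes that stride is measured in floats and that data holds stride*height floats. The idea is to copy the scalar fields as they are, but to give the device-side copy of the struct its own data pointer obtained from cudaMalloc.

```cuda
#include <cuda_runtime.h>

typedef struct image_s {
    int width;
    int height;
    int stride;   // assumed: row pitch in floats, not bytes
    float *data;
} image_t;

// Hypothetical kernel: multiplies every pixel by factor.
// The struct is passed by value; img.data must be a device pointer.
__global__ void scale_kernel(image_t img, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.width && y < img.height)
        img.data[y * img.stride + x] *= factor;
}

// Deep copy host -> device: *dev gets the same width/height/stride,
// but its data pointer refers to freshly allocated device memory.
cudaError_t image_to_device(const image_t *host, image_t *dev)
{
    size_t bytes = (size_t)host->stride * host->height * sizeof(float);
    *dev = *host;          // copies the scalars (and, for now, the host pointer)
    dev->data = NULL;      // the host pointer is meaningless on the device
    cudaError_t err = cudaMalloc((void **)&dev->data, bytes);
    if (err != cudaSuccess) return err;
    return cudaMemcpy(dev->data, host->data, bytes, cudaMemcpyHostToDevice);
}

// Copies the pixels back into host->data and releases the device buffer.
cudaError_t image_from_device(image_t *host, image_t *dev)
{
    size_t bytes = (size_t)host->stride * host->height * sizeof(float);
    cudaError_t err = cudaMemcpy(host->data, dev->data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev->data);
    dev->data = NULL;
    return err;
}

// Round trip: deep copy in, run the kernel, copy the result back.
cudaError_t scale_on_gpu(image_t *host, float factor)
{
    image_t dev;
    cudaError_t err = image_to_device(host, &dev);
    if (err != cudaSuccess) return err;

    dim3 block(16, 16);
    dim3 grid((host->width + block.x - 1) / block.x,
              (host->height + block.y - 1) / block.y);
    scale_kernel<<<grid, block>>>(dev, factor);
    err = cudaGetLastError();
    if (err != cudaSuccess) { cudaFree(dev.data); return err; }
    return image_from_device(host, &dev);
}
```

Passing the small struct by value avoids a second allocation for the struct itself. If the kernel has to take an image_t * instead, the patched struct would additionally need to be copied into device memory with its own cudaMalloc and cudaMemcpy.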

There are a great many questions like this on various web forums with worked out solutions, such as here:

Depending on the use case, you may also want to investigate the use of index-based linkage instead of pointer-based linkage. Ever since my time programming embedded systems I keep my eyes open for such opportunities. Curiously, these techniques go back to the early days of Fortran, when the language did not support the notion of explicit pointers.

Advantages of index-based linkage are reduced storage (32-bit or even 16-bit indexes instead of 64-bit pointers), ease and efficiency of copying (as little as one bulk copy may suffice), and improved locality, which improves performance through better use of caches (e.g. lists stored in one contiguous block of memory). One disadvantage is the increased cost of dereferencing, but this is just a few additional instructions at the point where an index gets converted into a pointer. These are often “free” since the code is bottlenecked by data transport. The performance characteristics of re-sizing are also different: the average cost is usually very small to zero, but the one-time cost can be substantial.
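A minimal sketch of index-based linkage, with hypothetical names (node_t, NIL, sum_list, sum_list_on_gpu) chosen only for illustration: a linked list lives in one contiguous pool, and each node stores the 32-bit index of its successor instead of a pointer. Because indexes are relative to the start of the pool, a single cudaMemcpy moves the whole list and every link stays valid on the device without any patching.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

#define NIL (-1)   // hypothetical end-of-list marker

typedef struct {
    float   value;
    int32_t next;  // index of the next node in the pool, or NIL
} node_t;

// Walks the list starting at head and stores the sum of all values.
// One thread is enough for the illustration; this is not tuned for speed.
__global__ void sum_list(const node_t *pool, int32_t head, float *out)
{
    float s = 0.0f;
    for (int32_t i = head; i != NIL; i = pool[i].next)
        s += pool[i].value;        // index -> address is just pool + i
    *out = s;
}

// Copies the whole pool with one bulk transfer and sums the list on the GPU.
cudaError_t sum_list_on_gpu(const node_t *pool, int count, int32_t head, float *result)
{
    node_t *d_pool = NULL;
    float  *d_out  = NULL;
    size_t bytes = (size_t)count * sizeof(node_t);

    cudaError_t err = cudaMalloc((void **)&d_pool, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(d_pool, pool, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sum_list<<<1, 1>>>(d_pool, head, d_out);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_pool);   // cudaFree(NULL) performs no operation
    cudaFree(d_out);
    return err;
}
```

With pointer-based links, the same list would need every next pointer rewritten to a device address before or after the copy.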

Hi guys, thanks for your replies.
My problem got solved today. Instead of passing the full structures, I passed only the data pointers,
where *du_ptr = du->data, *dv_ptr = dv->data, *a11_ptr = a11->data:

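float *D_du, *D_dv, *D_a11, *D_a12, *D_a22, *D_b1, *D_b2, *D_dpsis_horiz, *D_dpsis_vert;

// Each device buffer must be allocated with cudaMalloc before it is used as a copy target.
cudaMalloc((void **)&D_du, iterations*sizeof(float));
cudaMalloc((void **)&D_dv, iterations*sizeof(float));
cudaMalloc((void **)&D_a11, iterations*sizeof(float));
cudaMemcpy(D_du, du_ptr, iterations*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(D_dv, dv_ptr, iterations*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(D_a11, a11_ptr, iterations*sizeof(float), cudaMemcpyHostToDevice);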
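For completeness, a hedged sketch of how raw device pointers like these might then be handed to a kernel, including the pointer increment asked about in the original question. The actual solver is not shown in the thread, so update_kernel (computing du[i] += a11[i] * dv[i]) is a made-up stand-in, and update_from_offset, n and offset are hypothetical names. The sketch also assumes each array holds n floats.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for one solver step: du[i] += a11[i] * dv[i].
__global__ void update_kernel(float *du, const float *dv, const float *a11, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        du[i] += a11[i] * dv[i];
}

// Copies n floats per array to the device, updates elements [offset, n),
// and copies du back. Returns the first CUDA error encountered.
cudaError_t update_from_offset(float *du_ptr, const float *dv_ptr,
                               const float *a11_ptr, int n, int offset)
{
    if (offset < 0 || offset >= n) return cudaErrorInvalidValue;
    size_t bytes = (size_t)n * sizeof(float);
    float *D_du = NULL, *D_dv = NULL, *D_a11 = NULL;

    cudaError_t err = cudaMalloc((void **)&D_du, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&D_dv, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&D_a11, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(D_du, du_ptr, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(D_dv, dv_ptr, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(D_a11, a11_ptr, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int count = n - offset;
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        // Device pointers may be incremented on the host; only dereferencing
        // them in host code is invalid.
        update_kernel<<<blocks, threads>>>(D_du + offset, D_dv + offset,
                                           D_a11 + offset, count);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(du_ptr, D_du, bytes, cudaMemcpyDeviceToHost);

    cudaFree(D_du);
    cudaFree(D_dv);
    cudaFree(D_a11);
    return err;
}
```

Width, height and stride can be passed the same way as n here, as plain scalar kernel arguments, which is why dropping the struct works.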