Using array of structures

Hi everyone,

I’m having problems using an array of structures in my code. I have searched the forum and the net but I didn’t find a satisfying solution for this problem.

This is the snippet of my code that deals with the structure:

[codebox]
typedef struct {
    unsigned int *OF;
    int part_no;
} overflow;

overflow of[100], *d_of;
int i, no_part=100, size_part=25000;

cudaMalloc((void**)&d_of, no_part*sizeof(overflow));

for(i=0;i<100;i++)
{
    of[i].OF=(unsigned int*)calloc(size_part, sizeof(unsigned int));

    for(int u=0;u<size_part;u++)
        of[i].OF[u]=0;

    cudaMalloc((void**)&d_of[i].OF, size_part*sizeof(unsigned int));
}

cudaMemcpy(d_of, of, no_part*sizeof(overflow), cudaMemcpyHostToDevice);
[/codebox]

So as you can see, I have a dynamic array of unsigned int inside my structure, which might complicate things.

Every time I use this structure inside the kernel I get this warning:

Warning: Cannot tell what pointer points to, assuming global memory space

And when I modify the values inside the array on the device and copy it back to the host, I always get junk values.

Here are my questions:

1- How to do this properly?

2- I read a post that said that an array of structures shouldn’t be used and that it’s better to use regular arrays instead. So should arrays of structures be avoided entirely, or are regular arrays just better performance-wise?

I’m using the Tesla T10 Processor.

I would appreciate any help in clarifying this matter.

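Your problem is that you are trying to use host pointers on the device. cudaMemcpy() doesn’t magically do any “deep” copying, so your copy of the structure array on the device winds up holding the host pointers returned by calloc(), with no valid device memory allocation behind them. The cudaMalloc() calls inside your loop don’t help either: &d_of[i].OF is an address in device memory, and cudaMalloc() has to write the new pointer to a host address. Dereferencing those OF pointers in a kernel will always fail.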

To make this work, you are going to have to do a cudaMalloc() for each OF entry instead of the host-side calloc(), storing the resulting device pointer in the host copy of the structure (of[i].OF) before you copy the structure array to the device. You won’t be able to zero the memory directly the way your code does now (you can’t modify device memory directly in host user code like this), but cudaMemset() will work for the simple zeroing you are doing here. If you need to transfer this structure between device and host (or vice versa), you will have to do a cudaMemcpy() for each OF array, as well as one for the structure array itself. That means your example code has to execute 101 separate cudaMemcpy() calls each time you need to move this structure between host and device.
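As a rough sketch of the steps just described (not a drop-in fix for your code), something like the following should work. The names staging, upload(), bump(), deep_copy_roundtrip_ok() and the CUDA_CHECK macro are made up for illustration, and the kernel is only a placeholder that adds 1 to every element so there is something to verify on the way back.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct {
    unsigned int *OF;   // always a DEVICE pointer in this sketch
    int part_no;
} overflow;

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                     \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Placeholder kernel: block p walks partition p and adds 1 to each element.
__global__ void bump(overflow *parts, int no_part, int size_part)
{
    int p = blockIdx.x;
    if (p >= no_part) return;
    for (int u = threadIdx.x; u < size_part; u += blockDim.x)
        parts[p].OF[u] += 1;
}

// Fills staging[] (a HOST array) with device pointers, then copies the
// array of structures itself to the device and returns its device address.
overflow *upload(overflow *staging, int no_part, int size_part)
{
    assert(no_part > 0 && size_part > 0);
    size_t bytes = (size_t)size_part * sizeof(unsigned int);
    for (int i = 0; i < no_part; i++) {
        staging[i].part_no = i;
        // &staging[i].OF is a host address, so cudaMalloc() can write to it.
        CUDA_CHECK(cudaMalloc((void **)&staging[i].OF, bytes));
        CUDA_CHECK(cudaMemset(staging[i].OF, 0, bytes));   // zero on the device
    }
    overflow *d_of = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_of, no_part * sizeof(overflow)));
    CUDA_CHECK(cudaMemcpy(d_of, staging, no_part * sizeof(overflow),
                          cudaMemcpyHostToDevice));
    return d_of;
}

// Uploads, runs the kernel once, copies every partition back and checks it.
bool deep_copy_roundtrip_ok(int no_part, int size_part)
{
    size_t bytes = (size_t)size_part * sizeof(unsigned int);
    overflow *staging = (overflow *)malloc(no_part * sizeof(overflow));
    unsigned int *host_buf = (unsigned int *)malloc(bytes);
    if (staging == NULL || host_buf == NULL) exit(EXIT_FAILURE);

    overflow *d_of = upload(staging, no_part, size_part);
    bump<<<no_part, 256>>>(d_of, no_part, size_part);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (int i = 0; i < no_part; i++) {
        // One cudaMemcpy() per partition; staging[i].OF is the device source.
        CUDA_CHECK(cudaMemcpy(host_buf, staging[i].OF, bytes,
                              cudaMemcpyDeviceToHost));
        for (int u = 0; u < size_part; u++)
            ok = ok && (host_buf[u] == 1u);
        CUDA_CHECK(cudaFree(staging[i].OF));
    }
    CUDA_CHECK(cudaFree(d_of));
    free(host_buf);
    free(staging);
    return ok;
}
```

The important detail is that the host-side staging array is the only place the host can look up the device addresses, so it has to be kept around for every later copy and for the final cudaFree() calls.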

Perhaps now you might begin to see why using an array of structures (particularly one containing pointers) isn’t a very practical proposition in the CUDA paradigm…
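For comparison, here is a minimal sketch of the “regular array” route, assuming it is acceptable to store all partitions back to back in one flat buffer, with partition p starting at offset p*size_part. That layout is my assumption, not something from your code, and bump_flat() and flat_roundtrip_ok() are hypothetical names. The partition number becomes the index itself, so the part_no field is no longer needed.

```cuda
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

// Placeholder kernel: block p handles partition p and adds 1 to each element.
__global__ void bump_flat(unsigned int *data, int no_part, int size_part)
{
    int p = blockIdx.x;
    if (p >= no_part) return;
    unsigned int *part = data + (size_t)p * size_part;  // start of partition p
    for (int u = threadIdx.x; u < size_part; u += blockDim.x)
        part[u] += 1;
}

// Zero-filled host buffer -> device -> kernel -> host.
// Returns true if every CUDA call succeeded and every element came back as 1.
bool flat_roundtrip_ok(int no_part, int size_part)
{
    assert(no_part > 0 && size_part > 0);
    size_t count = (size_t)no_part * size_part;
    size_t bytes = count * sizeof(unsigned int);

    unsigned int *host = (unsigned int *)calloc(count, sizeof(unsigned int));
    if (host == NULL) return false;

    unsigned int *d_data = NULL;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) {
        free(host);
        return false;
    }

    // One copy in each direction moves all partitions at once.
    bool ok = cudaMemcpy(d_data, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        bump_flat<<<no_part, 256>>>(d_data, no_part, size_part);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (size_t k = 0; ok && k < count; k++)
        ok = (host[k] == 1u);

    cudaFree(d_data);
    free(host);
    return ok;
}
```

With this layout there is one cudaMalloc() and one cudaMemcpy() per direction instead of 101, and no pointers stored inside device memory for the kernel to chase.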