Hi everyone,
I am pretty new to CUDA and would appreciate some guidance, if you don’t mind. I would like to allocate device memory for a complex object (a struct that contains a pointer) and then copy that object from host to device. My struct looks something like this:
struct dummy {
    int size;
    unsigned int *data;
    dummy() : size(0), data(nullptr) {}
    dummy(int s)
        : size(s), data(new unsigned int[s])
    {}
    void fill() {
        // fill the data array with random values
    }
};
On the CPU, I have created and filled my dummy object:
dummy obj_cpu(80);
obj_cpu.fill();
Now I would like to allocate enough device memory for obj_cpu and copy obj_cpu into it. My first idea was this:
// Declare a dummy pointer on the CPU (it will hold a device address)
dummy *obj_gpu;
// Allocate memory for one dummy on device memory
cudaMalloc(&obj_gpu, sizeof(dummy));
// Copy the object from host (CPU) to device (GPU) in bulk
cudaMemcpy(obj_gpu, &obj_cpu, sizeof(dummy), cudaMemcpyHostToDevice);
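From what I have read so far, copying the struct in one go only copies `size` and the *value* of the `data` pointer, which is still a host address, so the device copy would point back into host memory. If that is right, I think a "deep copy" would look something like the sketch below: copy the array into its own device buffer, make a host-side copy of the struct whose `data` member holds that device address, then copy that struct to the device. This is only a sketch under my own assumptions; `check`, `copy_dummy_to_device`, `sum_kernel` and `deep_copy_roundtrip_matches` are hypothetical names, `fill()` uses `rand()` as a stand-in, and the kernel runs on a single thread just to keep it short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Same layout as the struct above; fill() uses rand() as a stand-in.
struct dummy {
    int size;
    unsigned int *data;
    dummy(int s) : size(s), data(new unsigned int[s]) {}
    void fill() {
        for (int i = 0; i < size; ++i) data[i] = static_cast<unsigned int>(std::rand());
    }
};

// Hypothetical helper: deep-copies h to the device. Returns a device pointer
// to a dummy whose data member also points to device memory; the device array
// is returned through d_data_out so the caller can free it later.
dummy *copy_dummy_to_device(const dummy &h, unsigned int **d_data_out) {
    size_t bytes = h.size * sizeof(unsigned int);

    // 1. Allocate the array on the device and copy it in one bulk transfer.
    unsigned int *d_data = nullptr;
    check(cudaMalloc(&d_data, bytes), "cudaMalloc data");
    check(cudaMemcpy(d_data, h.data, bytes, cudaMemcpyHostToDevice), "cudaMemcpy data");

    // 2. Host-side copy of the struct whose data member holds the device
    //    address (dummy has no destructor, so this copy frees nothing).
    dummy staging = h;
    staging.data = d_data;

    // 3. Allocate the struct itself on the device and copy the staging copy.
    dummy *d_obj = nullptr;
    check(cudaMalloc(&d_obj, sizeof(dummy)), "cudaMalloc struct");
    check(cudaMemcpy(d_obj, &staging, sizeof(dummy), cudaMemcpyHostToDevice), "cudaMemcpy struct");

    *d_data_out = d_data;
    return d_obj;
}

// Hypothetical kernel: reads the array through the device-resident struct.
__global__ void sum_kernel(const dummy *obj, unsigned long long *out) {
    unsigned long long s = 0;
    for (int i = 0; i < obj->size; ++i) s += obj->data[i];
    *out = s;
}

// Usage check: does the GPU see the same values as the CPU copy?
bool deep_copy_roundtrip_matches(int n) {
    dummy h(n);
    h.fill();

    unsigned int *d_data = nullptr;
    dummy *d_obj = copy_dummy_to_device(h, &d_data);

    unsigned long long *d_sum = nullptr;
    check(cudaMalloc(&d_sum, sizeof(unsigned long long)), "cudaMalloc sum");
    sum_kernel<<<1, 1>>>(d_obj, d_sum);
    check(cudaGetLastError(), "sum_kernel launch");

    unsigned long long gpu_sum = 0;  // cudaMemcpy waits for the kernel to finish
    check(cudaMemcpy(&gpu_sum, d_sum, sizeof(gpu_sum), cudaMemcpyDeviceToHost), "cudaMemcpy sum");

    unsigned long long cpu_sum = 0;
    for (int i = 0; i < h.size; ++i) cpu_sum += h.data[i];

    cudaFree(d_sum);
    cudaFree(d_obj);
    cudaFree(d_data);
    delete[] h.data;
    return gpu_sum == cpu_sum;
}
```

The price of this approach is that freeing takes two `cudaFree` calls (struct and array), which is why the helper hands back the device array pointer as well.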
Is this the right way to allocate and copy a complex object in device memory, or should I do it this way instead:
// Declare a dummy on the CPU whose data pointer will hold a device address
dummy obj_gpu;
// Allocate memory for obj_gpu.data on device memory
cudaMalloc(&(obj_gpu.data), obj_cpu.size * sizeof(unsigned int));
// Copy data from host (CPU) to device (GPU) element by element
obj_gpu.size = obj_cpu.size;
for (int j = 0; j < obj_cpu.size; j++)
    cudaMemcpy(&(obj_gpu.data[j]), &(obj_cpu.data[j]), sizeof(unsigned int), cudaMemcpyHostToDevice);
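For the second approach, since the elements of `data` are contiguous, I assume the per-element loop could be replaced by a single `cudaMemcpy` of `size * sizeof(unsigned int)` bytes, avoiding one transfer (and its fixed overhead) per element. The host-side struct, whose `data` member then holds a device address, could be passed to a kernel by value, because kernel arguments are copied to the device at launch. A minimal sketch under those assumptions; `check`, `add_one` and `bulk_copy_roundtrip_matches` are hypothetical names, and the default constructor is something I assumed so that `dummy obj_gpu;` compiles.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

struct dummy {
    int size;
    unsigned int *data;
    dummy() : size(0), data(nullptr) {}  // assumed default constructor
    dummy(int s) : size(s), data(new unsigned int[s]) {}
};

// Hypothetical kernel: obj arrives by value, so obj.data is the device
// pointer that was stored in the host-side struct at launch time.
__global__ void add_one(dummy obj) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < obj.size) obj.data[i] += 1;
}

// Usage check (expects n > 0): copy in one transfer, run the kernel, copy back.
bool bulk_copy_roundtrip_matches(int n) {
    dummy obj_cpu(n);
    for (int i = 0; i < n; ++i) obj_cpu.data[i] = static_cast<unsigned int>(i);

    dummy obj_gpu;
    obj_gpu.size = obj_cpu.size;
    size_t bytes = obj_cpu.size * sizeof(unsigned int);
    check(cudaMalloc(&obj_gpu.data, bytes), "cudaMalloc");
    // One bulk copy of the whole array instead of one cudaMemcpy per element.
    check(cudaMemcpy(obj_gpu.data, obj_cpu.data, bytes, cudaMemcpyHostToDevice), "cudaMemcpy H2D");

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(obj_gpu);
    check(cudaGetLastError(), "add_one launch");

    check(cudaMemcpy(obj_cpu.data, obj_gpu.data, bytes, cudaMemcpyDeviceToHost), "cudaMemcpy D2H");

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (obj_cpu.data[i] != static_cast<unsigned int>(i) + 1u) ok = false;

    check(cudaFree(obj_gpu.data), "cudaFree");
    delete[] obj_cpu.data;
    return ok;
}
```

Compared with the deep copy, this keeps the struct itself on the host, so only the array lives in device memory and only one `cudaFree` is needed.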
I would appreciate some feedback on which strategy I should use, and why. Thank you!