Efficiently dereferencing an array of pointers

Hi all,

I’m rewriting part of a program in CUDA and I’m not sure how best to approach this problem:

Suppose I have an array pringlescan which contains pointers to structs of type potatochip. I want to use cudaMalloc to allocate an array pringlescancopy on the GPU that contains the structs themselves, not pointers, and then use cudaMemcpy to fill it with the potatochip structs given by the pointers in the first array.

I could just loop through the elements and copy them one by one, but I fear that this might take longer than necessary, particularly if each cudaMemcpy call has a noticeable startup cost (I may have a LOT of potato chips).
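For reference, the one-by-one approach described above might look like this minimal sketch. The potatochip fields and the function name upload_one_by_one are hypothetical. Only cudaMalloc, cudaMemcpy and cudaFree are real CUDA runtime calls. It makes a single device allocation but issues one cudaMemcpy per chip, so any fixed per-call overhead is paid N times.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical layout: the real potatochip fields are not given in the question.
struct potatochip {
    float x, y, z;
    int   flavor;
};

// Naive approach: one device allocation, but one cudaMemcpy per element.
// Returns the device array, or nullptr on failure.
potatochip* upload_one_by_one(potatochip* const* pringlescan, size_t n)
{
    potatochip* pringlescancopy = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&pringlescancopy),
                   n * sizeof(potatochip)) != cudaSuccess)
        return nullptr;

    for (size_t i = 0; i < n; ++i) {
        // Each iteration is a separate small host-to-device transfer.
        if (cudaMemcpy(pringlescancopy + i, pringlescan[i], sizeof(potatochip),
                       cudaMemcpyHostToDevice) != cudaSuccess) {
            cudaFree(pringlescancopy);
            return nullptr;
        }
    }
    return pringlescancopy;
}
```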

Thanks!

You should definitely do only one large memory allocation on the GPU for the whole array of structs. For highest speed, also copy the structs into one contiguous array of structs on the CPU (preferably in page-locked memory, which transfers faster) so that the whole array can be sent with a single cudaMemcpy.
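A minimal sketch of that advice, assuming a hypothetical potatochip layout and a hypothetical function name upload_packed: the pointers are dereferenced on the CPU into a page-locked staging buffer from cudaMallocHost, then one cudaMalloc and one cudaMemcpy move the whole array to the GPU.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical layout: the real potatochip fields are not given in the question.
struct potatochip {
    float x, y, z;
    int   flavor;
};

// Gather the pointed-to structs into one pinned host buffer, then
// transfer the whole array with a single cudaMemcpy.
// Returns the device array, or nullptr on failure.
potatochip* upload_packed(potatochip* const* pringlescan, size_t n)
{
    const size_t bytes = n * sizeof(potatochip);

    // Page-locked (pinned) staging buffer for a faster host-to-device copy.
    potatochip* staging = nullptr;
    if (cudaMallocHost(reinterpret_cast<void**>(&staging), bytes) != cudaSuccess)
        return nullptr;

    // Dereference every pointer on the CPU into contiguous storage.
    for (size_t i = 0; i < n; ++i)
        staging[i] = *pringlescan[i];

    // One device allocation for the whole array of structs.
    potatochip* pringlescancopy = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&pringlescancopy), bytes) != cudaSuccess) {
        cudaFreeHost(staging);
        return nullptr;
    }

    // One transfer instead of n small ones.
    cudaError_t err = cudaMemcpy(pringlescancopy, staging, bytes,
                                 cudaMemcpyHostToDevice);
    cudaFreeHost(staging);
    if (err != cudaSuccess) {
        cudaFree(pringlescancopy);
        return nullptr;
    }
    return pringlescancopy;
}
```

Allocating pinned memory is itself relatively expensive, so if this upload happens repeatedly, a real program would probably keep the staging buffer alive and reuse it rather than calling cudaMallocHost each time.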