Efficient Memcpy-ing: pooled or no pool?


I have several strings of different lengths which I want to pass to the GPU and have processed by individual threads, one string per thread.

I’ve found I could either pool these strings together into one large 2D array and copy it with a pitched transfer (cudaMallocPitch plus cudaMemcpy2D), or I could cudaMemcpy each of them to the GPU separately and then copy the array of device addresses to the GPU as well.
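
To make the pooled option concrete, here is a rough host-side sketch of the layout I have in mind. The names PooledStrings and pool_strings are made up for illustration, and I am assuming each row is padded to the length of the longest string. The device transfer itself is not shown; it is only described in the comments.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical container for the pooled layout (not part of any CUDA API).
struct PooledStrings {
    std::vector<char> data;            // rows * width bytes, zero-padded
    std::vector<std::size_t> lengths;  // true length of each string
    std::size_t width = 0;             // bytes per row (longest string)
};

// Pack variable-length strings into one fixed-width 2D array on the host.
PooledStrings pool_strings(const std::vector<std::string>& strings) {
    PooledStrings p;
    for (const auto& s : strings) {
        p.width = std::max(p.width, s.size());
    }
    p.data.assign(strings.size() * p.width, '\0');
    p.lengths.reserve(strings.size());
    for (std::size_t i = 0; i < strings.size(); ++i) {
        if (!strings[i].empty()) {
            // Row i starts at byte offset i * width in the pool.
            std::memcpy(p.data.data() + i * p.width,
                        strings[i].data(), strings[i].size());
        }
        p.lengths.push_back(strings[i].size());
    }
    return p;
}

// Assumed next step (not shown): allocate with cudaMallocPitch and move
// p.data over with one cudaMemcpy2D call, using p.width as both the source
// pitch and the row width, and strings.size() as the height. The lengths
// array would need its own cudaMemcpy.
```

With this layout the pool takes strings.size() * width bytes regardless of how short most strings are, whereas the per-string option would need one cudaMalloc and one cudaMemcpy per string plus one more copy for the address array.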

I am wondering if there are drawbacks to either of these methods.