Suppose a struct X with some primitives and an array of Y structs:
typedef struct
{
    int a;
    Y** y;
} X;
An instance X1 of X is initialized on the host and then copied to an instance X2 of X in device memory through cudaMemcpy.
This works fine for all the primitives in X (such as int a), but cudaMemcpy seems to flatten any double pointer into a single pointer, causing out-of-bounds errors wherever the struct arrays in X (such as y) are accessed.
In this case, am I supposed to use another memcpy function, such as cudaMemcpy2D or cudaMemcpyArrayToArray?
I don’t see how that could solve the problem. The data type isn’t the issue here - the issue is how to transfer a complex, non-contiguous data structure from host memory to device memory.
Besides what I’m already doing (a regular cudaMemcpy from host_X to dev_X), I’ve also tried a for loop cudaMemcpy’ing each host_Y to the corresponding dev_Y, which results in a crash.
Well, can you actually manipulate your structure like a 1D array? If it works as a 1D array, then why even change it? Just change the way you iterate over it.
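To make the 1D idea concrete, here is a rough host-side sketch of what flattening could look like. The fields of Y are not shown in the thread, so the single int member below is a placeholder, and FlatX, flat_x_size and flatten_x are hypothetical names made up for this example. It also assumes every y[i] points to exactly one Y; if each y[i] is itself an array, the same idea applies but you would also need to store per-row lengths or offsets.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder: the thread does not show what Y contains. */
typedef struct { int value; } Y;

/* Layout from the question: y is an array of pointers to separately
   allocated Y structs, so the data is not contiguous in memory. */
typedef struct
{
    int a;
    Y** y;
} X;

/* Hypothetical flat layout: the header and all Y elements live in one
   allocation, so the block contains no pointers that would need fixing
   up on the other side of a copy. */
typedef struct
{
    int a;
    int n;    /* number of Y elements that follow */
    Y   y[];  /* C99 flexible array member */
} FlatX;

/* Number of bytes occupied by a FlatX holding n elements. */
size_t flat_x_size(int n)
{
    return sizeof(FlatX) + (size_t)n * sizeof(Y);
}

/* Copy the scattered Y structs into one contiguous block.
   Returns NULL if the allocation fails; the caller frees the result. */
FlatX* flatten_x(const X* src, int n)
{
    assert(src != NULL && n >= 0);
    FlatX* flat = malloc(flat_x_size(n));
    if (flat == NULL)
        return NULL;
    flat->a = src->a;
    flat->n = n;
    for (int i = 0; i < n; ++i)
        flat->y[i] = *src->y[i];  /* copy by value, not by pointer */
    return flat;
}
```

With a block like this, one cudaMemcpy of flat_x_size(n) bytes with cudaMemcpyHostToDevice into a cudaMalloc'ed buffer of the same size should be enough, since no host addresses are stored inside it. Device code would then index y[i] directly instead of following a pointer per element.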