Using cudaMalloc to allocate a quasi-2D array

Hi all,

I’m retooling a program in C with CUDA and I’ve come across something of an oddity (for me). Here’s the original code:

float **data = NULL;

data = (float**)malloc(nmax*sizeof(float*));

for (i=a; i<=b; i++) data[i] = (float*)&(moreData[i]);

and then elements in data are accessed as

data[i][j]
So my question is, can I create an equivalent structure on GPU memory using

[host code]
float **data = NULL;
cudaMalloc((void**)&data, nmax*sizeof(float*));

[device code, with data and moreData passed to kernel]
for (i=a; i<=b; i++) data[i] = (float*)&(moreData[i]);
and then access via data[i][j]?

I guess this is mostly a newbie question about whether I can cast &data to float**, and whether chained square brackets dereference through the pointers reliably on the device.


At first glance I don't see why not, but I can see the memory access pattern being misaligned and a bit slow. I assume you've looked through the documentation for cudaMallocPitch?

The way that code is written implies that it is a ragged array, but is it?

If the j lengths for the different [i] are all similar, then it could be worthwhile padding them all up to the same value (a multiple of 16) and making it a regular array.