dynamically allocate array of structs

Given something along the lines of:

typedef struct {
    int ival;
    float fval;
    char *str;
} Thing;

Thing *arrayOfThings;

How would I correctly allocate memory for ‘arrayOfThings’ at runtime, given two arguments:
int arrLen - size of ‘arrayOfThings’
int maxStrLen - maximum length of variable ‘str’ of structure Thing
I assume I would use cudaMalloc for ‘arrayOfThings’:

cudaMalloc( (void **)&arrayOfThings, arrLen*sizeof(Thing) );

But then how would I get a pointer to ‘str’ variable of every element of array ‘arrayOfThings’ so that I could malloc ‘str’ to be of length ‘maxStrLen’? Should I use cudaGetSymbolAddress() ?

Also, as an expansion to this question, how would I allocate ‘arrayOfThings’ if it was a two-dimensional array: “Thing **arrayOfThings”?


Allocating pointer-based data structures on the GPU is an exercise in not-very-fun coding. The outline is something like:

  • allocate an array of Things on the host for marshalling to the GPU
  • fill in the other two values based on whatever structures you have on the CPU
  • allocate each string on the device
  • set the pointer str in each Thing to the respective string you allocated on the device
  • allocate an array of Things on the device
  • copy the host-side array (which now holds device pointers) to the device array
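The steps above can be sketched roughly as follows. This is a minimal illustration, not a drop-in implementation: createDeviceThings is a hypothetical helper name, and all error checking on the cudaMalloc/cudaMemcpy calls is omitted.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int ival;
    float fval;
    char *str;
} Thing;

Thing *createDeviceThings(const Thing *hostSrc, int arrLen)
{
    /* 1. Host-side staging array whose str fields will hold DEVICE pointers. */
    Thing *staging = (Thing *)malloc(arrLen * sizeof(Thing));

    for (int i = 0; i < arrLen; ++i) {
        /* 2. Copy the two plain values. */
        staging[i].ival = hostSrc[i].ival;
        staging[i].fval = hostSrc[i].fval;

        /* 3. Allocate each string on the device and copy its contents. */
        size_t len = strlen(hostSrc[i].str) + 1;
        char *d_str;
        cudaMalloc((void **)&d_str, len);
        cudaMemcpy(d_str, hostSrc[i].str, len, cudaMemcpyHostToDevice);

        /* 4. The struct that lands on the GPU must carry the device pointer. */
        staging[i].str = d_str;
    }

    /* 5. Allocate the array of Things on the device and
       6. copy the staging array over in one shot. */
    Thing *d_things;
    cudaMalloc((void **)&d_things, arrLen * sizeof(Thing));
    cudaMemcpy(d_things, staging, arrLen * sizeof(Thing),
               cudaMemcpyHostToDevice);

    free(staging);
    return d_things;
}
```

Note that you never dereference the device pointers on the host; the staging array exists purely so the str fields contain valid device addresses before the final copy.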

If you have a maximum length for your strings (and a fixed buffer isn’t horribly wasteful, e.g. 256 bytes when 99% of strings are 8 bytes or shorter), just using a fixed-length array is a much easier alternative. It’s one cudaMalloc instead of arrLen + 1 of them.
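With a fixed-length buffer the whole thing collapses to a single allocation and copy. A sketch, assuming the bound can be fixed at compile time (MAX_STR_LEN and uploadFixedThings are illustrative names; if maxStrLen is only known at runtime you would instead allocate one arrLen * maxStrLen char block and index into it):

```cuda
#include <cuda_runtime.h>

#define MAX_STR_LEN 64   /* assumed compile-time bound; pick what fits your data */

typedef struct {
    int   ival;
    float fval;
    char  str[MAX_STR_LEN];   /* fixed-size buffer, no per-element pointer */
} FixedThing;

/* The whole array is now one flat block: a single cudaMalloc/cudaMemcpy pair. */
FixedThing *uploadFixedThings(const FixedThing *hostArr, int arrLen)
{
    FixedThing *d_arr;
    cudaMalloc((void **)&d_arr, arrLen * sizeof(FixedThing));
    cudaMemcpy(d_arr, hostArr, arrLen * sizeof(FixedThing),
               cudaMemcpyHostToDevice);
    return d_arr;
}
```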

Doing a two-dimensional array is pretty much the same thing: lots of host-side marshalling. A one-dimensional array that you index into via your grid and block dimensions is much nicer (both for perf and for readability).
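By way of illustration, the flattened approach might look like this (str omitted to keep the sketch short; touchThings is just a placeholder kernel): element (x, y) of a conceptual width-by-height grid lives at index y * width + x in one flat allocation.

```cuda
#include <cuda_runtime.h>

typedef struct { int ival; float fval; } Thing;   /* str omitted for brevity */

/* One cudaMalloc of width * height Things; no array-of-pointers needed. */
__global__ void touchThings(Thing *things, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        Thing *t = &things[y * width + x];
        t->ival = x;
        t->fval = (float)y;
    }
}
```

You would allocate with cudaMalloc((void **)&d, width * height * sizeof(Thing)) and launch with a 2-D grid of 2-D blocks covering width by height.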

However, there’s one thing I’m not sure of, now that I think about it. What would happen if you stored the array of pointers to arrays of Things in constant memory but stored the Things themselves in global memory? Has anyone tried something like this? I’m curious as to what the perf would be like.
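For the record, the layout I mean would be declared something like this (a sketch only; MAX_ROWS and setRows are made-up names, and I haven’t measured anything):

```cuda
#include <cuda_runtime.h>

typedef struct { int ival; float fval; char *str; } Thing;

#define MAX_ROWS 256   /* constant memory is small (64 KB total), so cap it */

/* Row pointers live in constant memory; the Things they point at are in
   global memory, allocated with ordinary cudaMalloc. */
__constant__ Thing *c_rows[MAX_ROWS];

void setRows(Thing **deviceRowPtrs, int numRows)
{
    /* deviceRowPtrs is a host array holding device addresses of each row. */
    cudaMemcpyToSymbol(c_rows, deviceRowPtrs, numRows * sizeof(Thing *));
}
```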

Thank you for the reply. That makes sense. I’ll give it a shot.

Also, I’ve wondered this before: should I use cudaMallocArray()? And if not, when should that function be used?

cudaMallocArray is used for allocating a cudaArray, not a plain array of your own type.

I understand that; what I meant was: what is the advantage of using cudaArrays, and why would someone use C-style arrays (float *arr) rather than cudaArrays?

I believe the cudaArray type is used for reading through texture memory, which may be faster than normal global memory (but it is also read-only from the kernel’s point of view).

cudaArrays are used with texture memory, yes.
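The usual pattern looks something like the following. This uses the legacy texture-reference API that was current when cudaMallocArray was introduced (texture references were deprecated in later CUDA releases in favor of texture objects); bindArray and readTex are illustrative names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

/* Legacy texture-reference API; modern code would use cudaTextureObject_t. */
texture<float, 2, cudaReadModeElementType> tex;

void bindArray(const float *hostData, int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    /* cudaMallocArray allocates an opaque cudaArray, laid out for the
       texture hardware, rather than a linear block of memory. */
    cudaArray *cuArr;
    cudaMallocArray(&cuArr, &desc, width, height);
    cudaMemcpyToArray(cuArr, 0, 0, hostData,
                      width * height * sizeof(float),
                      cudaMemcpyHostToDevice);
    cudaBindTextureToArray(tex, cuArr, desc);
}

__global__ void readTex(float *out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    /* tex2D reads go through the texture cache; the array is read-only here. */
    out[y * width + x] = tex2D(tex, (float)x, (float)y);
}
```

So the trade-off is: cudaArrays buy you cached, optionally filtered/interpolated reads, while plain cudaMalloc’d arrays are simpler and writable from kernels.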