Dynamic array inside struct

Hi all!

I’m trying to allocate a struct with a dynamic array to use as a SOA.

typedef struct s {
  int *array;
  int size;
} soa;

So, I try to allocate memory as usual:

soa *test;

cudaMalloc((void**)&test, sizeof(test));

which works fine as expected.

The problem is when I try to allocate memory for the array inside it.

How should I do this?

I doubt that works “fine as expected”: you are passing the wrong size to cudaMalloc. sizeof(test) is the size of a pointer, not of the struct, so it should be sizeof(soa) (or sizeof(*test)). As for allocating the array inside the structure, I think you will have to allocate a separate integer array and then call a device-side function to assign its address to the pointer inside the structure.
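A minimal sketch of that device-side assignment, in case it helps. The names attach_array, read_back and attach_demo are made up for illustration and are not from the original post; error checking is kept to a bare minimum.

```cuda
#include <cuda_runtime.h>

typedef struct s {
    int *array;
    int size;
} soa;

// Device-side setter: stores the array's device address and length in the struct.
__global__ void attach_array(soa *s, int *data, int n)
{
    s->array = data;
    s->size  = n;
}

// Reads back through the struct to show that the stored pointer is usable.
__global__ void read_back(const soa *s, int *out)
{
    *out = s->array[0] + s->size;
}

// Hypothetical helper: returns array[0] + size as seen on the device, or -1 on error.
int attach_demo(int n)
{
    soa *test = NULL;
    int *data = NULL;
    int *out  = NULL;
    int result = -1;

    // sizeof(soa) is the size of the struct; sizeof(test) would be the size of a pointer
    cudaMalloc((void **)&test, sizeof(soa));
    cudaMalloc((void **)&data, n * sizeof(int));
    cudaMalloc((void **)&out, sizeof(int));
    cudaMemset(data, 0, n * sizeof(int));   // zero the array so array[0] == 0

    attach_array<<<1, 1>>>(test, data, n);
    read_back<<<1, 1>>>(test, out);

    // cudaMemcpy waits for the preceding kernels on the default stream
    if (cudaMemcpy(&result, out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;

    cudaFree(out);
    cudaFree(data);
    cudaFree(test);
    return result;
}
```

Calling attach_demo(16) from main should give back 16 if the pointer was stored correctly, since the array is zeroed and the size is added to its first element.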

You probably don’t need to malloc the soa per se, but you do need to malloc the individual arrays inside. I would take this approach:

struct soa
{
  // each member of the struct of arrays is an array:
  int *member0;
  int *member1;
  int *member2;

  int size; // number of elements in each array
};

__global__ void foo(soa arrays, int *result)
{
  unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

  // guard the last block, which may extend past the end of the arrays
  if (i < arrays.size)
  {
    // these loads will coalesce
    result[i] = arrays.member0[i] * arrays.member1[i] + arrays.member2[i];
  }
}

// this code is paraphrased, but you get the idea
int main(void)
{
  soa struct_of_arrays;
  struct_of_arrays.size = 100000;
  size_t bytes = struct_of_arrays.size * sizeof(int);

  // malloc each array inside the soa individually (cudaMalloc takes a size in bytes)
  cudaMalloc((void**)&struct_of_arrays.member0, bytes);
  cudaMalloc((void**)&struct_of_arrays.member1, bytes);
  cudaMalloc((void**)&struct_of_arrays.member2, bytes);

  // malloc space for the result
  int *result;
  cudaMalloc((void**)&result, bytes);

  // (filling the input arrays is omitted here)

  // launch the kernel, passing struct_of_arrays by value:
  int threads = 256;
  int blocks = (struct_of_arrays.size + threads - 1) / threads;
  foo<<<blocks, threads>>>(struct_of_arrays, result);
  cudaDeviceSynchronize();

  return 0;
}

Thanks for your fast replies!

Thanks. That works as I wanted.

But now, taking things further, what I really want is a SOA where the number of arrays is dynamic and the type of each array is unknown or different.

Something like:

typedef struct s2 {
  void **elements;
  int type;
} soa_array;

typedef struct s1 {
  soa_array *array;
  int size;
} soa;

The idea is to create a SOA struct depending on some kind of input.

I tried to “port” your previous solution to this but failed miserably :thumbsdown:

Is this kind of approach even doable with CUDA?

If what you want is dynamic typing with C/C++, this would be at worst impossible, or at best very ugly. Certainly no good solution would be possible in device code. You might consider some of the CUDA Python bindings for your problem, as Python is a language with dynamic typing.

If what you want is dynamically sized arrays, then you basically need to make your SOA a dynamically-sized array of pointers.
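To make the "array of pointers" idea concrete, here is a rough sketch assuming every array holds ints. The names dyn_soa, sum_columns and dyn_soa_demo are hypothetical. The pointer table is built on the host first and then copied to the device, because the host cannot write into device memory directly.

```cuda
#include <cuda_runtime.h>
#include <vector>

struct dyn_soa {
    int **arrays;  // device pointer to a device-resident table of device pointers
    int count;     // number of arrays (chosen at run time)
    int size;      // number of elements in each array
};

// Each thread sums element i across all arrays.
__global__ void sum_columns(dyn_soa s, int *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < s.size) {
        int acc = 0;
        for (int k = 0; k < s.count; ++k)
            acc += s.arrays[k][i];
        result[i] = acc;
    }
}

// Hypothetical helper: array k is filled with the value k + 1.
// Returns the last element of the summed result, or -1 on error.
int dyn_soa_demo(int count, int size)
{
    std::vector<int *> table(count, nullptr);  // host-side copy of the pointer table
    for (int k = 0; k < count; ++k) {
        std::vector<int> init(size, k + 1);
        cudaMalloc((void **)&table[k], size * sizeof(int));
        cudaMemcpy(table[k], init.data(), size * sizeof(int), cudaMemcpyHostToDevice);
    }

    dyn_soa s;
    s.count = count;
    s.size  = size;
    cudaMalloc((void **)&s.arrays, count * sizeof(int *));
    cudaMemcpy(s.arrays, table.data(), count * sizeof(int *), cudaMemcpyHostToDevice);

    int *result = nullptr;
    cudaMalloc((void **)&result, size * sizeof(int));

    int threads = 256;
    int blocks  = (size + threads - 1) / threads;
    sum_columns<<<blocks, threads>>>(s, result);  // struct passed by value

    std::vector<int> out(size, -1);
    cudaError_t err = cudaMemcpy(out.data(), result, size * sizeof(int),
                                 cudaMemcpyDeviceToHost);

    for (int k = 0; k < count; ++k)
        cudaFree(table[k]);
    cudaFree(s.arrays);
    cudaFree(result);
    return err == cudaSuccess ? out[size - 1] : -1;
}
```

With three arrays filled with 1, 2 and 3, every element of the result should be 6. Note that the extra level of indirection costs one more global-memory load per array compared with the fixed-member struct.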

If you know what types you’ll need at compile time, you could create a tuple of arrays.

In fact, Boost’s zip_iterator nicely encapsulates the SOA idea. Unfortunately, it’s not usable in CUDA yet.
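For what it is worth, the Thrust library that ships with the CUDA Toolkit has its own zip_iterator, which gives a similar "tuple of arrays" view when the element types are known at compile time. The sketch below is only an illustration; the functor scale_add, the helper zip_demo and the choice of an int array paired with a float array are assumptions of mine, not something from this thread.

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform.h>
#include <thrust/tuple.h>

// Combines one element from each array: 2 * id + weight.
struct scale_add {
    __host__ __device__
    float operator()(const thrust::tuple<int, float> &t) const
    {
        return thrust::get<0>(t) * 2.0f + thrust::get<1>(t);
    }
};

// Hypothetical helper: returns the last output element.
float zip_demo(int n)
{
    // two separately stored arrays of different types (the "struct of arrays")
    thrust::device_vector<int>   ids(n, 3);
    thrust::device_vector<float> weights(n, 0.5f);
    thrust::device_vector<float> out(n);

    // the zip iterator presents them as a single sequence of (int, float) tuples
    auto first = thrust::make_zip_iterator(
        thrust::make_tuple(ids.begin(), weights.begin()));

    thrust::transform(first, first + n, out.begin(), scale_add());
    return out[n - 1];  // copies one element back to the host
}
```

Here each output element is 2 * 3 + 0.5 = 6.5. The arrays stay separate in memory, so accesses remain coalesced, while the functor sees them as one record.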

Well, it does work, I’ve written a simple example. My problem is actually having to allocate memory for this.

So, setting aside the dynamically typed array, how can I have a struct with an array of structs which in turn have an array of ints?

typedef struct s2 {
  int *elements;
} soa_array;

typedef struct s1 {
  soa_array *array;
  int size;
} soa;

The thing is that I need to allocate space for something and space for something inside that… Is it doable?

So, is there no way of doing something like this with CUDA?

You just have to marshal things. I’m not sure why this is hard, or any different from structs in the first place. Something like:

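soa_array a_host, *a_device; // a_host is only used for marshalling the cudaMemcpy, not for actual storage
soa s_host, *s_device; // same for s_host

// allocate the innermost array; its device address is staged in a_host
cudaMalloc((void**)&a_host.elements, sizeof(int) * someNumberOfElements);

// allocate the inner struct on the device and copy the staged pointer into it
cudaMalloc((void**)&a_device, sizeof(soa_array));
cudaMemcpy(a_device, &a_host, sizeof(soa_array), cudaMemcpyHostToDevice);

// allocate the outer struct (note the & on s_device) and point it at the inner one
cudaMalloc((void**)&s_device, sizeof(soa));
s_host.array = a_device;
cudaMemcpy(s_device, &s_host, sizeof(soa), cudaMemcpyHostToDevice);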

It’s annoying, but that’s the price you pay when you want to use pointer-based structures in remote address spaces.

Thanks. I’ve tried this and I can make it work when these structs represent an output of my kernel, which is what I want :)

But I can’t make it work when I want to initialize a struct like this on the host, pass it to the device, and for example copy all the elements from the input to the output (although what I want to do later on is a bit more complicated).

For example:

__global__ void kernel1(soa *input, soa *output) {
   output->array->elements[0] = input->array->elements[0];
}

I tried to use tmurray’s suggestion and just copy the elements array from host to device, like this:

cudaMalloc((void**)&a_host.elements, sizeof(int) * someNumberOfElements);

cudaMemcpy(&ta_host.attr_data.buffer_int, elements_host, sizeof(int)*N, cudaMemcpyHostToDevice);

But apparently it doesn’t work, although it should, right?

Anyway, if I’m taking the right approach, how can I do this?

Ok, so I managed to make it work! :)

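I was doing the copy the wrong way: the destination must be the device pointer itself, not the address of the host variable that holds it.

It should be: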

cudaMemcpy(ta_host.attr_data.buffer_int, elements_host, sizeof(int)*N, cudaMemcpyHostToDevice);

instead of:

cudaMemcpy(&ta_host.attr_data.buffer_int, elements_host, sizeof(int)*N, cudaMemcpyHostToDevice);
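For anyone landing here later, this is a rough end-to-end sketch of the whole round trip: build an input and an output struct on the device, fill the input from host data, copy in a kernel, and read the output back. The names device_soa, make_device_soa, free_device_soa, copy_elements and round_trip_demo are hypothetical, and I am assuming that size holds the number of ints in elements, which the thread does not actually state.

```cuda
#include <cuda_runtime.h>
#include <vector>

typedef struct s2 { int *elements; } soa_array;
typedef struct s1 { soa_array *array; int size; } soa;

__global__ void copy_elements(const soa *input, soa *output)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < input->size)
        output->array->elements[i] = input->array->elements[i];
}

// Host-side record of the three device allocations behind one soa.
struct device_soa {
    int       *elements;
    soa_array *a_device;
    soa       *s_device;
};

// Builds one nested struct on the device; src may be null for an output buffer.
static device_soa make_device_soa(const int *src, int n)
{
    device_soa d = {nullptr, nullptr, nullptr};
    soa_array a_host;  // staging copies, only used for the cudaMemcpy calls
    soa s_host;

    cudaMalloc((void **)&d.elements, n * sizeof(int));
    if (src)  // destination is the device pointer itself, not its address
        cudaMemcpy(d.elements, src, n * sizeof(int), cudaMemcpyHostToDevice);

    a_host.elements = d.elements;
    cudaMalloc((void **)&d.a_device, sizeof(soa_array));
    cudaMemcpy(d.a_device, &a_host, sizeof(soa_array), cudaMemcpyHostToDevice);

    s_host.array = d.a_device;
    s_host.size  = n;  // assumption: size counts the ints in elements
    cudaMalloc((void **)&d.s_device, sizeof(soa));
    cudaMemcpy(d.s_device, &s_host, sizeof(soa), cudaMemcpyHostToDevice);
    return d;
}

static void free_device_soa(device_soa d)
{
    cudaFree(d.s_device);
    cudaFree(d.a_device);
    cudaFree(d.elements);
}

// Returns the last element copied from input to output, or -1 on error.
int round_trip_demo(int n)
{
    std::vector<int> in(n), out(n, -1);
    for (int i = 0; i < n; ++i)
        in[i] = 10 * i;

    device_soa input  = make_device_soa(in.data(), n);
    device_soa output = make_device_soa(nullptr, n);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    copy_elements<<<blocks, threads>>>(input.s_device, output.s_device);

    // the host kept the device address of elements, so it can read it back directly
    cudaError_t err = cudaMemcpy(out.data(), output.elements, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    free_device_soa(input);
    free_device_soa(output);
    return err == cudaSuccess ? out[n - 1] : -1;
}
```

Keeping the device addresses in a small host-side record avoids having to copy the structs back just to find out where the data lives, and it makes cleanup straightforward.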

Thanks again to everyone!
