dynamic parallelism and allocating global memory array of type double


What would be the correct way to allocate an array of type double in global memory when using dynamic parallelism?

When I store the array pointer in shared memory, cudaMalloc fails.
When I change the data type to (unsigned) int, it succeeds. I suppose this is because of the way anything bigger than 4 bytes is stored in shared memory?

Must I store the pointer in local or global memory…? Hardly desired, this is

One way is to allocate it with malloc (device) and pass the pointer as an argument to the child grid. Another way is to allocate it from the host before the first grid launch.

It’s worth noting that shared memory is not preserved between a parent and child grid in CNP (nor is local memory), so it is invalid to pass anything through shared memory or local memory between threads in different grids (or in different thread blocks, for that matter).

Can you share the code for storing the pointer in shared memory (or describe it in more detail)?

Yes. I have now reverted to creating the arrays from the host side first, and passing pointers to them to the child kernels via the parent kernel.
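The host-side approach described above might look roughly like the following. This is only a minimal sketch: the kernel names `parent` and `child`, the array length, and the launch configuration are assumptions made for illustration, not the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: writes to the array it receives as an argument.
__global__ void child(double* data, unsigned int length)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length)
        data[i] = 2.0 * i;
}

// Hypothetical parent kernel: forwards the host-allocated pointer to the child.
__global__ void parent(double* data, unsigned int length)
{
    // A single thread launches the child grid.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        child<<<(length + 127) / 128, 128>>>(data, length);
}

int main()
{
    const unsigned int length = 1024;   // assumed size
    double* d_data = nullptr;

    // Allocate global memory from the host before the first grid launch.
    cudaError_t err = cudaMalloc(&d_data, sizeof(double) * length);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    parent<<<1, 32>>>(d_data, length);

    // The parent grid is not complete until its child grids have finished.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    double last = 0.0;
    cudaMemcpy(&last, d_data + (length - 1), sizeof(double), cudaMemcpyDeviceToHost);
    printf("last element = %f\n", last);

    cudaFree(d_data);
    return 0;
}
```

Dynamic parallelism needs relocatable device code and the device runtime library, so a build line along the lines of `nvcc -rdc=true file.cu -lcudadevrt` (targeting compute capability 3.5 or higher) would be expected.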

In short, this is what I initially did:

unsigned int i = threadIdx.x;

__shared__ double* ptr;

if (i == 0)
    cudaMalloc(&ptr, sizeof(double) * length);

[launch child kernel, and pass pointer to array stored in ptr]

[continue parent kernel]

I can create an unsigned int array this way, and that program debugs and runs just fine.

I wanted to create multiple double arrays this way, but the program simply terminates after the 2nd malloc call.

There shouldn’t be anything special about your use of shared memory in this case. The pointer is only stored in shared memory between the cudaMalloc call and the child_krnl. From your previous post I thought that you meant that the child kernel was loading it out of shared memory rather than receiving it as a parameter.

You may just want to use ‘malloc’ rather than ‘cudaMalloc’. cudaMalloc on the device is just a wrapper around malloc, and since malloc returns the result directly, you can store it in a register without needing to use shared or local memory.
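A minimal sketch of that suggestion, assuming hypothetical kernels named `parent` and `child` and an arbitrary array length: the result of device-side `malloc` is kept in an ordinary local variable and handed to the child grid as a kernel argument, so no shared memory is involved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: receives the array as a normal kernel argument.
__global__ void child(double* data, unsigned int length)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length)
        data[i] = 1.0;
}

// Hypothetical parent kernel: allocates from the device heap and launches the child.
__global__ void parent(unsigned int length)
{
    // Only one thread per block allocates and launches.
    if (threadIdx.x != 0)
        return;

    // malloc returns the pointer directly, so it can live in a register.
    double* ptr = static_cast<double*>(malloc(sizeof(double) * length));
    if (ptr == nullptr) {
        printf("device malloc failed in block %u\n", blockIdx.x);
        return;
    }

    // The device heap is global memory, so the pointer is valid in the child grid.
    child<<<(length + 127) / 128, 128>>>(ptr, length);

    // Not freed here: the child may still be running when the parent thread
    // reaches this point. In this sketch the memory is simply reclaimed when
    // the context is destroyed; real code would free it in a later kernel.
}

int main()
{
    parent<<<1, 32>>>(1024);   // assumed length
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

Memory obtained from device-side `malloc` has to be released with device-side `free`, not with host-side `cudaFree`.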

None of this should cause an error though, and I would suspect that there is something else going on here that isn’t related to the use of shared memory.
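One way to narrow this down is to look at the value each device-side `cudaMalloc` returns instead of ignoring it. The sketch below is only a diagnostic aid under stated assumptions: the kernel name `allocate_two`, the array length, and the 64 MB heap size are all hypothetical. It also raises the device malloc heap limit from the host, because device-side allocations come out of that heap (8 MB by default), and an array of double takes twice the space of an unsigned int array of the same length. Whether that is what happened in the original program is not known.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical diagnostic kernel: allocates two double arrays and reports the result codes.
__global__ void allocate_two(unsigned int length)
{
    if (blockIdx.x != 0 || threadIdx.x != 0)
        return;

    double* a = nullptr;
    double* b = nullptr;

    // Device-side cudaMalloc returns a cudaError_t just like the host version.
    cudaError_t err_a = cudaMalloc((void**)&a, sizeof(double) * length);
    cudaError_t err_b = cudaMalloc((void**)&b, sizeof(double) * length);

    // Print the numeric codes; 0 is cudaSuccess.
    printf("first allocation: %d, second allocation: %d\n", (int)err_a, (int)err_b);

    if (err_a == cudaSuccess) cudaFree(a);
    if (err_b == cudaSuccess) cudaFree(b);
}

int main()
{
    // Assumed value: enlarge the device heap before any kernel that allocates runs.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u * 1024u * 1024u);
    if (err != cudaSuccess)
        printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));

    allocate_two<<<1, 32>>>(600000);   // assumed length

    err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

Running it once with and once without the `cudaDeviceSetLimit` call shows whether the heap size plays a role for a given array length.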