cudaMalloc is an API call made from host code. It allocates device memory and returns the start address, so there is no need to declare the device memory separately.
The argument you pass to cudaMalloc is the address of a pointer variable, and that pointer variable must therefore live on the host, not on the device. (It sometimes works in emulation because everything is on the host then.)
To access the memory, pass this pointer to the kernel as a parameter.
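A minimal sketch of that pattern, in case it helps. The kernel scale, the helper scale_on_device and the launch configuration are made-up names and values for illustration, not something from your code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: copies host[0..n) to the device, scales it, copies it back.
// Returns 0 on success, 1 on failure.
int scale_on_device(float* host, int n)
{
    float* d_data = NULL;  // ordinary host variable that will hold a device address

    // Pass the address of the HOST pointer variable; cudaMalloc writes the
    // device address into it.
    if (cudaMalloc((void**)&d_data, n * sizeof(float)) != cudaSuccess) return 1;

    cudaMemcpy(d_data, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // The device pointer is handed to the kernel as a plain parameter.
    scale<<<(n + 127) / 128, 128>>>(d_data, n);

    // Never dereference d_data on the host; copy the data back instead.
    cudaError_t err = cudaMemcpy(host, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    float host[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    if (scale_on_device(host, 4) != 0) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("host[3] = %f\n", host[3]);
    return 0;
}
```

Note that d_data is a normal local variable in host code; only the address stored in it refers to device memory.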
Thanks for the quick response, but now there is another problem. Suppose the struct is defined as
struct mystruct
{
    float* dynarray;
};
and I need to dynamically allocate the array of structs first, and then allocate an array of floats for each element of that array? I still get the unhandled exception…
ANY memory allocated by cudaMalloc is DEVICE memory. You cannot dereference such a pointer anywhere except in a kernel. If you need to allocate a pointer inside an allocated structure, you are going to need to make a “mirror” structure allocated on the host (with normal malloc/new/whatever), then cudaMalloc all of the pointers inside it and cudaMemcpy the mirror structure to device memory.
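Roughly, the mirror approach could look like the sketch below. It reuses mystruct from the question; the kernel fill, the helper run_mirror_demo and the sizes are assumptions made up for illustration, and most error checking is left out to keep it short:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct mystruct
{
    float* dynarray;
};

// Hypothetical kernel: one thread per struct, fills that struct's array with its index.
__global__ void fill(mystruct* s, int numStructs, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numStructs)
        for (int j = 0; j < len; ++j)
            s[i].dynarray[j] = (float)i;
}

// Hypothetical helper: builds the structs on the device, runs the kernel and
// copies the float array of struct number `which` into out (len floats).
int run_mirror_demo(int numStructs, int len, int which, float* out)
{
    // Mirror array in HOST memory; its dynarray members will hold DEVICE addresses.
    mystruct* mirror = (mystruct*)malloc(numStructs * sizeof(mystruct));
    for (int i = 0; i < numStructs; ++i)
        cudaMalloc((void**)&mirror[i].dynarray, len * sizeof(float));

    // Device array of structs, filled by copying the mirror over.
    mystruct* d_structs = NULL;
    cudaMalloc((void**)&d_structs, numStructs * sizeof(mystruct));
    cudaMemcpy(d_structs, mirror, numStructs * sizeof(mystruct), cudaMemcpyHostToDevice);

    fill<<<1, numStructs>>>(d_structs, numStructs, len);

    // Read results back through the device addresses stored in the mirror.
    // mirror[which] is host memory, so reading the pointer VALUE on the host is fine.
    cudaError_t err = cudaMemcpy(out, mirror[which].dynarray, len * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    for (int i = 0; i < numStructs; ++i)
        cudaFree(mirror[i].dynarray);
    cudaFree(d_structs);
    free(mirror);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    float result[8];
    if (run_mirror_demo(4, 8, 2, result) != 0) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("result[0] = %f\n", result[0]);
    return 0;
}
```

The unhandled exception typically comes from writing something like d_structs[i].dynarray on the host, which dereferences a device pointer. The mirror avoids that because every struct the host touches lives in host memory.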
As MisterAnderson42 emphasized, you cannot dereference a device memory pointer on the host. So you probably need to “invert” your structures, i.e. turn an array of structs into a struct of arrays. Then you allocate all the arrays, put their start pointers into the struct and upload the struct.
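One possible way to do the inversion, as a sketch only. The layout below (a struct named packed with one flat values array plus an offsets array) is my assumption about how the per-struct float arrays could be packed; the kernel, helper and sizes are likewise made up, and error handling is simplified:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical struct-of-arrays layout replacing an array of mystruct.
struct packed
{
    float* values;   // device pointer: all float arrays stored back to back
    int*   offsets;  // device pointer: array i occupies values[offsets[i] .. offsets[i+1])
    int    count;    // number of logical arrays
};

// Hypothetical kernel: one thread per logical array, fills it with its index.
__global__ void fill(const packed* p)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p->count)
        for (int j = p->offsets[i]; j < p->offsets[i + 1]; ++j)
            p->values[j] = (float)i;
}

// Hypothetical helper: three arrays of lengths 1, 2 and 3; out must hold 6 floats.
int run_packed_demo(float* out)
{
    const int count = 3;
    const int total = 6;
    int h_offsets[count + 1] = {0, 1, 3, 6};

    // The struct itself is a host variable; only its pointer members refer to device memory.
    packed h;
    h.count = count;
    if (cudaMalloc((void**)&h.values, total * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&h.offsets, (count + 1) * sizeof(int)) != cudaSuccess) return 1;
    cudaMemcpy(h.offsets, h_offsets, sizeof(h_offsets), cudaMemcpyHostToDevice);

    // Upload the struct with the start pointers filled in.
    packed* d_p = NULL;
    if (cudaMalloc((void**)&d_p, sizeof(packed)) != cudaSuccess) return 1;
    cudaMemcpy(d_p, &h, sizeof(packed), cudaMemcpyHostToDevice);

    fill<<<1, count>>>(d_p);

    cudaError_t err = cudaMemcpy(out, h.values, total * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_p);
    cudaFree(h.offsets);
    cudaFree(h.values);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    float out[6];
    if (run_packed_demo(out) != 0) {
        printf("CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 6; ++i) printf("%f ", out[i]);
    printf("\n");
    return 0;
}
```

Compared with the mirror approach this needs a fixed number of cudaMalloc calls regardless of how many logical arrays there are, at the cost of indexing through offsets in the kernel.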