I get a segmentation fault at that line, but only when running on the device. I can’t figure out what I’m doing wrong, but I’d really like to use nested structures with dynamically allocated memory (allocated on the host first, of course, not on the GPU). I have a lot of b’s, and the size of c in each one might be different each time I run the program.
Thanks! I was headed down that path after posting, but after about 4 hours I still didn’t get it to work until I carefully followed your code. Works great now; a lot of running around though, haha…
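For anyone else landing here, the pattern that ends up being needed for nested structs with dynamic inner arrays can be sketched roughly as below. The struct name `B`, the field names `n` and `c`, and the helper are all hypothetical stand-ins for the poster’s types; the key idea is that the inner pointers must be replaced with device pointers before the outer array itself is copied:

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical nested structure: each B holds a dynamically sized float array.
struct B {
    int    n;   // number of elements in c (may differ per B)
    float *c;   // dynamically allocated on the host
};

// Sketch: deep-copy an array of B structs to the device.
B *copy_bs_to_device(const B *host_b, int num_b)
{
    // Staging copy on the host whose c pointers will point to device memory.
    B *staging = (B *)malloc(num_b * sizeof(B));

    for (int i = 0; i < num_b; ++i) {
        staging[i].n = host_b[i].n;
        cudaMalloc(&staging[i].c, host_b[i].n * sizeof(float));
        cudaMemcpy(staging[i].c, host_b[i].c,
                   host_b[i].n * sizeof(float), cudaMemcpyHostToDevice);
    }

    // Only now copy the outer array, carrying the patched device pointers.
    B *dev_b;
    cudaMalloc(&dev_b, num_b * sizeof(B));
    cudaMemcpy(dev_b, staging, num_b * sizeof(B), cudaMemcpyHostToDevice);

    free(staging);
    return dev_b;  // usable inside kernels: dev_b[i].c is a device pointer
}
```

Copying the outer array first would transfer host pointers to the device, and dereferencing those in a kernel is exactly the kind of thing that segfaults only on the device.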
./atomsinglenumber.cu(112): Warning: Cannot tell what pointer points to, assuming global memory space
Question: is it right to assume that with this method the arrays X_D_host->… will be transferred into the global memory space? And that I can then, within the kernel, copy them into shared memory?
Also, do I need to make any extra declarations to avoid unexpected behaviour, given that the compiler cannot tell what the pointer is pointing to?
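On the shared-memory part of the question: yes, data that lives in global memory can be staged into shared memory inside the kernel. A minimal sketch (the kernel name and parameters are illustrative, and the array is assumed to fit in one block’s shared-memory tile):

```cuda
// Stage a global-memory array into shared memory before using it.
__global__ void use_shared(const float *x_global, int n)
{
    extern __shared__ float x_shared[];  // size supplied at launch time

    // Cooperative copy: each thread loads a strided subset of the array.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        x_shared[i] = x_global[i];
    __syncthreads();  // make the tile visible to every thread in the block

    // ... work on x_shared[] from here on ...
}

// Launch with the dynamic shared-memory size as the third config argument:
//   use_shared<<<blocks, threads, n * sizeof(float)>>>(dev_x, n);
```

Note that shared memory is per-block and lasts only for the lifetime of the block, so each block must do its own copy.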
The array of BIAS_ENTRY structs, “bias_element”, is a fixed-size array, so the whole DHGN struct most likely occupies a contiguous block of memory. If you are looking for the simplest solution, it’s enough to allocate the appropriate amount of device memory (e.g. with cudaMalloc) and then just copy the array of DHGN nodes from host memory to the device memory pointed to by dev_dhgn with the cudaMemcpy function. But if you really care about speed and optimality, I advise you to consider using SoA (Structure of Arrays) instead of AoS (Array of Structures), organising the DHGN nodes in device memory in a more clever way, and to leverage memory-optimization techniques such as coalescing, the global-memory cache (cc 2.0+), or the texture cache.
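The simple AoS transfer described above can be sketched as follows. The field layout of BIAS_ENTRY and DHGN, and the sizes, are hypothetical; what matters is that every member is fixed-size, so one allocation and one copy move the whole array:

```cuda
#include <cuda_runtime.h>

#define NUM_NODES  1024  /* assumed node count */
#define NUM_BIASES 8     /* assumed fixed array length */

typedef struct { float weight; } BIAS_ENTRY;                    // illustrative layout
typedef struct { BIAS_ENTRY bias_element[NUM_BIASES]; } DHGN;   // fixed-size member only

int main(void)
{
    DHGN host_dhgn[NUM_NODES];  // filled in by the host code
    DHGN *dev_dhgn;

    // One contiguous block: no per-member allocation is needed.
    cudaMalloc(&dev_dhgn, NUM_NODES * sizeof(DHGN));
    cudaMemcpy(dev_dhgn, host_dhgn, NUM_NODES * sizeof(DHGN),
               cudaMemcpyHostToDevice);

    /* ... launch kernels that index dev_dhgn[i].bias_element[j] ... */

    cudaFree(dev_dhgn);
    return 0;
}
```

The SoA alternative would instead keep, say, one flat array of all weights so that neighbouring threads read neighbouring addresses, which is what enables coalescing.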
How about the kernel function? Normally a kernel function must be declared void, but in my case I need to return a data value. Also, after allocating the DHGN in GPU memory, can the elements of the DHGN struct be declared directly in the kernel function, or do I need to allocate them as well?
As for the first part of your question, the simplest way to “return” a single variable from a kernel function is to declare it as a global device variable; it is then, of course, visible from inside the kernel. Another, more elegant way is to allocate device memory for the return value and pass its address as an argument to the kernel. After the kernel has executed, you just copy it from device to host memory.
As for the second one, I’m not sure I get what you mean, but as far as the bias_element array is concerned, you don’t have to allocate device memory for it separately, because it is fixed-size and, as a result, the whole DHGN struct occupies a contiguous block of memory.