I get a segmentation fault at that line, but only when trying to run on the device. I can’t figure out what I’m doing wrong, but I’d really like to use nested structures with dynamically allocated memory (allocated on the host first, of course, not on the GPU). I have a lot of b’s, and the size of c in each one may be different each time I run the program.
The array of BIAS_ENTRY structs, “bias_element”, is a fixed-size array, so the whole DHGN struct occupies one contiguous block of memory. If you are looking for the simplest solution, it is enough to allocate the appropriate amount of device memory with cudaMalloc and then copy the array of DHGN nodes from host memory to the device memory pointed to by dev_dhgn with cudaMemcpy.
But if you really care about speed, I advise you to consider SoA (Structure of Arrays) instead of AoS (Array of Structures): lay out the DHGN node data in device memory so that global memory accesses coalesce, and take advantage of the global memory cache (compute capability 2.0+) or the texture cache.
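For reference, the simple allocate-and-copy approach could look roughly like the sketch below. The thread never shows the fields of BIAS_ENTRY or DHGN, so the layouts, the MAX_BIAS capacity, the scale_weights kernel and the scale_on_device wrapper are all made up for illustration; only cudaMalloc, cudaMemcpy, cudaGetLastError and cudaFree are real CUDA runtime calls.

```cuda
#include <cuda_runtime.h>

#define MAX_BIAS 8  // hypothetical fixed capacity of bias_element

// Hypothetical layouts: the thread names these types but not their fields.
struct BIAS_ENTRY { int index; float weight; };
struct DHGN       { int id; BIAS_ENTRY bias_element[MAX_BIAS]; };

// Illustrative kernel: one thread per node, touching the nested array directly.
__global__ void scale_weights(DHGN *nodes, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int j = 0; j < MAX_BIAS; ++j)
            nodes[i].bias_element[j].weight *= factor;
}

// Copies n nodes to the device, runs the kernel, and copies them back in place.
cudaError_t scale_on_device(DHGN *host_nodes, int n, float factor)
{
    DHGN *dev_dhgn = nullptr;
    size_t bytes = (size_t)n * sizeof(DHGN);

    // One allocation covers everything, because each DHGN is a flat block.
    cudaError_t err = cudaMalloc((void **)&dev_dhgn, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev_dhgn, host_nodes, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_weights<<<blocks, threads>>>(dev_dhgn, n, factor);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess)
        // This copy runs after the kernel has finished on the default stream.
        err = cudaMemcpy(host_nodes, dev_dhgn, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_dhgn);
    return err;
}
```

Calling scale_on_device(nodes, count, 2.0f) from main on a host array of DHGN would double every weight, with no separate allocation for bias_element.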
What about the kernel function? Normally a kernel has to be declared with a void return type, but in my case I need to return a data value. And once the DHGN array has been allocated in GPU memory, can the elements of the DHGN struct be used directly in the kernel function, or do I need to allocate them as well?
As for the first part of your question, the simplest way to “return” a single variable from a kernel function is to declare it as a global __device__ variable. It is then, of course, visible from inside the kernel function. Another, more elegant way is to allocate device memory for the return value and pass its address as an argument to the kernel function. After the kernel has executed, you just have to copy the value from device memory to host memory.
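Both ways of getting a value back could be sketched as follows. The kernels and the add_via_pointer / add_via_symbol wrappers are hypothetical names invented for this example; the runtime calls used (cudaMalloc, cudaMemcpy, cudaMemcpyFromSymbol, cudaGetLastError, cudaFree) are standard.

```cuda
#include <cuda_runtime.h>

// Way 1: a global __device__ variable that the kernel writes to.
__device__ int dev_sum;

__global__ void add_to_symbol(int a, int b)
{
    dev_sum = a + b;
}

// Way 2: the kernel writes through a pointer to device memory.
__global__ void add_to_pointer(int a, int b, int *result)
{
    *result = a + b;
}

// Reads the result back from the __device__ variable.
cudaError_t add_via_symbol(int a, int b, int *out)
{
    add_to_symbol<<<1, 1>>>(a, b);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpyFromSymbol(out, dev_sum, sizeof(int));
    return err;
}

// Allocates device memory for the result and copies it back afterwards.
cudaError_t add_via_pointer(int a, int b, int *out)
{
    int *dev_result = nullptr;
    cudaError_t err = cudaMalloc((void **)&dev_result, sizeof(int));
    if (err != cudaSuccess) return err;

    add_to_pointer<<<1, 1>>>(a, b, dev_result);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        // The copy runs after the kernel has finished on the default stream.
        err = cudaMemcpy(out, dev_result, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_result);
    return err;
}
```

The pointer version scales more naturally to returning an array of results (one slot per thread), which is probably why it is usually preferred.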
As for the second part, I’m not sure I understand what you mean, but as far as the bias_element array is concerned, you don’t have to allocate device memory for it separately: it is fixed-size, so the whole DHGN struct occupies one contiguous block of memory and is copied in one go.
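By contrast, if a struct holds a pointer to dynamically allocated memory (as in the first post, where the size of the inner array varies), a single copy is not enough: the copied pointer would still hold a host address, and dereferencing it in a kernel is a typical cause of crashes. A minimal sketch of the usual workaround, assuming a made-up Node struct and helper names that are not from the thread:

```cuda
#include <cuda_runtime.h>

// Hypothetical variable-length struct; not the DHGN struct from the thread.
struct Node { int count; float *values; };

// Illustrative kernel: sums the inner array through the struct's pointer.
__global__ void sum_values(const Node *node, float *out)
{
    float s = 0.0f;
    for (int i = 0; i < node->count; ++i)
        s += node->values[i];
    *out = s;
}

cudaError_t sum_on_device(const float *host_values, int count, float *host_sum)
{
    float *dev_values = nullptr;
    float *dev_sum = nullptr;
    Node *dev_node = nullptr;
    size_t bytes = (size_t)count * sizeof(float);

    // The inner array needs its own device allocation and its own copy.
    cudaError_t err = cudaMalloc((void **)&dev_values, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_sum, sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_node, sizeof(Node));
    if (err == cudaSuccess)
        err = cudaMemcpy(dev_values, host_values, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Staging copy on the host whose pointer member holds a DEVICE address.
        Node staged = { count, dev_values };
        err = cudaMemcpy(dev_node, &staged, sizeof(Node), cudaMemcpyHostToDevice);
    }
    if (err == cudaSuccess) {
        sum_values<<<1, 1>>>(dev_node, dev_sum);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_sum, dev_sum, sizeof(float), cudaMemcpyDeviceToHost);

    // cudaFree accepts a null pointer, so cleanup is safe on early failure.
    cudaFree(dev_node);
    cudaFree(dev_sum);
    cudaFree(dev_values);
    return err;
}
```

With many such structs, it is often simpler to pack all inner arrays into one flat device buffer and store offsets instead of pointers.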