Allocating and copying to a big __device__ struct

Hello,

I am trying to copy a struct to global memory but I’m not sure if I’m doing it correctly. When I used only a pointer to send the struct to the GPU, everything worked fine. Any comments would be welcome. I am posting the code and the size of the struct calculated at runtime.

Thank you in advance,
Rafael

    __device__ OctreeNodeGPUStatic dg_nodes;
    OctreeNodeGPUStatic* d_nodes;
    OctreeNodeGPUStatic* h_nodes;

    h_nodes = (OctreeNodeGPUStatic*)malloc(sizeof(OctreeNodeGPUStatic)); 
    h_nodes = DeserializeOctreeNodeCPUStaticFromFile(fileName);

    CUDA_CHECK(cudaMalloc((void **)&d_nodes, sizeof(OctreeNodeGPUStatic)));
    CUDA_CHECK(cudaMemcpy(d_nodes, &h_nodes, sizeof(OctreeNodeGPUStatic), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpyToSymbol(dg_nodes, d_nodes, sizeof(OctreeNodeGPUStatic)));
    CUDA_CHECK(cudaDeviceSynchronize());

[Screenshot: size of the struct calculated at runtime]

What do you want to achieve now, compared to when it worked fine?

&h_nodes is a pointer to a pointer; cudaMemcpy expects a plain pointer to the data, i.e. h_nodes.

Typically cudaMemcpyToSymbol is used for copying to constant memory, but it can also be used for global memory. What do you want to achieve with it?

As long as you do not use the asynchronous copy function variants, cudaDeviceSynchronize should not be necessary here.
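A minimal sketch of both copy paths with the pointer fix applied. `Node` is a hypothetical small stand-in for `OctreeNodeGPUStatic`, `roundTripThroughSymbol` is an illustrative helper, and the `CUDA_CHECK` definition is an assumption, since the original macro was not posted. Note that `cudaMemcpyToSymbol` defaults to `cudaMemcpyHostToDevice`, so its source should be a host pointer unless the kind is passed explicitly (e.g. `cudaMemcpyDeviceToDevice`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed error-checking macro; the original CUDA_CHECK was not shown.
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

// Hypothetical stand-in for OctreeNodeGPUStatic.
struct Node { int children[8]; float bounds[6]; };

// Statically allocated object in device global memory.
__device__ Node dg_node;

// Writes `value` into a host Node, copies it to the device, reads it back.
int roundTripThroughSymbol(int value) {
    Node* h_node = (Node*)calloc(1, sizeof(Node));
    if (!h_node) exit(EXIT_FAILURE);
    h_node->children[0] = value;

    // Path 1: dynamically allocated device memory.
    // Pass the host pointer itself (h_node), not its address (&h_node).
    Node* d_node = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_node, sizeof(Node)));
    CUDA_CHECK(cudaMemcpy(d_node, h_node, sizeof(Node), cudaMemcpyHostToDevice));

    // Path 2: host memory -> __device__ symbol (default kind is host-to-device).
    CUDA_CHECK(cudaMemcpyToSymbol(dg_node, h_node, sizeof(Node)));

    // Read the symbol back to the host to verify the copy.
    Node check;
    CUDA_CHECK(cudaMemcpyFromSymbol(&check, dg_node, sizeof(Node)));

    CUDA_CHECK(cudaFree(d_node));
    free(h_node);
    return check.children[0];
}

int main() {
    printf("children[0] read back: %d\n", roundTripThroughSymbol(42));
    return 0;
}
```

Both copies are blocking with respect to the host here, which is why no `cudaDeviceSynchronize` call appears in the sketch.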

Greetings,

Thank you for the answer.

My goal is to achieve faster traversal time for the octree.

The issue arises when I include this line in the code:

    __device__ OctreeNodeGPUStatic dg_nodes;

This addition causes the build to fail at link time with the following error:

CMakeFiles/prismatic.dir/prismatic.cu.o: in function "__nv_cudaEntityRegisterCallback(void**)":
tmpxft_0029cce1_00000000-7_prismatic.cudafe1.cpp:(.text+0x649): relocation truncated to fit: R_X86_64_PC32 against ".bss"

If I define the tree as:

    OctreeNodeGPUStatic* d_nodes;
    OctreeNodeGPUStatic* h_nodes;

and allocate and copy to the GPU as:

    CUDA_CHECK(cudaMalloc((void **)&d_nodes, sizeof(OctreeNodeGPUStatic)));
    CUDA_CHECK(cudaMemcpy(d_nodes, h_nodes, sizeof(OctreeNodeGPUStatic), cudaMemcpyHostToDevice));

The code compiles correctly.

Does this line alone (within a .cu file), without any other use of dg_nodes, produce the error?

Is your __device__ variable at global scope, i.e. not inside a function?

Can you try with a different variable type, e.g. float?
How large is your struct?

Could you try to compile all files with --relocatable-device-code?
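One way to answer the size question in text form is to print `sizeof` and compare it with the 2 GiB reach of a signed 32-bit PC-relative relocation such as `R_X86_64_PC32`. This is only a sketch: `Node`, `Tree`, the element count, and `fitsInPc32` are hypothetical placeholders, not the real octree type. Staying under the limit is necessary but not sufficient, because all other statically allocated data in the executable counts as well.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for the real octree types.
struct Node { int children[8]; float bounds[6]; };
struct Tree { Node nodes[1000]; };

// A signed 32-bit PC-relative offset can reach at most 2 GiB.
constexpr size_t kPc32Reach = size_t(1) << 31;

// True if an object of `bytes` bytes could, on its own, stay within that reach.
constexpr bool fitsInPc32(size_t bytes) { return bytes < kPc32Reach; }

int main() {
    printf("sizeof(Tree) = %zu bytes, within PC32 reach: %s\n",
           sizeof(Tree), fitsInPc32(sizeof(Tree)) ? "yes" : "no");
    return 0;
}
```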

What you posted looks like a snippet of your code, not the actual code. In the absence of a complete, self-contained reproducer that others can compile, I can only go by the error message, which suggests that the problem is in host code: there is a huge statically allocated data object (e.g. double my_huge_array [100000000];), so large that 32-bit offsets into the BSS segment are not sufficient to address all of it.

If this diagnosis matches what is happening in your code: Don’t do that. Allocate large data objects on the heap, via malloc().

Allocating your large data object dynamically with malloc() should not have a negative impact on the traversal time of the octree (unless you are doing something special that you have not told us about yet).
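A minimal sketch of the dynamic approach: the host copy lives on the heap, the device copy comes from `cudaMalloc`, and the kernel reaches the data through a pointer argument. `Node`, `sumValues`, `sumOnDevice`, and `check` are hypothetical names used for illustration only. My understanding, offered as an assumption rather than a confirmed diagnosis, is that nvcc also emits a same-sized host-side shadow object for every `__device__` variable, which would explain how a huge `__device__` struct ends up in the host `.bss`; a pointer-based layout avoids that entirely.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical node type; the real OctreeNodeGPUStatic was not posted.
struct Node { int firstChild; float value; };

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// A single thread walks the array through the pointer passed as an argument.
__global__ void sumValues(const Node* nodes, int count, float* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < count; ++i) sum += nodes[i].value;
        *out = sum;
    }
}

// Builds `count` nodes on the host heap, copies them over, sums on the device.
float sumOnDevice(int count) {
    const size_t bytes = (size_t)count * sizeof(Node);

    Node* h_nodes = (Node*)malloc(bytes);  // heap, not a static object
    if (!h_nodes) exit(EXIT_FAILURE);
    for (int i = 0; i < count; ++i) {
        h_nodes[i].firstChild = i;
        h_nodes[i].value = 1.0f;
    }

    Node* d_nodes = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc((void**)&d_nodes, bytes));
    check(cudaMalloc((void**)&d_out, sizeof(float)));
    check(cudaMemcpy(d_nodes, h_nodes, bytes, cudaMemcpyHostToDevice));

    sumValues<<<1, 1>>>(d_nodes, count, d_out);
    check(cudaGetLastError());

    float result = 0.0f;
    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost));

    check(cudaFree(d_out));
    check(cudaFree(d_nodes));
    free(h_nodes);
    return result;
}

int main() {
    printf("sum = %f\n", sumOnDevice(1000));
    return 0;
}
```

Whether the kernel reads the nodes through a pointer argument or through a `__device__` symbol, the data sits in the same global memory, so this layout by itself should not slow the traversal.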
