Problem With GPU Memory

Hello everyone, I’m having a little problem with copying/allocating memory on the GPU.
Here’s my problem:

// I’m creating a Bounding Volume Hierarchy, basically a binary tree, in a raytracing program using interoperability with DirectX9
// (using a texture to display the raytracer’s rendering)

The Program:

__device__ CBVH *cuda_BVH; // Tree on GPU
BVH bvh; // Tree on CPU

// Build the Tree on CPU
BuildBHV(spheres, nbSpheres, bvhDepth, bvh);

// Allocate the Structure on GPU
cutilSafeCall(cudaMalloc((void**) &cuda_BVH, sizeof(CBVH))); // WORKS

// Copy the depth attribute of the Tree to GPU
cutilSafeCall(cudaMemcpy(&cuda_BVH->depth, &bvh.depth, sizeof(unsigned int), cudaMemcpyHostToDevice)); // WORKS

// Allocate the root node on GPU
cutilSafeCall(cudaMalloc((void**) &cuda_BVH->root, sizeof(CBVHNode))); // DOESN’T WORK

Whenever I add the last line, Visual Studio breaks at run time when entering the message loop, shows a window saying an error occurred, and points the
debugger at the int __cdecl _write_nolock (int fh, const void *buf, unsigned cnt) function of “write.c - write to a file handle”, at line 335:

                /* write the lf buf and update total */
                if ( WriteFile( (HANDLE)_osfhnd(fh),
                            lfbuf,
                            (int)(q - lfbuf),
                            (LPDWORD)&written,
                            NULL) )

----->          {
                    charcount += written;
                    if (written < q - lfbuf)
                        break;

Important point: I don’t execute any kernel. This is just the memory initialisation.
Thanks for your help.

cuda_BVH points to device memory.
cuda_BVH->root is a variable that resides on the device.
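&cuda_BVH->root is an address in device memory. cudaMalloc runs on the host and writes the new pointer through its first argument, so that argument must be a host address; passing &cuda_BVH->root makes the host try to write to a device address. The pointer returned by cudaMalloc must be stored in host memory first. (It can then be transferred to the device with cudaMemcpy.)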

It’s easy to make this kind of mistake due to the way cudaMalloc returns its result in a void ** parameter.
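
To make the fix concrete, here is a minimal sketch. The original post does not show the definitions of CBVH and CBVHNode, so the struct layouts below are assumptions, and allocateBVHOnDevice is a made-up helper name. It also keeps the tree pointer in an ordinary host variable rather than in a __device__ global as in the question. The point is only the order of operations: cudaMalloc into a host pointer, then cudaMemcpy that pointer value into the device-side struct.

```cuda
#include <cuda_runtime.h>

// Hypothetical layouts: the original post does not show these structs.
struct CBVHNode {
    float bboxMin[3];
    float bboxMax[3];
    CBVHNode *left;
    CBVHNode *right;
};

struct CBVH {
    unsigned int depth;
    CBVHNode *root;
};

// Hypothetical helper: allocates a CBVH plus its root node on the device
// and returns the device address of the CBVH through *outBVH.
cudaError_t allocateBVHOnDevice(unsigned int depth, CBVH **outBVH)
{
    CBVH *d_bvh = NULL;       // host variable that holds a device address
    CBVHNode *d_root = NULL;  // host variable that holds a device address

    cudaError_t err = cudaMalloc((void **)&d_bvh, sizeof(CBVH));
    if (err != cudaSuccess) return err;

    // &d_root is a host address, so cudaMalloc can write the result into it.
    err = cudaMalloc((void **)&d_root, sizeof(CBVHNode));
    if (err != cudaSuccess) { cudaFree(d_bvh); return err; }

    // Fill in a host-side copy of the struct, including the device pointer,
    // then send the whole struct to the device in one transfer.
    CBVH h_bvh;
    h_bvh.depth = depth;
    h_bvh.root = d_root;
    err = cudaMemcpy(d_bvh, &h_bvh, sizeof(CBVH), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(d_root); cudaFree(d_bvh); return err; }

    *outBVH = d_bvh;
    return cudaSuccess;
}
```

Every further node would need the same allocate-then-patch step, which is why a pointer-linked tree gets tedious quickly.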

Linked structures in general are difficult to manipulate and transfer. I’d recommend allocating an array of node structures and referring to them by index instead of using individually-allocated pieces.
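
As a rough illustration of the array-plus-index idea, assuming a made-up FlatNode layout (not the poster's CBVHNode) where children are stored as indices into a single array and -1 means "no child": the whole tree then needs one cudaMalloc and one cudaMemcpy, and the same indices are valid on host and device. The countNodes kernel is only there to show a device-side walk over the indices.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical node layout: children are indices into one node array.
struct FlatNode {
    float bboxMin[3];
    float bboxMax[3];
    int left;   // index of left child, -1 if none
    int right;  // index of right child, -1 if none
};

// Copies the whole tree to the device with one allocation and one transfer.
// d_nodes points at a host variable that receives the device address.
cudaError_t uploadNodes(const std::vector<FlatNode> &nodes, FlatNode **d_nodes)
{
    *d_nodes = NULL;
    if (nodes.empty()) return cudaSuccess;

    size_t bytes = nodes.size() * sizeof(FlatNode);
    cudaError_t err = cudaMalloc((void **)d_nodes, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(*d_nodes, &nodes[0], bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(*d_nodes); *d_nodes = NULL; }
    return err;
}

// Counts the nodes reachable from node 0 using an explicit stack.
// Assumes the tree depth stays below 64; launch with a single thread.
__global__ void countNodes(const FlatNode *nodes, int *count)
{
    int stack[64];
    int top = 0;
    int visited = 0;
    stack[top++] = 0;
    while (top > 0) {
        int i = stack[--top];
        ++visited;
        if (nodes[i].left  >= 0) stack[top++] = nodes[i].left;
        if (nodes[i].right >= 0) stack[top++] = nodes[i].right;
    }
    *count = visited;
}
```

Because no device pointers are embedded in the nodes, the array can be built entirely on the CPU and uploaded unchanged.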

Thanks for the explanation. Very helpful.

I had also thought about using an array instead; that seems much more doable, so I’ll go try that.

Don’t use “cutil”. It is not supported by NVIDIA.

It will change from one CUDA version to another and will break your code.

One fine day, NVIDIA may even stop shipping cutil stuff with CUDA. Nothing is guaranteed for “cutil”.
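
If you drop cutilSafeCall, a small macro of your own is enough to replace it. This is only a sketch: CUDA_CHECK and allocateFloats are made-up names, and it relies on nothing beyond the runtime API (cudaError_t, cudaGetErrorString).

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for cutilSafeCall: prints the error and exits.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Example use: allocate and zero n floats on the device.
float *allocateFloats(size_t n)
{
    float *d_data = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));
    return d_data;
}
```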

Feel nervous.

I implemented a BVH builder recently.
My memory is managed as follows:

  • one array for all primitives
  • one array for tree structure (generated by the algorithm) with indexes ‘from’ and ‘to’ describing which primitives it contains
  • one array for ‘active front’ - list of nodes yet to be processed.

I launch a single kernel for every layer of the tree to be created, with one block per node.
The hardest are the first few steps, because very few blocks are spawned and those are very big, but later on you quickly use the full power of your GPU. The final steps are also demanding: lots of tiny blocks to process.
I believe it is the best strategy for this task.
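
To give an idea of the control flow only (this is not my actual builder), here is a stripped-down sketch with made-up names (BuildNode, splitLevel, buildTopology). It keeps a node array with 'from'/'to' ranges and a double-buffered active front, and launches one kernel per level with one block per active node. The split is a plain midpoint of the index range, a placeholder for a real partition of the primitives, so only thread 0 of each block does any work. Error checks are left out, and it assumes each level fits in a single 1D grid.

```cuda
#include <cuda_runtime.h>

// Hypothetical node record: the primitive range [from, to) plus child indices.
struct BuildNode {
    int from;
    int to;
    int left;   // -1 while the node is a leaf
    int right;
};

// One block per node of the active front.
__global__ void splitLevel(BuildNode *nodes, int *nodeCount,
                           const int *front, int *nextFront, int *nextCount,
                           int leafSize)
{
    if (threadIdx.x != 0) return;           // placeholder split needs one thread
    int self = front[blockIdx.x];
    int from = nodes[self].from;
    int to = nodes[self].to;
    if (to - from <= leafSize) return;      // small enough: stays a leaf

    int mid = from + (to - from) / 2;       // placeholder for a real partition
    int child = atomicAdd(nodeCount, 2);    // reserve two node slots
    nodes[child].from = from;    nodes[child].to = mid;
    nodes[child].left = -1;      nodes[child].right = -1;
    nodes[child + 1].from = mid; nodes[child + 1].to = to;
    nodes[child + 1].left = -1;  nodes[child + 1].right = -1;
    nodes[self].left = child;
    nodes[self].right = child + 1;

    int slot = atomicAdd(nextCount, 2);     // queue both children for next level
    nextFront[slot] = child;
    nextFront[slot + 1] = child + 1;
}

// Builds the tree layout for numPrims primitives. h_nodes must have room for
// 2 * numPrims entries. Returns the number of nodes created.
int buildTopology(int numPrims, int leafSize, BuildNode *h_nodes)
{
    if (numPrims < 1 || leafSize < 1) return 0;

    BuildNode *d_nodes;
    int *d_nodeCount, *d_nextCount, *d_front, *d_next;
    cudaMalloc((void **)&d_nodes, 2 * numPrims * sizeof(BuildNode));
    cudaMalloc((void **)&d_nodeCount, sizeof(int));
    cudaMalloc((void **)&d_nextCount, sizeof(int));
    cudaMalloc((void **)&d_front, numPrims * sizeof(int));
    cudaMalloc((void **)&d_next, numPrims * sizeof(int));

    BuildNode root = { 0, numPrims, -1, -1 };
    int one = 1, zero = 0, rootIndex = 0;
    cudaMemcpy(d_nodes, &root, sizeof(root), cudaMemcpyHostToDevice);
    cudaMemcpy(d_nodeCount, &one, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_front, &rootIndex, sizeof(int), cudaMemcpyHostToDevice);

    int frontCount = 1;
    while (frontCount > 0) {                // one kernel launch per tree level
        cudaMemcpy(d_nextCount, &zero, sizeof(int), cudaMemcpyHostToDevice);
        splitLevel<<<frontCount, 32>>>(d_nodes, d_nodeCount, d_front,
                                       d_next, d_nextCount, leafSize);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&frontCount, d_nextCount, sizeof(int), cudaMemcpyDeviceToHost);
        int *tmp = d_front; d_front = d_next; d_next = tmp;
    }

    int total = 0;
    cudaMemcpy(&total, d_nodeCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_nodes, d_nodes, total * sizeof(BuildNode), cudaMemcpyDeviceToHost);

    cudaFree(d_nodes); cudaFree(d_nodeCount); cudaFree(d_nextCount);
    cudaFree(d_front); cudaFree(d_next);
    return total;
}
```

In a real builder the whole block would cooperate on choosing the split and partitioning the primitives of its node, which is where the big early blocks and the many tiny late blocks come from.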

Currently I have about 400ms construction time on Conference (283k triangles) on my GTX 260 but I am working to make it better…