copy tree to device, driver problem wrong desktop pixels after cudaErrorUnk.

Hello,

I’m currently porting a ray tracer with bvh traversal (stack based) to CUDA. The tree traversal works in emu mode, but not on the device. It gives me an cudaErrorUnknown after traversal when copying the result image back to the host. Using brute force ray tracing the memcpy works fine, so that doesn’t seem to be the problem. The tree traversal works in other (non-CUDA) applications.

Furthermore, I get wrong pixels on my desktop after the error, you can see it in the attached screenshot.

I think I don’t copy the tree correctly to the device, so here’s the code:

struct  BVHNode{

int splitAxis;

float3 bbmin, bbmax;

BVHNode *child0; 

BVHNode *child1; 

Triangle *tris; 

int numTris; 

};

//node = node on host

//node_d = node on device

void CopyBVHToDevice(BVHNode*& node, BVHNode*& node_d)

{

        //allocate node on device and copy host node

	cudaMalloc( (void**)&node_d, sizeof(BVHNode));

	cudaMemcpy(node_d, node, sizeof(BVHNode), cudaMemcpyHostToDevice);

	

       //allocate memory for triangle array and copy from host node triangles

	cudaMalloc( (void**)&node_d->tris, sizeof(Triangle)*node->numTris);

	cudaMemcpy(node_d->tris, node->tris, sizeof(Triangle)*node->numTris, cudaMemcpyHostToDevice);

	

        //continue with child nodes

	if(node->child0 !=0)

	{	

  CopyBVHToDevice(node->child0, node_d->child0);

	}

	

	if(node->child1 !=0)

	{

  CopyBVHToDevice(node->child1, node_d->child1);

	}

	

}

When printing out the node-addresses they all look valid, but the tree traversal code seems to stop working after the root node.

I hope I just made a stupid error and somebody can help me.

I’m working on Vista 64-Bit, NV GTX 260 (driver 177.84, latest CUDA sdk/toolkit), MS VS 2005 Pro, Q6600 CPU, 4 GB RAM, MSI P6N mainboard

[attachment=7457:attachment]
pixeltrash.JPG

So I guess the above code is correct.

Here’s the minimal kernel code to get the error:

__global__ void TraverseBVH(Ray *rays, BVHNode *bvhroot, float4 *img, float width)

{

        int x = blockIdx.x * blockDim.x + threadIdx.x;

        int y = blockIdx.y * blockDim.y + threadIdx.y;

        int index = x+y*width;

	

        img[index].x = 0.0f;

        img[index].y = 1.0f;

        img[index].z = 0.0f;

        img[index].w = 0.0f;

	

        if(x<width && y <width)

        {

	

        BVHNode *stack[32];

        int stackSize=-1;

	

        BVHNode *currNode = bvhroot;

        stackSize++;

        stack[stackSize] = currNode->child0;

	

        if(currNode->child0 != 0) currNode->child0->splitAxis=0; //works

        if(stack[stackSize] != 0) stack[stackSize]->splitAxis=0; //doesn't work with this line, cudaUnknownError

        }

}

Why isn’t the stack working? Am i missing something?

Nobody from Nvidia here caring about the corrupted pixels?

Those corrupted pixels might indicate that you have been writing to memory that was not allocated in CUDA. That can also explain the fact your kernel return with an error

Thanks for your reply, but I haven’t found anything which wasn’t allocated, assuming that the tree copy i posted is correct. Without the tree everything works fine, so there has to be something wrong with my tree…

your tree is not copied correctly, when copying it has pointers to host memory in the structure. See comments in the code below.

struct  BVHNode{

int splitAxis;

float3 bbmin, bbmax;

BVHNode *child0;

BVHNode *child1;

Triangle *tris;

int numTris;

};

//node = node on host

//node_d = node on device

void CopyBVHToDevice(BVHNode*& node, BVHNode*& node_d)

{

       //allocate node on device and copy host node

cudaMalloc( (void**)&node_d, sizeof(BVHNode));

cudaMemcpy(node_d, node, sizeof(BVHNode), cudaMemcpyHostToDevice);

// You have now copied the node as it is on the host to the device, that means the pointers child0, child1, tris are pointers to host memory. Nowhere below are you updating these pointers as far as I can see.

     //allocate memory for triangle array and copy from host node triangles

cudaMalloc( (void**)&node_d->tris, sizeof(Triangle)*node->numTris);

cudaMemcpy(node_d->tris, node->tris, sizeof(Triangle)*node->numTris, cudaMemcpyHostToDevice);

// Check if pointers are correct.

printf("Adress of node_d->tris looks to be : %d\n", node_d->tris);

BVHNode *temp;

temp =(BVHNode *) malloc(sizeof(BVHNode));

cudaMemcpy(temp, node_d, sizeof(BVHNode), cudaMemcpyDeviceToHost);

printf("Adress of node_d->tris on device is actually : %d\n", temp->tris);

      //continue with child nodes

if(node->child0 !=0)

{

 CopyBVHToDevice(node->child0, node_d->child0);

}

if(node->child1 !=0)

{

 CopyBVHToDevice(node->child1, node_d->child1);

}

}

I guess you need to first allocate the memory for the triangles and the children, update the pointers accordingly and then transfer the current Node to the GPU.

Personally I use a kdtree and have put it in an array, and instead of pointers I use an offset (left = next node in array, right is ‘offset’ elements away). I also have a separate triangle array, and only have the offset into that array and the amount of triangles in the leave in my kdtree structure.