CUDA Ray Tracing - error when the mesh has many faces

rinsavs · November 13, 2016, 2:46am

Hello, everyone. I’m currently building a BVH ray tracer with CUDA. When I try to do the cudaMemcpy from device to host, it gives me ‘unspecified launch error’. There is no recursion in my code. The code works fine when I run it with a 12-face mesh, but it gives the above error when I run it with a 33-face mesh. Can someone please help me?

Thanks a lot.

Robert_Crovella · November 13, 2016, 4:48pm

It’s likely that your kernel (the one immediately preceding the cudaMemcpy that reports the error) is performing an invalid operation of some sort when you increase the mesh to 33 faces.

1. Use proper CUDA error checking throughout your code.
2. Run your code with cuda-memcheck. You should get an indication of the actual problem in the kernel.
3. Follow the procedure in the Stack Overflow question "Unspecified launch failure on Memcpy": recompile with -lineinfo and run your code with cuda-memcheck again, so that cuda-memcheck reports the actual line of kernel code that is causing the fault.
4. Use printf or other debugging techniques to further expose the nature of the problem in that line of code.
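
As an illustration of step 1, here is a minimal sketch of one common error-checking pattern. The macro name CUDA_CHECK, the kernel shade_faces, and the helper run_checked are all hypothetical names made up for this example; they are not taken from the original poster's code, which was never shown.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: aborts with file/line if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: writes one value per face.
__global__ void shade_faces(float *out, int num_faces)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_faces)              // guard against out-of-range threads
        out[i] = 2.0f * i;
}

// Launches the kernel with every step checked; returns the sum of the results.
float run_checked(int num_faces)
{
    float *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, num_faces * sizeof(float)));

    int threads = 128;
    int blocks  = (num_faces + threads - 1) / threads;
    shade_faces<<<blocks, threads>>>(d_out, num_faces);
    CUDA_CHECK(cudaGetLastError());        // catches launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catches faults raised while the kernel ran

    std::vector<float> h_out(num_faces);
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, num_faces * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));

    float sum = 0.0f;
    for (float v : h_out) sum += v;
    return sum;
}
```

With the synchronize check in place, a fault inside the kernel is reported right after the launch instead of at the later cudaMemcpy. Assuming the file is called app.cu (a placeholder name), steps 2 and 3 would then look like "nvcc -lineinfo app.cu -o app" followed by "cuda-memcheck ./app". On recent CUDA toolkits, compute-sanitizer takes the place of cuda-memcheck.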

rinsavs · November 14, 2016, 7:33am

Hello txbob
I ran the cuda-memcheck and found my mistakes.
Thanks!

cbuchner1 · November 14, 2016, 9:39am

Is this a closed project, or can you make it open later? I’d be very interested in seeing a BVH implementation in CUDA.

rinsavs · December 7, 2016, 8:27am

Hello, cbuchner1
I guess it’s closed since it is my bachelor thesis… But I’ll check my university’s regulations about this later.

MutantJohn · December 7, 2016, 4:38pm

I’d be curious as well. I’m looking at the wiki now. Maybe they use one thread to solve a pair-wise intersection test?

The issue is, imo, that for this to be effective on the GPU, the tree needs to be relatively shallow and wide. I’m imagining one thread per node at a certain depth in the tree. I’d be curious to see how this fares against a well-implemented CPU version.

A CPU version might be better in the sense that different kinds of geometric objects will have their own intersection routines, which makes threads diverge and works against CUDA’s SIMT execution model, unless all your objects are of the same type.
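
To make the divergence concern concrete, here is a rough sketch with one thread per (ray, primitive) pair. Everything in it is an assumption made for illustration: the Ray and Prim layouts, the two primitive types, and the function names are invented, and nothing here reflects the thesis code discussed in this thread. Edge cases such as a ray origin lying exactly on a box face are not handled carefully.

```cuda
#include <cfloat>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical data layout, invented for this sketch.
enum PrimType { PRIM_SPHERE = 0, PRIM_BOX = 1 };
struct Ray  { float o[3]; float d[3]; };   // origin, direction
struct Prim { int type; float a[3]; float b[3]; };
// sphere: a = center, b[0] = radius; box: a = min corner, b = max corner

__host__ __device__ inline bool hit_sphere(const Ray &r, const Prim &p)
{
    float oc[3] = { r.o[0] - p.a[0], r.o[1] - p.a[1], r.o[2] - p.a[2] };
    float a      = r.d[0]*r.d[0] + r.d[1]*r.d[1] + r.d[2]*r.d[2];
    float half_b = oc[0]*r.d[0] + oc[1]*r.d[1] + oc[2]*r.d[2];
    float c      = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - p.b[0]*p.b[0];
    float disc   = half_b*half_b - a*c;
    if (disc < 0.0f) return false;            // the line misses the sphere
    return (-half_b + sqrtf(disc)) >= 0.0f;   // larger root lies in front of the origin
}

__host__ __device__ inline bool hit_box(const Ray &r, const Prim &p)
{
    // Slab test: intersect the parameter ranges of the three axis-aligned slabs.
    float tmin = 0.0f, tmax = FLT_MAX;
    for (int k = 0; k < 3; ++k) {
        float inv = 1.0f / r.d[k];            // +-inf when the ray is parallel to the slab
        float t0  = (p.a[k] - r.o[k]) * inv;
        float t1  = (p.b[k] - r.o[k]) * inv;
        tmin = fmaxf(tmin, fminf(t0, t1));
        tmax = fminf(tmax, fmaxf(t0, t1));
    }
    return tmax >= tmin;
}

// One thread per pair. Threads of the same warp that see different primitive
// types take different branches, and the hardware runs those branches in turn.
__global__ void intersect_pairs(const Ray *rays, const Prim *prims, int *hits, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    switch (prims[i].type) {
        case PRIM_SPHERE: hits[i] = hit_sphere(rays[i], prims[i]) ? 1 : 0; break;
        case PRIM_BOX:    hits[i] = hit_box(rays[i], prims[i]) ? 1 : 0;    break;
        default:          hits[i] = 0;
    }
}

// Host helper: runs the kernel on n pairs; returns the hit count, or -1 on a CUDA error.
int count_hits(const Ray *rays, const Prim *prims, int n)
{
    Ray *d_rays = nullptr; Prim *d_prims = nullptr; int *d_hits = nullptr;
    std::vector<int> hits(n, 0);
    bool ok =
        cudaMalloc(&d_rays,  n * sizeof(Ray))  == cudaSuccess &&
        cudaMalloc(&d_prims, n * sizeof(Prim)) == cudaSuccess &&
        cudaMalloc(&d_hits,  n * sizeof(int))  == cudaSuccess &&
        cudaMemcpy(d_rays,  rays,  n * sizeof(Ray),  cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_prims, prims, n * sizeof(Prim), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        intersect_pairs<<<(n + 127) / 128, 128>>>(d_rays, d_prims, d_hits, n);
        ok = cudaMemcpy(hits.data(), d_hits, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_rays); cudaFree(d_prims); cudaFree(d_hits);
    if (!ok) return -1;
    int count = 0;
    for (int h : hits) count += h;
    return count;
}
```

If every primitive were of the same type (for example, triangles only, as in a typical mesh), all threads of a warp would follow the same branch and the switch would cost nothing extra. Sorting or grouping the pairs by type before the launch is one way to reduce the divergence when the types are mixed.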

Eh, all I read was the wiki for 5 minutes. I’d be curious to see the paper.