Ray Tracing

JaredHoberock · March 30, 2007, 8:35pm

Is anyone else using CUDA for ray tracing? I was hoping we could share performance statistics.

This paper demonstrates over 90M rays/sec on older-class hardware. I’m only seeing around 9M rays/sec for the Cornell Box with the same acceleration data structure. Is anyone else having better luck?

tachyon_john · March 31, 2007, 3:18am

Jared,

I haven’t had a chance to try any ray tracing yet, but it wouldn’t surprise me at all if you’re running into one or more gotchas with your CUDA kernel(s). How many registers do your kernels use, and what occupancy are you achieving? You may not be getting as much global/texture memory latency hiding as you need. Are you managing to use the shared memory area to reduce or eliminate memory accesses?

Cheers,

John

pitchaya · March 31, 2007, 5:51pm

Hi,
I have been working with both ATI and CUDA. My code on ATI is faster than my unoptimized code on CUDA. The reason is that my program has a lot of memory dependence.
Here is what I did to improve it:
1. Use the texture cache. If your data structures are read-only, just bind them to a texture (in my case, a linear texture). It just works, fast and easy.
or
2. Try to use more shared memory. It is easy to do with the application I’m working on, but I don’t think you can easily use shared memory in a ray tracer.

If I have time this summer, I would like to play around with a ray tracer in CUDA too.

Yam

JaredHoberock · April 1, 2007, 1:51am

Is there a way to gauge register pressure in CUDA?

My acceleration hierarchies are allocated in linear memory and bound to a 1D texture at compute time. From reading posts here, it’s my understanding that linear memory may not play nicely with the texture cache? Unfortunately, the hierarchy is too big to fit into a 1D array, and 2D arrays pose problems because pointers either cost twice as much to store or must be generated at runtime with division and modulus operations.
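To make the division-and-modulus cost mentioned above concrete, here is a minimal sketch of turning a linear node index into 2D texel coordinates. It is plain C++ so it runs without a GPU. The names `Texel2D`, `to2DGeneral`, `to2DPow2`, and the 2048-texel row width are assumptions for illustration, not anything from the posts. If the row width is a power of two, the division and modulus reduce to a shift and a mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical row width: a power of two, so / and % become >> and &.
constexpr std::uint32_t kLog2Width = 11;               // assumption: 2048 texels per row
constexpr std::uint32_t kWidth = 1u << kLog2Width;

struct Texel2D {
    std::uint32_t x;
    std::uint32_t y;
};

// General case: one integer modulus and one integer division per lookup.
inline Texel2D to2DGeneral(std::uint32_t index, std::uint32_t width) {
    assert(width != 0);
    return { index % width, index / width };
}

// Power-of-two width: the same mapping using only a mask and a shift.
inline Texel2D to2DPow2(std::uint32_t index) {
    return { index & (kWidth - 1u), index >> kLog2Width };
}
```

Whether the shift-and-mask form is fast enough in a real kernel would have to be measured; this only shows that the arithmetic can be kept cheap when the width is chosen up front.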

Unfortunately, I haven’t found a use yet for shared memory other than for generating random numbers.

mfatica · April 1, 2007, 1:55am

When you compile, use the -cubin flag.

It will generate a cubin file that can be opened with your preferred editor.

For each CUDA function, you will find how many registers you are using.
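Once you have the register count, a rough back-of-the-envelope estimate of how many threads can be resident on one multiprocessor might look like the sketch below. It is plain C++, and the defaults (8192 registers and 768 threads per multiprocessor) are assumptions matching figures commonly quoted for G80-class parts, so check them against your own hardware. The function name `maxResidentThreads` is hypothetical, and the estimate ignores shared memory limits, the per-multiprocessor block limit, and register allocation granularity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rough estimate of resident threads per multiprocessor, limited only by
// registers and by the thread cap. Default limits are assumptions.
inline std::uint32_t maxResidentThreads(std::uint32_t regsPerThread,
                                        std::uint32_t threadsPerBlock,
                                        std::uint32_t regsPerSM = 8192,
                                        std::uint32_t maxThreadsPerSM = 768) {
    assert(regsPerThread > 0 && threadsPerBlock > 0);
    // Whole blocks that fit in the register file.
    const std::uint32_t blocksByRegs = regsPerSM / (regsPerThread * threadsPerBlock);
    // Whole blocks that fit under the thread cap.
    const std::uint32_t blocksByThreads = maxThreadsPerSM / threadsPerBlock;
    return std::min(blocksByRegs, blocksByThreads) * threadsPerBlock;
}
```

Under these assumed limits, a kernel using 32 registers per thread with 256-thread blocks fits only one block per multiprocessor, which is one third of the thread cap.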

Massimiliano

bbudge · April 6, 2007, 5:38am

90M rays/sec is pretty good, but this is only for the Cornell Box scene. Depending on the scene, I see between 5 million rays/sec (on the Stanford XYZ dragon with 7M triangles) and about 35 million rays/sec on smaller scenes like the Stanford Bunny.

I’m using a simple kd-tree like the one in Wald’s thesis, and I use 1D linear textures for lots of stuff: triangles, texture coordinates, the kd-tree itself, etc.

I also HAVE found use for shared memory, which made for quite a speed up… if you store your origins and directions in shared, you can index into them without funky tricks.

Brian

JaredHoberock · April 9, 2007, 8:32pm

Hmm that’s pretty good, I’m still stuck at around 2M/sec on the bunny. Are you tracing single rays or packets?

bbudge · April 10, 2007, 12:12am

Doesn’t make too much sense to ray trace packets on the GPU. It does help to order the rays in a block pattern to increase coherency though (I see about a 50% speed up on eye rays).
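One way to read "a block pattern" is to map consecutive ray indices onto small square tiles of the image instead of whole scanlines, so that neighbouring threads trace neighbouring pixels. The sketch below is plain C++ and is only an interpretation: the 8×8 tile size, the `Pixel` struct, and the `tiledPixel` function are assumptions, not the poster's actual scheme.

```cpp
#include <cassert>
#include <cstdint>

struct Pixel {
    std::uint32_t x;
    std::uint32_t y;
};

constexpr std::uint32_t kTile = 8;  // hypothetical tile edge, in pixels

// Maps a linear ray index to a pixel so that every run of kTile*kTile
// consecutive indices covers one kTile x kTile tile of the image.
// Assumes imageWidth is a multiple of kTile.
inline Pixel tiledPixel(std::uint32_t rayIndex, std::uint32_t imageWidth) {
    assert(imageWidth % kTile == 0);
    const std::uint32_t raysPerTile = kTile * kTile;
    const std::uint32_t tilesPerRow = imageWidth / kTile;
    const std::uint32_t tile = rayIndex / raysPerTile;    // which tile
    const std::uint32_t inTile = rayIndex % raysPerTile;  // position inside it
    return { (tile % tilesPerRow) * kTile + inTile % kTile,
             (tile / tilesPerRow) * kTile + inTile / kTile };
}
```

With an 800-pixel-wide image, indices 0 through 63 cover the tile whose corner is (0, 0), and index 64 starts the next tile at (8, 0).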

doomlord52 · April 10, 2007, 4:11am

Well, I have had experience with ray tracing, and I really doubt that it will EVER be in video games. It just takes too long. Maybe cut scenes, but not in-game…

For me, on my PC (XPS 600, dual 7800 GTX), ray tracing can take about 30 seconds per frame at 800 × 600, no AA, in Cinema 4D. I don’t know if that program is efficient, but it’s OK for doing work… or screwing around :D

bbudge · April 10, 2007, 5:24pm

What kinds of scenes are you rendering? I can render the Stanford Dragon, which has 7 million triangles, a couple of times a second at 800x600. This is only dot product lighting, but still… Note that many game levels have tens of thousands of triangles. I am seeing 30 frames/second ray tracing scenes like this, though I must admit there is no fancy stuff going on… no shadow rays, no antialiasing, no reflections. Which makes it pretty lame compared to OpenGL graphics :)

I agree that ray tracing folks have several problems to solve before ray tracing could be considered real-time for gaming purposes; however, I believe it’ll happen in the next year or two. It’ll be a while before it is heavily used in games, though. My prediction :)

JaredHoberock · April 10, 2007, 6:51pm

Not the least of which is finding a compelling application of ray tracing to a gaming setting in the first place. In my opinion, tracing eye rays or reflection/refraction rays isn’t it, especially considering the well-established rasterization-based alternatives.

bbudge · April 10, 2007, 7:55pm

I’m not aware of any GOOD rasterization alternatives to refraction… can you point me to a reference?

I agree though, that most of these effects can be faked pretty well with current rasterization techniques.

JaredHoberock · April 11, 2007, 5:38pm

Yes, I am referring to the convincing fakes :)

acox · April 13, 2007, 5:52pm

How are you traversing the kd-tree? Did you implement the restart algorithm, or maybe a short stack? If you implemented a stack, how easy was it to get going? The paper linked above mentions great difficulties getting the short-stack variant to compile, so they resorted to fixing up the generated native assembly under CTM.

JaredHoberock · April 13, 2007, 6:48pm

So far I’ve implemented a bounding volume hierarchy with static traversal order, kd-restart, and kd-shortstack. As it is, the BVH is the fastest, then kd-restart, then kd-shortstack. For me, register pressure seems to be the bottleneck, not memory latency.
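For readers unfamiliar with the short-stack idea, the core of it can be sketched as a small fixed-size circular stack: a push onto a full stack overwrites the oldest entry, and a pop from an empty stack tells the caller to fall back to a restart from the root. This is plain C++ and a general illustration only, not the code described in this post. `StackEntry`, `ShortStack`, and the field names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical deferred-subtree record: node index plus ray interval.
struct StackEntry {
    std::uint32_t node;
    float tMin;
    float tMax;
};

template <int N>
struct ShortStack {
    static_assert(N > 0, "stack needs at least one slot");

    StackEntry entries[N];
    int top = 0;    // next free slot, modulo N
    int count = 0;  // live entries, never more than N

    // Pushing onto a full stack silently overwrites the oldest entry.
    void push(const StackEntry& e) {
        entries[top] = e;
        top = (top + 1) % N;
        if (count < N) ++count;
    }

    // Returns false when empty; the traversal would then restart from the
    // root, as in kd-restart, to recover any entries that were overwritten.
    bool pop(StackEntry& out) {
        if (count == 0) return false;
        top = (top + N - 1) % N;
        out = entries[top];
        --count;
        return true;
    }
};
```

A smaller `N` means fewer values to keep live per thread at the cost of more restarts, which is the trade-off to weigh when register pressure is the bottleneck.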

bbudge · April 14, 2007, 2:29am

I’m using a small stack. I have something like this in my code:

local StackData stack[TREE_MAX_DEPTH].

My bottleneck is also register pressure.
