CUDA 3D Rendering Mystery

I have a tough but serious question for NVidia:

NVidia has owned Gelato for years now. They have been talking up the use of GPUs for non-real time rendering acceleration since the product was announced. Recently NVidia purchased Mental Ray. Both of these products rely heavily upon ray tracing yet CUDA and all of the NVidia GPUs are clearly not designed to accelerate ray tracing. Intel’s proposed Larrabee architecture, on the other hand, seems designed for ray tracing yet NVidia wasted no time in downplaying the technology.

So what gives? NVidia must have received tons of advice from its own engineers for years about what is needed to accelerate ray tracing. I would suspect that there is some major new technology that they have up their sleeves that, like Larrabee, is designed for ray tracing acceleration, but thus far they have not even hinted that it exists. I have not even read a “stay tuned” on the subject. Is there some basic disconnect between the hardware and the software engineers?

I am a big fan of CUDA in concept. It is great fun to program. It just happens to be of rather limited use for rendering. It can be used for volumetric calculations. I have a nice 3D noise generator written for CUDA. It just can’t seem to do much in the area of general purpose ray tracing in which any ray can go in any direction and hit any model in the scene. The bundled blocks of threads and memory performance pretty much rule this out.
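To make the memory point concrete, here is a rough Python sketch (not CUDA, and not from the original post) that counts how many memory segments a group of threads touches in a single load. The 32-thread group, the 128-byte segment size, and the `transactions` helper are all simplifying assumptions for illustration; the actual coalescing rules differ between GPU generations.

```python
import random

WARP_SIZE = 32        # threads that issue a load together (assumed)
SEGMENT_BYTES = 128   # size of one memory transaction (assumed)
WORD_BYTES = 4        # each thread reads one 4-byte float


def transactions(indices):
    """Count the distinct memory segments touched by one group-wide load."""
    segments = {(i * WORD_BYTES) // SEGMENT_BYTES for i in indices}
    return len(segments)


# Coherent access, as in a Mandelbrot or noise kernel: thread t reads element t.
coherent = list(range(WARP_SIZE))

# Incoherent access, as when each ray ends up at a different object in a
# large scene: every thread reads from an unrelated location.
rng = random.Random(0)
incoherent = [rng.randrange(1_000_000) for _ in range(WARP_SIZE)]

print("coherent transactions:  ", transactions(coherent))
print("incoherent transactions:", transactions(incoherent))
```

The coherent pattern fits in a single segment, while scattered reads can need up to one transaction per thread, which is one way to read the "memory performance" complaint above.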

I also am a big fan of NVidia and drool over all their new product announcements. I am just trying to understand why this company seems to be going in two opposite directions between their software rendering and hardware acceleration divisions.

-Mark Granger

Here’s an interview where David Kirk of Nvidia says why they don’t think raytracing is the future of graphics (or at least not immediate future).

I think he is talking about ray tracing for real time graphics and games. NVidia has developed and acquired non-real time ray tracing programs. The problem is that CUDA (as the hardware exists currently) is not designed for ray tracing regardless of whether it is for real time games or non-real time rendering. That’s what has me so puzzled. Why the divergent directions between their GPU hardware and rendering software divisions?

-Mark Granger

I think you will find that there is no hardware designed for ray-tracing. Also, ray-tracing is maybe not as interesting as HPC & 3D-games to NVIDIA. The new hardware is made for 3D-games (as always) and has had a lot of refinements that make CUDA a much better fit. You can still do ray-tracing in CUDA, even though it is not always super easy. As soon as you make your hardware optimized for ray-tracing you will lose on the HPC side. And I think the money for NVIDIA is not in ray-tracing, but in HPC.

My unsubstantiated guess is that NVIDIA is hedging their bets. Raster graphics still sell most of the cards, so an architecture geared toward that will win them the most benchmarks. CUDA is an extension of that raster architecture, and still gives a good transistors/FLOP.

I imagine at some point ray tracing will become popular enough (or tasks similar to ray tracing like object tracking in a physics engine) that NVIDIA will decide it is worth spending some transistors on that. Keeping the software companies close by ensures they know the hardware requirements very well, and also gives them an immediate software demonstration of the technology whenever they decide to add it.

At least that’s my speculation…

Am I the only one who is doing a form of raytracing in CUDA? I am personally very happy with the performance I am getting (on GT200 hardware, my register-count is quite high, so this may make a lot of difference)

Nope, you’re not alone :sorcerer:

You see, it looks like the people who are complaining are actually not trying ;)

Well, ray tracing is possible, but people seem to be under the impression that it has to be considerably faster than CPU ray tracing, which in turn has been heavily optimized/improved during the last decade or so. People still need to learn in what way GPU RT can be realized. To my understanding one of the major problems is still the acceleration structure used while tracing the rays through the scene. There have been quite a few papers on KD-Trees, but they require recursion and hence are hard to realize on the GPU (Foley et al. - Stackless KD-Tree, Zhou et al. - Real-Time KD-Tree Construction on Graphics Hardware, etc). There are however some nice new concepts for acceleration structures that are not hierarchical and hence better suited for GPUs, e.g. Lagae and Dutré - Accelerating Ray Tracing using Constrained Tetrahedralizations.
So I think the G80 and successors can very well be used for ray tracing, it’s just a matter of finding more suitable acceleration structures for the GPU which is subject to current research.
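As a rough illustration of the recursion issue, the Python sketch below walks a tiny KD-tree along a ray using a loop and an explicit stack instead of recursive calls. Note that this is the plain stack-based variant, not the stackless scheme from the papers mentioned above; the `Node` layout, the `leaves_along_ray` name, and the 2D example tree are all made up for illustration, and a real tracer would intersect primitives in each leaf and stop at the first hit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Inner nodes set axis/split/left/right; leaves set only `label`.
    axis: int = 0
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None


def leaves_along_ray(root, origin, direction, t_min, t_max):
    """Return leaf labels in front-to-back order, using a loop and a stack."""
    visited = []
    stack = [(root, t_min, t_max)]
    while stack:
        node, t0, t1 = stack.pop()
        while node.label is None:
            o, d = origin[node.axis], direction[node.axis]
            # The near child is the one on the ray origin's side of the plane.
            if o < node.split or (o == node.split and d <= 0):
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if d == 0.0:
                # Ray parallel to the plane: it stays on one side, unless it
                # lies exactly in the plane, in which case visit both sides.
                if o == node.split:
                    stack.append((far, t0, t1))
                node = near
                continue
            t_plane = (node.split - o) / d
            if t_plane > t1 or t_plane <= 0.0:
                node = near                       # plane not reached in [t0, t1]
            elif t_plane < t0:
                node = far                        # interval starts past the plane
            else:
                stack.append((far, t_plane, t1))  # come back to the far side later
                node, t1 = near, t_plane
        visited.append(node.label)
    return visited


# Hypothetical 2D tree: split x at 0.5, then split the right half in y at 0.5.
tree = Node(axis=0, split=0.5,
            left=Node(label="A"),
            right=Node(axis=1, split=0.5,
                       left=Node(label="B"), right=Node(label="C")))

print(leaves_along_ray(tree, (0.0, 0.1), (1.0, 0.6), 0.0, 1.0))
```

On a GPU the per-ray stack has to live in registers or local memory for every thread, which is part of why the stackless variants attracted research interest.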

Sort of off-topic: Anybody want to share thoughts on ray tracing with CUDA? I’m sort of curious how other people go about implementing them.

I use stackless KD-tree as a structure. But I am not doing ray-casting like usually done in raytracing. I am finding all possible paths between 2 points by reflecting off triangles.
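For readers wondering what "reflecting off triangles" can look like, one common approach is the image method: mirror the source point across the triangle's plane, then intersect the line from that image to the receiver with the plane. Whether the poster does it this way is not stated, so treat the Python sketch below as an assumption. The helper names are invented, and it deliberately skips the point-inside-triangle test and any occlusion checks.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(tri):
    """Unnormalized normal of the plane containing the triangle."""
    v0, v1, v2 = tri
    return cross(sub(v1, v0), sub(v2, v0))


def mirror_point(p, tri):
    """Mirror point p across the plane of the triangle."""
    n = plane_normal(tri)
    k = 2.0 * dot(sub(p, tri[0]), n) / dot(n, n)
    return tuple(pi - k * ni for pi, ni in zip(p, n))


def reflection_point(src, dst, tri):
    """Point on the triangle's plane where src -> plane -> dst reflects
    specularly, or None if src and dst are not on the same side."""
    n = plane_normal(tri)
    image = mirror_point(src, tri)
    direction = sub(dst, image)
    denom = dot(direction, n)
    if denom == 0.0:
        return None                      # line from image to dst never crosses the plane
    t = dot(sub(tri[0], image), n) / denom
    if not 0.0 < t < 1.0:
        return None                      # crossing is not between image and dst
    return tuple(i + t * d for i, d in zip(image, direction))


# Hypothetical triangle lying in the z = 0 plane.
triangle = ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0))
print(reflection_point((1.0, 1.0, 1.0), (3.0, 1.0, 1.0), triangle))
```

Chaining several mirrors gives multi-bounce paths, and the number of candidate triangle sequences grows quickly, which is where an acceleration structure such as the KD-tree comes in.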

I did not mean to imply that you could not do ray tracing with CUDA and current NVidia hardware. My only point is that it is not designed to do it quickly. I was blown away by the speed of some of the CUDA demos and was very pleased with the performance of the Mandelbrot and 3D noise tests I did with CUDA. I also learned that I would not get the same kind of performance (GPU vs CPU) with ray tracing due to the hardware design of the current GPUs. I am not complaining though. I do think CUDA has a place in modern renderers and still find it exciting to program, just not for ray tracing.

For general purpose ray tracing, I suspect that Intel’s Larrabee may be a better fit. It is hard to tell for certain since there is very little technical documentation and no hardware available for testing. It will be interesting to see how Intel positions Larrabee since I feel it is best suited for offline rendering yet Intel seems to be talking about its use in real time games. It is sort of the opposite situation from NVidia. I am fairly agnostic on the battle between NVidia CUDA and Intel Larrabee since I feel both will have their uses.

-Mark Granger

But CUDA not getting the same speed as highly optimized CPU versions can also be a matter of lack of research up to now. It’s not like those CPU versions were born in 1 year :P
Also I am not sure if NVIDIA should optimize their GPUs for ray-tracing, because it would likely mean less performance for parallel problems. The fact that they might be selling ray-tracing software does not change that fact ;)
The point that I would also like to make is that when you get to the optimal performance (mem-bandwidth bound) you will not be faster on Larrabee than on GTX280 as far as I read it.


I am so glad to see people finally seeing and acknowledging the obvious lack of NVIDIA engineers and managers to grasp the 3D CAD real-time market that CUDA could open.

I had made a call over to NVIDIA about 9 months ago, and simply asked the question to David and his team: “Why won’t NVIDIA write CUDA drivers for popular rendering programs?” The answer was: “what do you mean?”

I was actually floored… stunned… They weren’t aware of the world outside of games at all. When I tried to explain how a 3D program uses rendering with the CPU to calculate each little pixel, David, among others, was perplexed, and said they need to “check into this technology with another division of NVIDIA”.

I was amazed… How could CUDA be run, developed and spawned by a bunch of gamer guys, without the real-world experience of the needs of 3D rendering applications? I still can’t believe it… I had just accepted it as a failure of the human race to evolve, up there with the Bush administration… I still can’t believe it…

Maybe some of these end-user ‘CUDA’ programmers out there can show NVIDIA how to write a driver for 3D rendering programmers so I can unlock the supposedly massive power of my NVIDIA video card that was highly optimized for 3D CAD in the first place?

Or better yet, I will learn how to smoke pot and become brain dead… Because at this point, it’s lonely up here…

Sincerely, Matt

I cannot understand the purpose of your post, but I can tell you that I have seen a very impressive demo doing raytracing under CUDA at NVISION.
They will release an API for people to make use of this raytracing technology. So anybody (any company with a 3D CAD program) can write an accelerator for their program using this API. My gut feeling tells me that this might actually already be happening.

So it looks like David and his team have not been sitting on their butts in the past 9 months.

If you examine CUDA carefully, consider a lot of the design decisions that went into it, and examine the early PR for CUDA (featuring John Stone’s and others’ work at UIUC), you will find that CUDA was tailored specifically for the HPC market. This is, of course, my opinion, but others share it.

Have you even checked the literature? This has been done. 3D rendering isn’t my area so I can’t point you to any specific works off the top of my head, but the literature reviews I did while finding sources for my own papers turned up several works implementing raytracing and other rendering techniques using CUDA.

In fact, this is the kind of thing that NVIDIA has hoped for from the beginning! They have been very active in supporting CUDA, reaching out to scientific researchers, mathematicians, video encoding developers, computer vision developers, … and the list goes on and on. You can’t possibly expect NVIDIA to have the expertise to use CUDA to implement all of these applications themselves, can you? They provide CUDA, the hardware, and a lot of support, and we, the “end-users”, as you say, are supposed to develop applications that use it and publish papers & code. I know many of the NVIDIA guys who worked on CUDA and they are quite ecstatic to see the kind of response the community has given! Have you checked out the CUDA Zone? NVIDIA is even taking advantage of all the great development going on with CUDA for marketing, and also supporting us by providing links to our work.

Finally, as E.D. Riedijk pointed out, NVIDIA has developed their own real-time raytracer using CUDA. Were you not aware of that, either?…raytracing.html

Simple answer:

Ray-tracing is not parallel-friendly.

There are serial codes, there are parallel codes. There are serial CPUs and parallel GPUs. GPUs are not fundamentally better than CPUs, they’re fundamentally better at what they’re good at. To make GPUs better at raytracing would mean to take away from what they’re optimized for.

If Intel is going in this direction, it is only because they don’t have any experience creating optimized GPUs, are taking this route because it’s easier, and are trying to spin that into a positive. Whatever Larrabee 1 looks like, Larrabee 2 will work a lot more like a real GPU.

A more subtle answer is the one given by Linny. Ray-tracing can be made more parallel-friendly, but it will take new algorithms. People are actively working on this.

P.S. To wax philosophical, the whole point of ray-tracing is that it’s an unoptimized algorithm. That’s not an insult. It is easy to work with, and allows for beautiful images to be created without uber geniuses thinking of brilliant tricks to get them. It is well-suited to CPUs. Once you start to optimize ray-tracing two things happen. One, you take away from its power and expressivity. Two, it becomes a lot more like its cousin, rasterization.

Sorry, the point of ray-tracing was never, really, speed. And besides, the whole performance thing is so self-inflicted anyway. If you tweak a couple variables, you can change render time 10 fold. Just refrain from tweaking them the wrong way, and you’ll get to see your work the same day you make it. If someone gives you 100x the power, I’ll bet you’ll find a way to tweak those parameters to make rendering take three dozen hours again.

Actually, I’m not sure what you’re saying. 3D CAD is served by NVIDIA’s Quadro line of workstation graphics. Don’t get me started on the bs that it is, but I think you’re not using your words precisely.

Ga-whaaaaaaa? There is a canonical example of an embarrassingly parallel problem–it’s raytracing!

Besides, any decent programmer can write a raytracer in the most naive way possible, but fast raytracing uses optimized data structures which don’t affect output quality, just performance.

Indeed. The strong support of 64-bit Linux architectures early on definitely suggests that HPC was a major focus.

I guess I meant SIMD friendly, and yes, in reference to the optimized versions (since ‘friendly’==‘performance’).

‘Parallel’ of course means many things, but raytracing just doesn’t quite have the qualities of rasterization that make it possible to create extremely optimized hardware.
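The distinction between "parallel" and "SIMD friendly" can be put into numbers with a toy model. In the Python sketch below, a group of lanes runs in lockstep, so the whole group keeps issuing instructions until its slowest lane finishes. The 32-lane width, the step counts, and the `simd_utilization` helper are assumptions chosen for illustration, not measurements from any real renderer.

```python
WARP_SIZE = 32  # lanes that execute in lockstep (assumed)


def simd_utilization(steps_per_lane):
    """Fraction of issued lane-iterations that do useful work when all
    lanes must run as long as the slowest one."""
    busy = sum(steps_per_lane)
    issued = len(steps_per_lane) * max(steps_per_lane)
    return busy / issued


# Rasterization-like workload: every lane does the same amount of work.
uniform = [20] * WARP_SIZE

# Ray-tracing-like workload: most rays exit early, one wanders through
# a deep part of the acceleration structure.
divergent = [5] * (WARP_SIZE - 1) + [100]

print("uniform utilization:  ", simd_utilization(uniform))
print("divergent utilization:", simd_utilization(divergent))
```

Each ray is still independent of the others, so the problem stays embarrassingly parallel in the usual sense; what suffers is how well the lockstep hardware is kept busy.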

Frankly, I think raytracing on GPUs is a dead end, even if it’s feasible. The whole point of raytracing, as I’ve said and everyone reiterates, is that it’s easy. And that actually has a big effect on image quality.

Just take a look at the now-defunct Gelato.