The resulting raytracer runs in real-time (around 10-20 times faster than the OpenMP/C++ version), and has the following features:
Real-time raytracing of triangle meshes - my $70 GT240 renders a 67K-triangle chessboard with Phong lighting, Phong normal interpolation, reflections and shadows at 15-20 frames per second. Interactive navigation and rendering mode changes are supported (see the video at my page, linked above).
A Bounding Volume Hierarchy using axis-aligned bounding boxes is created and used for ray/triangle intersections. The BVH is built via the surface-area heuristic, and is stored on disk for fast re-use. If SSE support is detected during compilation, a SIMD implementation is used that builds the BVH faster.
Compute capability 1.2 cards like my GT240 have no support for recursion, so I used C++ template magic to implement compile-time recursion - see cudarenderer.cu in the source tarball for details.
Z-order curve is used to cast the primary rays (Morton order) - significantly less divergence => more speed.
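To show what Morton ordering means in practice, here is a minimal sketch that turns a linear ray index into pixel coordinates by de-interleaving its bits. The names `compactBits`, `Pixel` and `mortonToPixel` are made up for this example, and the renderer's actual mapping may differ. Consecutive indices land in small 2D tiles rather than along a scanline, so threads that run together tend to trace rays that hit nearby geometry.

```cpp
#include <cstdint>

// Collect the even-numbered bits of v (bits 0, 2, 4, ...) into the low
// 16 bits of the result.
inline std::uint32_t compactBits(std::uint32_t v) {
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

struct Pixel {
    std::uint32_t x;
    std::uint32_t y;
};

// Even bits of the Morton index form x, odd bits form y.
inline Pixel mortonToPixel(std::uint32_t index) {
    Pixel p;
    p.x = compactBits(index);
    p.y = compactBits(index >> 1);
    return p;
}
```

With this mapping, indices 0 to 3 cover the 2x2 block at the origin, in the order (0,0), (1,0), (0,1), (1,1), and indices 4 to 7 cover the 2x2 block next to it.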
Vertices, triangles and BVH data are stored in textures - major speed boost.
Screen and keyboard handling is done via libSDL, for portability (runs fine under Windows/Linux/etc)
The code is GPL, and uses autoconf/automake for easy builds under Linux. For Windows, the required MSVC project files are included, so the build is just as easy (see instructions at my page, linked above).
The good news: It built fine on Ubuntu 9.04 with CUDA 2.3 toolkit.
The bad news: The chess scene only renders at 3 FPS using a Compute 1.1 card (32 shaders)
(II) Feb 07 11:30:58 NVIDIA(0): NVIDIA GPU Quadro FX 580 (G96GL) at PCI:1:0:0 (GPU-0)
(--) Feb 07 11:30:58 NVIDIA(0): Memory: 524288 kBytes
It seems that the switch to Compute 1.2 cards like the GT 240 can provide a significant boost. It could be that the smaller register file in Compute 1.1 leads to register spills to local memory. Or it could be the better memory controller logic in 1.2 devices that leads to the improved performance.
Haven’t figured that out yet - I will try turning on the verbose output in PTXAS to see what’s going on.
Very nice raytracer, congratulations! But I am getting the following error when trying to compile it (when I run make) on Linux:
Utility.cpp: In function ‘void panic(const char*, ...)’:
Utility.cpp:34: warning: format not a string literal and no format arguments
CXXLD cudaRenderer
/usr/bin/ld: skipping incompatible /opt/cuda/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: cannot find -lcudart
collect2: ld returned 1 exit status
make: ** [cudaRenderer] Erro 1
make: Saindo do diretório '/opt/renderer/cuda-renderer/src'
make: ** [all] Erro 2
make: Saindo do diretório '/opt/renderer/cuda-renderer/src'
make: ** [all-recursive] Erro 1
I’m using the 32-bit CUDA Toolkit 3.2 on 64-bit Ubuntu 10.04 with a GTX580. When I tried compiling with the 64-bit CUDA Toolkit 3.2, I got the same error.