The resulting raytracer runs in real-time (around 10-20 times faster than the OpenMP/C++ version), and has the following features:
[*]Real-time raytracing of triangle meshes - my $70 GT240 renders a 67K-triangle chessboard with Phong lighting, Phong normal interpolation, reflections and shadows at 15-20 frames per second. Interactive navigation and rendering mode changes are allowed (see video at my page, linked above).
[*]A Bounding Volume Hierarchy using axis-aligned bounding boxes is created and used for ray/triangle intersections. The BVH is created via the surface-area heuristic, and is stored for fast re-use. If SSE support is detected during compilation, a SIMD implementation is used that builds the BVH faster.
[*]Compute capability 1.2 cards like my GT240 have no support for recursion, so I used C++ template magic to implement compile-time recursion - see cudarenderer.cu in the source tarball for details.
[*]C++ template-based configuration allows for no-penalty runtime selection of (a) specular lighting (b) Phong interpolation of normals (c) backface culling (e.g. not used in refractions) (d) reflections (e) shadows (f) anti-aliasing.
[*]Z-order curve is used to cast the primary rays (Morton order) - significantly less divergence => more speed.
[*]Vertices, triangles and BVH data are stored in textures - major speed boost.
[*]Screen and keyboard handling is done via libSDL, for portability (runs fine under Windows/Linux/etc)
[*]The code is GPL, and uses autoconf/automake for easy builds under Linux. For Windows, the required MSVC project files are included, so the build is just as easy (see instructions at my page, linked above).
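For readers wondering what the "template magic" for recursion and feature selection can look like, here is a minimal sketch in plain C++. It is not the code from cudarenderer.cu: the names Hit, intersect, Tracer and render are hypothetical, and the toy "scene" (every ray hits a surface with fixed shading and reflectivity) is an assumption made purely so the example is self-contained. In a CUDA build the functions would additionally be marked __device__.

```cpp
// Hypothetical stand-in for a ray hit; a real renderer would intersect a BVH.
struct Hit {
    float localColor;    // shading computed at the hit point
    float reflectivity;  // fraction of the final color taken from the reflected ray
    bool  valid;         // false when the ray leaves the scene
};

// Toy scene (assumption): every ray, at every bounce, hits the same kind of surface.
inline Hit intersect(int /*bounce*/) {
    Hit h;
    h.localColor = 0.5f;
    h.reflectivity = 0.5f;
    h.valid = true;
    return h;
}

// Primary template: shade one hit, then "recurse" by calling the Depth-1
// instantiation. Each instantiation is a different function, so no function
// ever calls itself and no hardware recursion support is needed.
template <int Depth, bool DoReflections>
struct Tracer {
    static float trace(int bounce) {
        Hit h = intersect(bounce);
        if (!h.valid) return 0.0f;
        float color = h.localColor;
        if (DoReflections) {  // compile-time constant: the unused branch is optimized away
            color = (1.0f - h.reflectivity) * color
                  + h.reflectivity * Tracer<Depth - 1, DoReflections>::trace(bounce + 1);
        }
        return color;
    }
};

// Specialization for Depth == 0 ends the chain of instantiations.
template <bool DoReflections>
struct Tracer<0, DoReflections> {
    static float trace(int /*bounce*/) { return 0.0f; }
};

// The runtime switch happens once, outside the hot path: each branch calls a
// fully specialized tracer with a maximum of 3 bounces.
inline float render(bool reflections) {
    return reflections ? Tracer<3, true>::trace(0)
                       : Tracer<3, false>::trace(0);
}
```

With this toy scene, render(false) returns the local shading only (0.5), while render(true) blends in three bounces of reflection. The point of the design is that the feature test is a template parameter, so the per-ray code contains no runtime branch for it.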
The good news: It built fine on Ubuntu 9.04 with CUDA 2.3 toolkit.
The bad news: The chess scene only renders at 3 FPS using a Compute 1.1 card (32 shaders)
(II) Feb 07 11:30:58 NVIDIA(0): NVIDIA GPU Quadro FX 580 (G96GL) at PCI:1:0:0 (GPU-0)
(--) Feb 07 11:30:58 NVIDIA(0): Memory: 524288 kBytes
It seems that switching to a Compute 1.2 card like the GT 240 can provide a significant boost. It could be that the smaller register file in Compute 1.1 leads to register spills to local memory. Or it may be the better memory controller logic in 1.2 devices that leads to the improved performance.
Haven’t figured that out yet - I will try turning on the verbose output in PTXAS to see what’s going on.
… and someone happened to stroll by my open workstation with a GTS 450 in their hand. That renders the chessboard with default settings at 19 fps. (Again, the GTS 450 is not the display card.)
I get 0.094968 fps on a Core 2 Duo laptop running on Ocelot’s PTX to x86 JIT, rendering the chessboard. I wonder how this would compare to the OpenMP version?
“Rendering 15 frames in 157.948 seconds. (0.094968 fps)”
Also, do you mind if I use your code as a benchmark for research into compiler optimizations? What would be the best way to cite your implementation?
EDIT: That result was for -O0 -g. The result for -O3 optimization is:
“Rendering 20 frames in 59.228 seconds. (0.337678 fps)”
Part of the reason I published it under the GPL is to make sure that it can be easily used (and extended) for academic research. By all means, use it, Gregory.
P.S. Use this for citation:
“A real-time raytracer of triangle meshes in CUDA”, Thanassis Tsiodras, Dr.-Ing, Feb 2011.
Very nice raytracer, congratulations! However, I get the following error when trying to compile it (running make) on Linux:
Utility.cpp: In function ‘void panic(const char*, ...)’:
Utility.cpp:34: warning: format not a string literal and no format arguments
CXX cudaRenderer-BVH.o
CXXLD cudaRenderer
/usr/bin/ld: skipping incompatible /opt/cuda/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: cannot find -lcudart
collect2: ld returned 1 exit status
make[2]: *** [cudaRenderer] Error 1
make[2]: Leaving directory `/opt/renderer/cuda-renderer/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/opt/renderer/cuda-renderer/src'
make: *** [all-recursive] Error 1
I’m using the 32-bit CUDA Toolkit 3.2 on 64-bit Ubuntu 10.04 with a GTX580. When I try to compile with the 64-bit CUDA Toolkit 3.2, I get the same error.
I wish I could help - but I only have access to 32bit Linux environments.
The problem is clearly caused by the 64-bit environment: the message “skipping incompatible” means that the linker found a 32-bit cudart library, but couldn’t use it.
If you can’t use a 32-bit build environment, you may be able to cope by adding “-m32” to the compiler/linker flags (to specifically request generation of a 32-bit binary).