GFX card: NVIDIA GeForce GTX 280 @ 1.3 GHz
KDTree construction with SAH (depth = 20) -> 1.3 sec (which is quite OK compared to an optimized CPU solution)
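For reference, the SAH picks the split whose expected cost is lowest. A minimal sketch of the per-split cost evaluation (illustrative helper names and cost constants, not the builder used above):

```cpp
#include <cassert>

// SAH cost of one candidate split. saL/saR are surface areas of the two
// child boxes, saParent of the parent box; nL/nR are the primitive
// counts on each side. cTraverse/cIntersect are tunable cost constants.
double sahCost(double saL, int nL, double saR, int nR, double saParent,
               double cTraverse = 1.0, double cIntersect = 1.5) {
    return cTraverse +
           cIntersect * (saL / saParent * nL + saR / saParent * nR);
}
```

The builder evaluates this for every candidate plane and splits only if the best cost beats the cost of making a leaf.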
Average performance of traversing -> 2.7 MRays / second
IMHO the performance of traversal is poor ;(
Currently I’m using a push-down short-stack traversal algorithm.
The stack is 8 items per thread, which gives 192 threads per block (shared memory is so small); each thread traces one ray … and the whole thing suffers a lot from warp divergence.
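The short-stack mechanism can be illustrated on the CPU: a fixed-size ring buffer where pushing past capacity silently drops the oldest entry, and popping an empty stack tells the caller to restart from the root. This is a toy sketch of the idea, not the actual kernel:

```cpp
#include <cassert>

// Toy short-stack: N slots used as a ring buffer. Pushing past capacity
// overwrites the oldest entry; popping an empty stack signals the caller
// to restart traversal from the root (the push-down part).
template <int N>
struct ShortStack {
    int items[N];
    int top   = 0;   // index of the next free slot (mod N)
    int count = 0;   // number of live entries, saturates at N

    void push(int node) {
        items[top] = node;
        top = (top + 1) % N;
        if (count < N) ++count;   // else: oldest entry is silently lost
    }
    // Returns false when empty -> caller must restart from the root.
    bool pop(int& node) {
        if (count == 0) return false;
        top = (top + N - 1) % N;
        node = items[top];
        --count;
        return true;
    }
};
```

On a failed pop, the real traversal restarts from the root but with the entry point of the ray advanced past the already-visited interval, so dropped entries only cost redundant work, not correctness.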
Packetized traversal eats too many registers, and in the end the result is worse (I’ve experimented with 2x2 packets).
After some tweaking & slashing I’ve gone from 2.7 MRays / second to ~12 MRays / second
(I’ve reduced the number of if-blocks to a minimum; some work is done redundantly now, but the divergence is much lower).
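One common way to cut if-blocks is to replace data-dependent branches with selects that every thread executes identically, e.g. when ordering the near/far children of a KD node by ray direction. A generic illustration (my names, not the kernel above):

```cpp
#include <cassert>

// Branch-free near/far child ordering for a KD node. 'dirPositive' is
// true when the ray direction is positive along the node's split axis.
// The ternaries compile to selects / predicated moves, so all threads
// in a warp run the same instruction stream regardless of direction.
void orderChildren(int left, int right, bool dirPositive,
                   int& nearChild, int& farChild) {
    nearChild = dirPositive ? left : right;
    farChild  = dirPositive ? right : left;
}
```

The trade-off is exactly the one described above: both alternatives are always computed, so some work is redundant, but the warp never diverges on this decision.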
Another thing I’ve discovered is an almost 100% texture cache miss rate when sampling KDTree branches; now I’m trying to rearrange the node table to be more cache friendly :)
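One simple rearrangement (an assumption on my part, not necessarily the layout being tried above) is a breadth-first reorder: siblings and nodes of the same depth end up adjacent in the array, so traversal fetches are less scattered. A small CPU sketch with an illustrative node layout:

```cpp
#include <queue>
#include <vector>

// Minimal KD node: children are indices into the node array (-1 = leaf).
// Field names are illustrative, not the author's actual layout.
struct Node { int left = -1, right = -1; int payload = 0; };

// Reorder nodes breadth-first so the two children of a node sit next to
// each other in memory -- one way to make texture-cache fetches friendlier.
std::vector<Node> reorderBreadthFirst(const std::vector<Node>& in, int root) {
    std::vector<Node> out;
    std::vector<int> newIndex(in.size(), -1);
    std::queue<int> q;
    q.push(root);
    while (!q.empty()) {                 // pass 1: assign new slots in BFS order
        int i = q.front(); q.pop();
        newIndex[i] = (int)out.size();
        out.push_back(in[i]);
        if (in[i].left  >= 0) q.push(in[i].left);
        if (in[i].right >= 0) q.push(in[i].right);
    }
    for (Node& n : out) {                // pass 2: remap the child links
        if (n.left  >= 0) n.left  = newIndex[n.left];
        if (n.right >= 0) n.right = newIndex[n.right];
    }
    return out;
}
```

After the reorder, the root lands at index 0 and its children at indices 1 and 2, so one cache line often covers both candidates of a traversal step.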
For the sake of experiment I’ve set the camera FOV to 1.0 (all rays should take almost the same path – no divergence), and in this case performance was ~70 MRays / second.
All tests are done looking from a corner of the Sponza scene, so the whole atrium is visible to the camera.
I am also starting a ray tracing project and would like to know how you debug your CUDA code. I’ve tried setting up emuDebug, but in the C++ code, when setting up the D3D texture I get an error message saying “this feature is not yet implemented”. I also noticed in the CUDA SDK that the other D3D texture examples didn’t have any debug builds.
“Not yet implemented” usually means you have a bad toolkit/driver combo. Some of the stuff in the toolkit isn’t in the driver.
Go to the CUDA download page and download the most recent versions there if you can.
With the latest driver & toolkit (I’m using the x64 Windows Vista version) there is no problem with binding memory to textures in emulation.
The other side of the coin is that some CUDA vs. DX9 interops do not work
(cudaD3D9(Register/Map)Resource – you need to emulate this by manually locking the texture to get a pointer, then bind it as a texture or pass it to the kernel),
but that’s only a few additional lines of code, so you can live with it :)
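The manual-lock workaround mentioned above might look roughly like this (a hedged sketch: `tex` is an assumed `IDirect3DTexture9*`, `runKernel` is a placeholder for your own launch, and error handling is omitted):

```cpp
// Fallback for when cudaD3D9RegisterResource/cudaD3D9MapResources fail
// in emulation mode: lock the D3D9 texture directly.
D3DLOCKED_RECT lr;
if (SUCCEEDED(tex->LockRect(0, &lr, nullptr, 0))) {
    // lr.pBits is a CPU pointer to the texel data, lr.Pitch the row
    // stride in bytes. In emuDebug the "device" is the host, so this
    // pointer can be handed straight to the kernel (or copied into a
    // cudaMalloc'd buffer and bound as a texture on real hardware).
    runKernel(static_cast<unsigned char*>(lr.pBits), lr.Pitch);
    tex->UnlockRect(0);
}
```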
Oops, just saw the declaration of rt. But I would still be grateful if you could point out what I need to do to my code, as it’s a bit different from yours. Do I need to change it a bit so I can still lock the texture? It’s still not working even though I have upgraded the SDK/toolkit and drivers.