I am trying to implement a volume rendering (VR) algorithm described in “A Method for Accelerating Bronchoscope Tracking Based on Image Registration by GPGPU”.
First, I ran the volumeRender sample program from the CUDA SDK (C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\src\volumeRender).
My graphics card is a GTX480, with 4GB memory. The test dataset is (256, 320, 128) and the window size is (512, 512).
The computation time is 1000/11 ≈ 91ms per frame. That is much slower than the speed described in the paper. In Fig. 4 of the paper, the computation time for generating a VB image is approximately 5ms when the image size is (512, 512). 91ms is about 18 times longer than 5ms.
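The arithmetic behind that comparison can be written out as a small sketch. It assumes the 11 in 1000/11 is the frame rate shown by the sample, and the helper names `frame_time_ms` and `slowdown` are made up for illustration; they are not part of the SDK.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: time per frame in milliseconds for a given
// frame rate in frames per second (assumed to be what the sample reports).
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Hypothetical helper: how many times longer measured_ms is than reference_ms.
double slowdown(double measured_ms, double reference_ms) {
    assert(reference_ms > 0.0);
    return measured_ms / reference_ms;
}
```

Calling `slowdown(frame_time_ms(11.0), 5.0)` compares the measured frame time against the roughly 5ms read from the paper's figure. Note that a frame rate also includes display and buffer-mapping overhead, so it may overstate the time spent in the rendering kernel itself.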
Can anybody give me some hints about why CUDA’s volume rendering is so much slower?
It is in fact very slow; your timing is about right. The article you referenced uses a different implementation tuned for a specific kind of rendering. By the way, in my experience, a multi-core CPU may provide much better overall VR quality and speed than a modern GPU; for small datasets the GPU is fine, but once the data gets bigger the GPU falls behind. Just look for “CPU volume rendering” on YouTube…