Help with Volume Rendering & Octrees

Hey,

I am new to CUDA and I am trying to build a volume renderer. I have looked at the sample from the SDK as a starting point.
I am currently rendering volume data from a .raw file (one of the free ones available on the internet), very similar to how the SDK renders “bucky”.
The next step for me is to implement an octree to help speed up my program.
However I am not too sure how to go about this.

I have an octree structure and some functions within my (only) CUDA file (is it OK to have them in this file?), but there are two things I don’t understand that I hope somebody can help with.

  1. To build an octree you need to give it the positions (x, y, z) in 3D space of the data you want space subdivided around. If I have a .raw file (256x256x256), how can I get those coordinates from the 3D texture that holds the .raw file data?
    The only function I know of is tex3D(…), but you need to give it the x, y, z values in order to sample the texture. I want to do that later, with the actual x, y, z values of the data in 3D space, after I’ve built the octree.

  2. If I build and use the octree within the CUDA file (.cu file), how can I visualise the tree with, for example, OpenGL? I can’t call OpenGL code within the .cu file… or can I?

I hope I have explained this sufficiently.
Thanks.

The coordinates come from whatever you specify to tex3D(). See the programming guide; study it and the related samples.

You will find a lot of information about the logic you need in this very nice paper:

http://www.voreen.org/files/sa08-coursenotes_1.pdf

A .cu file can contain both host code and GPU code. Host code is just standard C/C++ code, and from it you can call OpenGL functions; see the samples and the programming guide again.

If you take a look at the coordinates used in tex3D() in the SDK volume renderer example, they are the sample points of a ray, computed for every pixel.
What I want to do is build the octree from the data first and then do the ray-casting step.
The problem is: what x, y, z coordinates does each data point have in 3D space?
Knowing that, I can build an octree to house them.
With the octree in place, I can then do a more efficient ray trace by sampling only at the given data points’ locations.

You should be aware of the code-path divergence issue in octree traversal; it is a problem even for neighboring rays once you descend into inner-cell subdivision levels, which is the only way to get good rendering quality for volumetric ray tracing/casting (high-quality volume rendering). In fact, this intrinsic SIMD limitation of today’s GPUs was the “deal killer” for my own development of a volumetric ray tracer on GPU/CUDA. A truly MIMD GPU should have no such limitation; unfortunately, it is not enough to call something MIMD — it has to actually be MIMD. Well, we’ll see…

Yes, looking up the texture this way automatically interpolates the 3D points of the volume data along the path of the rays. Coordinates are normalized to [0:1] because of this line in volumeRender.cu:

transferTex.normalized = true;

Set “normalized” to false and you will be able to use the original data coordinates in [0:n-1], as stated on page 29 of the programming guide.

In case you have not noted it, this thread could be very interesting to you:
http://forums.nvidia.com/index.php?showtopic=95710

Thanks sigismondo, I didn’t see the [0-width], [0-height], [0-depth] ranges; that’s what I was trying to clarify. Great! :thumbup:

However, I am still stuck trying to implement the octree. The concept of an octree is fine with me, but CUDA complains about recursion, pointers in GPU functions, etc.

Based on the other thread linked above, the best way is to store node data in a texture. Can someone give a more detailed explanation of this? It is making me :wacko:
Or does anyone know of anywhere where there is a simple octree CUDA example?

Actually, they are not forbidden. Yes, divergence will be a problem, but I have read some papers on the fact that close rays follow close paths (I mean octree traversal paths), reducing the serialization caused by divergence. So you need to cast rays as close as possible to each other, i.e., proceeding by blocks of pixels and not by rows.

I guess this is because texture memory is fast. However, the data won’t be interpolated… you need to store pointers there!

Try googling around: there is a lot of material on this topic. I really enjoy CG, and volume rendering techniques give astonishing results; I was an Amigist ;) in its era. Maybe I will give a volume renderer a try and sell it to Electronic Arts… :D

Good luck!

I have read some papers about the fact that close rays will follow close paths (I mean octree traversal paths), reducing the serialization involved by divergence. So you need to cast rays as close as possible to each other, i.e., proceeding by blocks and not by rows of pixels.

Can you please provide references to the “papers” you mentioned? Once the sampling density along a ray is higher (4x or more) than the ray density, divergence is apparently inevitable even for neighboring rays. For the opposite case the coherency can definitely be substantially better, but the rendering quality would be bad, especially for IC & mid-to-high opacity with gradient lighting: for that case at least 4 samples per cell should be maintained even for relatively mediocre, competitive quality. Anyway, the references to the “papers” are appreciated.

Thanks,

Stefan

From here:

http://www.mpi-inf.mpg.de/~guenther/BVHonGPU/BVHonGPU.pdf

Actually, you are right: basic closeness is not enough. I remembered it being easier.

Another very good reference:

http://artis.imag.fr/Publications/2009/CNLE09/CNLE09.pdf

Using trees with more subdivisions than an octree, I guess more threads will reach the same leaf (not 2^3 children per node but N^3), and it will be better computationally.

Don’t lose this very valuable NVIDIA presentation:

http://www.nvidia.com/content/nvision2008/…Ray_Tracing.pdf

That is where I arrived at the first one.

The only pity is that I have not tried this stuff yet.

One note: I did not want to dig so deep, since Thomas’ question was about how to basically implement it on the GPU, that is, not much differently from the CPU version.
In the references you will find more and more details, but after all these are optimizations. To start implementing an octree you do not need to care about them… you will when you see that “it is slow” :D “Premature optimization is the root of all evil.”

Steven, can you give some advice on the differences you encountered implementing it, since you have already done it?

Wow, thanks guys, thanks so much for your help. You helped me get over the hurdle of putting it all together in my head.
I am currently implementing everything, so I will let you know how I get on soon.

I am having a difficult time trying to add any OpenGL draw functions into my code. I use a pixel buffer object just like the volume render sample, and I have tried to use a vertex buffer object to draw lines around my octree. However, it is not displaying. Do I need to join both vertex and pixel buffers? Simply adding the draw code into the OpenGL Display() code won’t work either; am I missing something?

OK, if I don’t draw the pixel buffer object I can see my OpenGL lines. How can I show both at the same time?

EDIT: Ah, never mind: drawing the pixel buffer object first, then the vertex buffer object, does the trick.