Understanding optixParticleVolumes

Hi,

I am doing a research project on the optixParticleVolumes sample from Chapter 29, "Efficient Particle Volume Splatting in a Ray Tracer," of the book Ray Tracing Gems.

I am trying to understand how the algorithm works and wanted to see if someone here could say whether I understand it correctly:

They have a ray generation program that splits each ray into equal-sized slabs and then looks for particle intersections within each slab.
They find an intersection sample by computing the distance from the ray origin (the camera) to the point on the ray closest to the particle center. If that point lies within the particle radius, they take it as an intersection sample. I guess we could say they project the center of the particle onto the ray.
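To check that I have the geometry right, here is roughly what I think that test looks like (my own sketch, not the sample's code; sample_particle and the variable names are just placeholders):

#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );

__device__ bool sample_particle(const float3& center, float radius, float& t_sample)
{
    // Distance along the ray at which the particle center projects onto it
    // (assumes a normalized ray direction).
    t_sample = dot(center - ray.origin, ray.direction);

    // Point on the ray closest to the particle center.
    float3 p = ray.origin + t_sample * ray.direction;

    // Accept the sample only if that point lies within the particle radius.
    return length(p - center) <= radius;
}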

What I do not follow is where the BVH traversal happens. Is it inside each slab or before?
If they split each ray into several slabs, then it looks like they call rtTrace for each slab. Would that not give a much higher ray cast count, because we need to shoot one ray with rtTrace from each pixel several times?

The BVH traversal happens within each call to rtTrace. Basically, the entry and exit points of the entire volume are calculated, then that interval (t_enter, t_exit) is divided into N slabs. Each slab is traversed separately, in order: first rtTrace is called with the interval (t_enter, t_slab_boundary_0), the results of that slab are processed, then we repeat for the interval (t_slab_boundary_0, t_slab_boundary_1), and so on until all slabs are processed.
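In rough code, the per-pixel loop looks something like this (a simplified sketch, not the actual optixParticleVolumes code; slab_count, the ray type index, and the per-ray data layout are assumptions):

#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

rtDeclareVariable(rtObject, top_object, , );

struct PerRayData
{
    float4 accumulated;   // plus, in the real sample, the per-slab sample buffer
};

__device__ void trace_slabs(float3 origin, float3 direction,
                            float t_enter, float t_exit, int slab_count)
{
    PerRayData prd;
    prd.accumulated = make_float4(0.0f);

    float slab_width = (t_exit - t_enter) / slab_count;
    for (int s = 0; s < slab_count; ++s)
    {
        float t0 = t_enter + s * slab_width;   // slab entry
        float t1 = t0 + slab_width;            // slab exit

        // One traversal per slab: the BVH culls everything outside [t0, t1].
        optix::Ray ray = optix::make_Ray(origin, direction, /*ray type*/ 0, t0, t1);
        rtTrace(top_object, ray, prd);

        // ...sort and integrate the samples collected for this slab...
    }
}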

Thank you for the response!

When you say the entire volume, do you mean the volume of the entire data set, i.e., the entry and exit points of the whole data set we are using?

Then we are dividing that volume into N slabs, and each slab is traversed using the BVH acceleration structure inside the rtTrace call?
But does that mean that we are shooting several rays for the same pixel? rtTrace is run for several intervals but along the same ray? Why not let the ray pass from the entry to the exit and process all data in one go?

Yes, the entire volume data set. You might have multiple volume data sets in a scene.

Yes, you are traversing multiple rays, but the rays have clipped intervals, so portions of the data set outside of the slab are efficiently culled (yay BVHs). Slab spacing is a function of the particle buffer size and particle radius. If you set this buffer large enough, you will indeed traverse the entire volume in one go. However, this will require a large buffer size and will have high memory overhead and suboptimal performance.
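Just to illustrate the idea (this is not the formula used in the sample, only a guess at the kind of relationship): the slab width could be tied to the particle radius and the per-slab buffer size, and the number of rtTrace calls per pixel then follows from it.

// Purely illustrative; the actual relationship in the sample may differ.
__device__ int compute_slab_count(float t_enter, float t_exit,
                                  float particle_radius, int particles_per_slab)
{
    // Hypothetical choice: allow roughly particles_per_slab particle diameters
    // of ray length per slab, so a slab is unlikely to collect more samples
    // than the per-ray buffer can hold.
    float slab_width = particles_per_slab * 2.0f * particle_radius;
    int   count      = (int)ceilf((t_exit - t_enter) / slab_width);
    return count > 0 ? count : 1;   // traverse at least one slab
}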

Sorry I am still a bit confused. What do you mean I am traversing multiple rays?

My understanding is that:

  1. rtTrace shoots a ray over a certain interval.
  2. We use the OptiX built-in BVH acceleration structure to find all intersections inside this slab interval.
  3. Then the samples are sorted and integrated (see the sketch below).
  4. Then we need to run rtTrace again with the same origin and direction, but for the next slab interval.
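For step 3, I imagine the per-slab processing looks something like the following (my own sketch, assuming a fixed-size per-ray sample buffer and simple front-to-back compositing; not the actual sample code):

#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

struct Sample { float t; float4 color; };   // hypothetical per-sample record

__device__ void sort_and_integrate(Sample* samples, int count, float4& accumulated)
{
    // Insertion sort by depth; per-slab counts are small (e.g. at most 16).
    for (int i = 1; i < count; ++i)
    {
        Sample key = samples[i];
        int j = i - 1;
        while (j >= 0 && samples[j].t > key.t)
        {
            samples[j + 1] = samples[j];
            --j;
        }
        samples[j + 1] = key;
    }

    // Front-to-back alpha compositing of the sorted samples.
    for (int i = 0; i < count; ++i)
    {
        float alpha = samples[i].color.w;
        accumulated += (1.0f - accumulated.w) * alpha *
                       make_float4(samples[i].color.x, samples[i].color.y,
                                   samples[i].color.z, 1.0f);
    }
}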

When one ray direction is finished, we do the same for the next direction, and so on.

What am I missing?

Each rtTrace causes a ray traversal. You call rtTrace once per slab, so there is one traversal per slab.

So for each launch index, you traverse one ray per slab (if you don't miss the volume entirely).

So was my understanding above correct then?

If rtTrace is called once per slab, then we are calling rtTrace on the same ray several times, but on different intervals?

And how is OptiX iterating over all pixels on the screen, shooting rays through every pixel?

I really appreciate your help!

You are mostly correct above. You are calling trace with the same ray origin and ray direction but, as you note, the interval is different. A ray is really made up of all of these components (plus ray flags).

Typically an OptiX launch is given launch dimensions which correspond to pixels. For instance, if your image is 1024x768, then you would invoke your launch as a 2D launch with a 1024x768 grid. Then each launch index corresponds to a pixel. Take a look at our SDK's pinhole_camera sample to see this in action.
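For reference, the ray generation program in pinhole_camera follows roughly this pattern (simplified here; the camera variables eye, U, V, W are the usual SDK conventions, the real sample uses a scene_epsilon variable and has more detail):

#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );
rtDeclareVariable(uint2, launch_dim,   rtLaunchDim, );
rtDeclareVariable(float3, eye, , );
rtDeclareVariable(float3, U, , );
rtDeclareVariable(float3, V, , );
rtDeclareVariable(float3, W, , );
rtDeclareVariable(rtObject, top_object, , );
rtBuffer<float4, 2> output_buffer;

struct PerRayData { float4 result; };

RT_PROGRAM void pinhole_camera()
{
    // One thread per pixel: launch_index identifies this pixel in the 2D launch.
    float2 d = make_float2(launch_index) / make_float2(launch_dim) * 2.0f - 1.0f;
    float3 ray_direction = normalize(d.x * U + d.y * V + W);

    optix::Ray ray = optix::make_Ray(eye, ray_direction, /*ray type*/ 0,
                                     1.e-4f /* small epsilon */, RT_DEFAULT_MAX);

    PerRayData prd;
    prd.result = make_float4(0.0f);
    rtTrace(top_object, ray, prd);

    output_buffer[launch_index] = prd.result;
}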

Great, thank you very much! I'll take a look.

In the paper they talk about ray coherence. Do you know what they mean by that? To have coherent rays?

Coherence can mean a couple of things. In the paper we are talking about executional coherence, meaning different rays (and therefore threads) are executing the same code together, as well as data coherence, meaning different threads are accessing the same data.

Wow, I didn't realize I was getting help from one of the authors, amazing!

So with ray coherence the gain will be in memory access then, since the data will be in the cache, right?

Another question regards PARTICLE_BUFFER_SIZE and particlesPerSlab: how did you know to set the default to 16 and that a slab won't include more than 16 particles?

And in Figure 29-2 of your paper, which describes the algorithm, why are dashed lines drawn from the sample point to the nearby ray that does not intersect the particle?

Thank you so much for your help Keith!

Yes, there is potential for gain in coherent memory access, but often in ray-tracing algorithms (especially in path-tracing-like workloads, but here too) the biggest gains are in executional coherence. This means that ideally all threads in a given warp are executing the same code.

16 was arrived at empirically. The calculation for slab size takes this number and the particle size into account to try to avoid overflows, but overflow can still happen.
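To illustrate the overflow case (just a sketch of the idea; PARTICLE_BUFFER_SIZE and the per-ray data layout here are placeholders, not the sample's exact code):

#include <optix.h>

#define PARTICLE_BUFFER_SIZE 16

struct Sample { float t; float4 color; };

struct PerRayData
{
    Sample samples[PARTICLE_BUFFER_SIZE];   // fixed-size per-slab sample buffer
    int    sample_count;
};

__device__ void record_sample(PerRayData& prd, const Sample& s)
{
    if (prd.sample_count < PARTICLE_BUFFER_SIZE)
    {
        prd.samples[prd.sample_count++] = s;   // room left in this slab's buffer
    }
    // Otherwise the buffer has overflowed and the sample is dropped; the slab
    // sizing heuristic tries to keep the expected hit count per slab below
    // this limit so that this rarely happens.
}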

As for the dashed lines, I don't have a copy of the article near me at the moment, so I am trying to remember... I think that the dotted line shows the projection of the particle center onto the ray path. The particles have finite extent.

Hope this helps.

Great, thank you for your time and response, it really helps!

Do you have any sense of how much the ray coherence helps in your solution?

I don’t quite understand why overflow could happen. If you say that there are 16 particles per slab, do you mean that overflow could happen if there are more than 16 particles per slab in the actual dataset?