Query on OptiX Baking / Texture / Data Storage (Basic query!!)

Hello all,

I apologise in advance for the general nature of my query, and if anyone is kind enough to answer, I am not looking for specific answers, just direction for further investigation.

I am not an experienced computer programmer, but I have been researching and exploring the use of CUDA and OptiX in various areas, and I cannot find information on some specific topics.

Most of the OptiX information I have found relates to fast, almost real-time, photorealistic ray-traced rendering of geometry.

I have not come across implementations, or code samples, regarding baking of models, an area where I have read OptiX is quite useful (a la Bungie's Vertex Light Baking / HBAO or Least Squares Vertex Baking).
My interpretation of baking is that a complete ‘static’ scene is analysed via raytracing (hence Optix is preferable) and (probably) texture maps are updated/written with new colour values.
The scene can then be recalled with the baked texture maps applied for further 'real-time but simpler' shading of more dynamic elements (pre-baked gaming environments etc.).

There seems to be little in the way of actual implementation of baking available online (save for https://github.com/nvpro-samples/optix_prime_baking)

My research relates to an analysis of, and a single complete solution for, a scene, whereby the scene is analysed with millions of rays, and the ray hits ‘for each polygon’ are counted, and stored.
It is a view-independent solution, with no need for pixel-based calculation, just polygon-based calculation and polygon-based data storage.

The single solution for the entire scene is to be stored for viewing/recalling at any time (by simply passing the model to OpenGL), and the resulting scene would show a different colouring for each polygon, representing the number of hits per polygon.

I see similarities to the Collision SDK sample in that the geometry is analysed 'off screen' and passed back, but my scenario is not completely the same (i.e. no real-time updating; there is ONLY one answer after analysis).

The two issues I cannot find info for are:
(i) the updating/storage mechanism counting the ray hits per polygon (not per pixel) and
(ii) the creation/storage of texture data that translates hits to a colour, again per polygon (not per pixel), which would be wrapped on the model.

In my scenario, the any-hit program is the basis (with no standard recursive ray types; it's almost just a ray-casting exercise).
OBJ is the geometry format used (I have already automated the scripting of OBJ files for testing), and I assume Vertex Buffer Objects and vertex attributes are relevant.
But the ideas of
(i) Optix storing/counting hits
(ii) Optix ‘writing’ a re-usable/re-loadable texture
are either uncommon (I doubt it), or the ideas are so simple and/or obvious that they have passed me by.

Any direction would be very much appreciated.

Thanks,
Alan

“My research relates to an analysis of, and a single complete solution for, a scene, whereby the scene is analysed with millions of rays, and the ray hits ‘for each polygon’ are counted, and stored. It is a view-independent solution, with no need for pixel-based calculation, just polygon-based calculation and polygon-based data storage.”

If the solution is like baking, you normally start rays from the polygon into the scene. In the case of ambient occlusion the number of hits vs. miss events is used to darken the source position.
That's a view-independent solution, which means it only applies to diffuse isotropic BRDFs. Ambient occlusion is a special case of that, with all materials being a white Lambert BRDF under a white environment light.

If a process like ambient occlusion baking is what you need to do to count your per polygon hits, you would normally start your rays from the geometry. Your launch grid would need to map to the polygons instead of camera pixels and integrate hits over the upper hemisphere above that polygon.
That means your ray generation program needs to know where on the scene geometry to start and how to generate the rays.
That can be a simple progressive algorithm adding or accumulating the results into your desired output. For example, you could even do that for one polygon at a time, but that would potentially be slow depending on the complexity of the scene.
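To make the hemisphere integration concrete, here is a minimal host-side C++ sketch of the cosine-weighted hemisphere sampling typically used for this kind of accumulation. The names are illustrative, not from the OptiX SDK; directions are generated in a local frame with the surface normal along +Z.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Map two uniform random numbers in [0,1) to a cosine-weighted
// direction on the hemisphere around the +Z axis (the classic
// sqrt warp used for ambient occlusion style integration).
Vec3 cosineSampleHemisphere(float u1, float u2)
{
    const float r   = std::sqrt(u1);          // radius on the unit disk
    const float phi = 2.0f * 3.14159265358979f * u2;
    const float x   = r * std::cos(phi);
    const float y   = r * std::sin(phi);
    const float z   = std::sqrt(1.0f - u1);   // lift the disk point onto the hemisphere
    return Vec3{ x, y, z };
}

// Length helper, useful to verify the directions are unit vectors.
float length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

Rotating the local +Z direction into the frame of the polygon's geometric normal is then a standard change of basis.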

With ambient occlusion baking you could store the resulting information inside color per vertex information. Ambient occlusion is normally rather low frequency. If the geometry is coarse you could still miss occluded areas on bigger triangles.

To overcome this you could also bake this information into textures. For that you could map textures with a disjoint texel mapping onto your geometry. That means texels cannot be reused on different geometries! It could be a texture atlas, though.
Then you would need to calculate the texel colors (your hit count) by integrating the hemisphere over each of the texels. That would be the brute-force method and could take quite some time if done for every texel. As noted, ambient occlusion is rather low frequency, so the texture resolution doesn't need to be that high. The end result would be an ambient occlusion texture map.

Hybrid methods with fewer start positions and interpolation onto the texels would be possible as well.

If you want to color these results to indicate hits per polygon, you could use the results as lookup into another texture map, e.g. like a temperature gradient to get false colors or iso-lines of the counts.

You weren’t completely clear about what kind of hits you actually want to count.
If, instead of that one-level hemispherical integration of hit or miss information, you actually mean to follow a ray path through the scene and count hits on the destination polygons, those results would need to be written into a buffer indexed by polygon. You would need to use atomic add operations for these scattered writes, because the launch index doesn't match the result index.
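As a CPU analogue of that scattered-write pattern (a sketch only; on the device this would be a CUDA `atomicAdd` into a result buffer, and the function name here is hypothetical):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Many rays (here: two threads) scatter hit counts into a per-polygon
// buffer. The destination index does not match the "launch" index, so
// the increments must be atomic to avoid lost updates.
std::vector<unsigned int> countHits(const std::vector<int>& hitPolygonPerRay,
                                    std::size_t numPolygons)
{
    std::vector<std::atomic<unsigned int>> counts(numPolygons);
    for (auto& c : counts) c.store(0);

    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            counts[hitPolygonPerRay[i]].fetch_add(1, std::memory_order_relaxed);
    };

    std::thread a(worker, 0, hitPolygonPerRay.size() / 2);
    std::thread b(worker, hitPolygonPerRay.size() / 2, hitPolygonPerRay.size());
    a.join();
    b.join();

    std::vector<unsigned int> result(numPolygons);
    for (std::size_t p = 0; p < numPolygons; ++p) result[p] = counts[p].load();
    return result;
}
```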

A mapping of these hit positions on a polygon to a texture coordinate can be done with the barycentric coordinates you calculate inside the intersection program. That means, to accumulate hits on polygons in textures, the polygons, resp. the materials on them, would need to know which output buffer (which texture data) to write to, then accumulate the result at the texel location calculated via the barycentric coordinates, using atomicAdd on that.
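A minimal sketch of that barycentric-to-texel mapping, with illustrative names, using the common convention that a triangle intersection program reports the (beta, gamma) barycentrics:

```cpp
struct Vec2 { float x, y; };

// Interpolate a per-vertex attribute (here a texture coordinate) at a
// hit point from the barycentric coordinates (beta, gamma).
Vec2 interpolateUV(const Vec2& uv0, const Vec2& uv1, const Vec2& uv2,
                   float beta, float gamma)
{
    const float alpha = 1.0f - beta - gamma;
    return Vec2{ alpha * uv0.x + beta * uv1.x + gamma * uv2.x,
                 alpha * uv0.y + beta * uv1.y + gamma * uv2.y };
}

// Map a UV in [0,1]^2 to a linear texel index in a width x height texture.
int texelIndex(const Vec2& uv, int width, int height)
{
    int tx = static_cast<int>(uv.x * width);
    int ty = static_cast<int>(uv.y * height);
    if (tx >= width)  tx = width  - 1;   // clamp uv == 1.0 onto the last texel
    if (ty >= height) ty = height - 1;
    return ty * width + tx;
}
```

The resulting index is what the atomicAdd would target in the per-texel accumulation buffer.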

Hi Detlef,

Firstly, of course thank you very much for the time spent in your very detailed answer.
You have assumed more complexity than even I can understand in parts…!!

I used the baking method as a more complex reference to what I had planned…

My scenario is simply a ‘line-of-sight’ simulation with a moving viewing position. As the viewpoint moves, specific model polygons are checked for visibility versus obstruction/blocker polygons.
Hence no ray recursion, no baking, no ambient occlusion; it's nowhere near as complex as that.
A simple ‘any-hit’, or ‘miss’ program from viewpoint coordinate to model polygon will yield a ‘viewable’ or ‘not viewable’ model polygon (based on blocking polygons) at that viewpoint, and increment appropriately.


‘you normally start rays from the polygon into the scene’…
‘Your launch grid would need to map to the polygons instead of camera pixels’
‘Means your ray generation would need to know where on the scene geometry to start and how to generate rays.’

Yes, that is the plan: loading the OBJ scene polygons, tracing from each polygon to the viewpoint coordinate, and running lots of those rays for multiple viewpoints (maybe 3000+ viewpoints and 200,000+ model polygon elements).
There is no need to integrate over the upper hemisphere, as it's not a lighting calculation.

‘That can be a simple progressive algorithm adding or accumulating the results into your desired output.’
Yes, a progressive algorithm accumulating the results is the goal. Most likely for every polygon per viewpoint, to avoid race conditions on the same polygon.
atomicAdd is appropriate of course if race conditions arise, as you mentioned regarding the scattered launch index.

Without an ambient occlusion goal in mind, the accumulation of hits is still the most important point.



Re-reading my post, your reply, and my re-post (up to now) has given me some inspiration (so thank you again Detlef!!).

Inspiration => I actually don’t need to use polygon intersection hits at all.
I can just use 'viewpoint coord'-to-'geometry vertex coord' rays and, if I get a 'miss', increment a separate counter/buffer for each vertex (a 'miss' means nothing is in the way, so the vertex is visible from the viewpoint).
The problem scenario does not require specific location of a hit on a polygon, just whether it is hit/miss.

Interpolation between vertices could help in the colouring of the solution.

I have taken a copy of the ‘optix_prime_baking’ sample from GitHub…
It is probably far too complex, but I will try to glean as much as possible from this…
As I said previously, I have a relatively simplistic scenario (just firing lots and lots of rays) and the basics may have passed me by…

Again many thanks for your time…
Alan

I see, that’s a little different, but depending on the desired accuracy can still be quite involved.

“I actually don’t need to use polygon intersection hits at all.”

Right, you don’t need the closest_hit result for pure visibility tests. An any_hit program is sufficient. It’s just like shadow rays which check if there is anything between the ray tmin and tmax values.

Note that you cannot count the number of polygons in between with any_hit programs for some of the acceleration structures provided in OptiX. For example, SBVH and TRBVH are splitting builders, which means identical triangles can appear in multiple smaller bounding boxes and would be counted multiple times, depending on the ray direction through those bounding boxes.

“I can just use ‘viewpoint coord’-to-‘geometry vertex coord’ rays and if I get a ‘miss’, then increment a separate counter/buffer for each vertex (a ‘miss’ means nothing in the way, so the vertex is visible to the viewpoint). The problem scenario does not require specific location of a hit on a polygon, just whether it is hit/miss.”

Two things to consider with that:

  • Make sure to avoid self intersections. If the view point is not on a surface it’s more robust to start the ray from the polygon with an epsilon and shoot to the eye position than the other way around (shooting from the eye to the polygon with the ray’s tmax shortened by some epsilon).
    (Tip for developers implementing area lights with geometry and shadow rays: You need to apply the epsilon on both ends of the shadow ray to avoid self intersections on the receiving surface and the emitting geometry.)

  • The standard intersect_triangle routine in OptiX is not watertight, for performance reasons. Hitting a polygon at the vertices or at the edge between adjacent triangles can result in a miss!
    That’s not really a problem in your case, since you’re not actually intersecting at that location on the destination geometry. Still keep that in mind because this can also happen for the polygons along the ray. You cannot decide visibility reliably at these locations with single rays.
    You could sample a lot of rays uniformly over the whole polygon and count how many are visible to get a proper coverage result, or use a watertight triangle intersection routine.
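The "sample a lot of rays uniformly over the whole polygon" option mentioned above can use the standard square-root warp, which turns two uniform random numbers into area-uniform points on a triangle. A small sketch with illustrative names:

```cpp
#include <cmath>

struct Point { float x, y, z; };

// Warp two uniform random numbers in [0,1) to barycentric coordinates
// that sample a triangle uniformly by area (the standard sqrt mapping),
// then evaluate the corresponding point on the triangle (v0, v1, v2).
Point sampleTriangle(const Point& v0, const Point& v1, const Point& v2,
                     float u1, float u2)
{
    const float s     = std::sqrt(u1);
    const float beta  = s * (1.0f - u2);
    const float gamma = s * u2;
    const float alpha = 1.0f - beta - gamma;
    return Point{ alpha * v0.x + beta * v1.x + gamma * v2.x,
                  alpha * v0.y + beta * v1.y + gamma * v2.y,
                  alpha * v0.z + beta * v1.z + gamma * v2.z };
}
```

Shooting one visibility ray from each sampled point and averaging the results gives a coverage estimate that is robust against the edge/vertex miss cases described above.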

Hi Detlef,

Apologies for not registering your second reply; I have been on vacation (and am unfortunately back home now).
Many thanks again for your second post, another wealth of information, very much obliged…

I had discovered OptiX Prime straight after my last post, and I do think this could easily be a good solution; hopefully some of its restrictions compared to full OptiX don't hinder the process.
If I may, I will comment back on your previous posts; some of your first post has now clicked.

If the solution is like baking, you normally start rays from the polygon into the scene. In the case of ambient occlusion the number of hits vs. miss events is used to darken the source position.
That’s a view independent solution which means it only applies to diffuse isotropic BRDFs. Ambient occlusion is a special case of that with all materials being white Lambert BRDF under a white environment light.
If a process like ambient occlusion baking is what you need to do to count your per polygon hits, you would normally start your rays from the geometry. Your launch grid would need to map to the polygons instead of camera pixels and integrate hits over the upper hemisphere above that polygon.
Means your ray generation would need to know where on the scene geometry to start and how to generate rays.

This is exactly the scenario I have: the vertices of the geometry to analyse are the origins of the rays (not the pixel locations). Ambient occlusion is a very similar scenario, with the difference that the rays need to be directed from each vertex to specific locations following the path of the moving line-of-sight position (not randomly into the hemisphere above, as AO does).
The basic loop is:
- For each vertex on the geometry to analyse
  - For each step in the path of the line of sight
    - Check whether the line of sight is obscured by surrounding geometry / occluders
    - If clear, increment a counter for that vertex
- Display the model with all triangles coloured based on vertex counter values
[Simple :) ]
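The loop above can be sketched in plain C++, with a stand-in occlusion callback where the actual OptiX/Prime ray query would go (names are illustrative, not from any SDK):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Counting loop: for each vertex and each viewpoint on the sight path,
// ask the (stubbed) occlusion test; a miss means the vertex is visible,
// so increment its counter.
std::vector<unsigned int> countVisibility(
    std::size_t numVertices,
    std::size_t numViewpoints,
    const std::function<bool(std::size_t vertex, std::size_t viewpoint)>& isOccluded)
{
    std::vector<unsigned int> visibleCount(numVertices, 0);
    for (std::size_t v = 0; v < numVertices; ++v)         // each vertex to analyse
        for (std::size_t p = 0; p < numViewpoints; ++p)   // each step on the sight path
            if (!isOccluded(v, p))                        // a 'miss' means visible
                ++visibleCount[v];
    return visibleCount;
}
```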

That can be a simple progressive algorithm adding or accumulating the results into your desired output. For example, you could even do that for one polygon at a time, but that would potentially be slow depending on the complexity of the scene,

I would be comfortable calculating this per vertex of geometry, considering I have maybe 5000 points on the sight-position path, i.e. each geometry vertex is analysed approx. 5000 times against the surrounding geometry, and there may be 100,000+ vertices to analyse.

If I spawn those 5000 rays for 'one vertex at a time' and try to count the hits on the surrounding geometry, am I not running straight into atomicAdd for every vertex, because all rays will be running at the same time and trying to increment the hit count for that vertex simultaneously?

With ambient occlusion baking you could store the resulting information inside color per vertex information. Ambient occlusion is normally rather low frequency. If the geometry is coarse you could still miss occluded areas on bigger triangles.

Yes, storing information per vertex is the idea I think suits best; the geometry triangles are all of similar size, so I don't think I need a texel implementation.
I don't think the information to be stored is colour, though, but actually the 'miss' count (the opposite of the hit count).
I don't need to count hits on destination/occluding polygons, and I don't need closest_hit values either.
It is the reverse of the primeSimple idea: the source vertex needs to store the hit, not the actual polygon that is hit.

Barycentric texturing of the results would be the idea for the final on-screen display, whereby the triangles based on the source geometry vertices are coloured/shaded (barycentrically) to represent high/low visibility; for example, 90%+ visibility shaded green, 10% or less shaded red, and every value in between shaded an 'in-between' colour.
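A possible sketch of that red-to-green mapping (the 10%/90% thresholds are the ones suggested above; everything else is illustrative):

```cpp
struct Color { float r, g, b; };

// Map a visibility fraction in [0,1] to a red-to-green ramp:
// 0.1 or less is pure red, 0.9 or more is pure green, and values
// in between are blended linearly.
Color visibilityColor(float visibility)
{
    const float lo = 0.1f, hi = 0.9f;
    float t = (visibility - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;   // clamp below the red threshold
    if (t > 1.0f) t = 1.0f;   // clamp above the green threshold
    return Color{ 1.0f - t, t, 0.0f };
}
```

Evaluated per vertex, the GPU rasterizer then interpolates these colours barycentrically across each triangle for free.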

Can I assume that, by passing the vertex buffer to OpenGL using OpenGL interop, OpenGL could handle the shading, interpolating colour values based on the per-vertex hit count data?

Simple visibility is exactly like a shadow ray here, and not having a watertight intersection is not a problem.

Self-intersections could be a problem in another way, in that the rays could be going backwards into the geometry itself.
Because this is a view-independent analysis of 'solid volumes' (I know they are not solid, but a completed surface envelope) and not rays to/from an image plane, the rays from any vertex could technically be sent inside the 'solid' volume of the geometry and hit a triangle coming out the other side of the volume. This is not a real hit against the surrounding geometry; it is a false positive.
Is it a common approach to compare the ray direction to the triangle normal direction, and if the angle between them is more than 90 degrees, disqualify that ray? Or is there a way of testing whether the ray intersects a BVH of the volume itself and then disqualifying it?

You have given me lots of information already, so don’t feel the need to reply in-depth or “at all”

Many thanks
Alan

Sorry for the late reply; I was presenting at the GPU Technology Conference last week.

Shooting 5000 rays for each of 100,000+ vertices in a single launch will most likely run into a timeout (TDR) under the Windows Display Driver Model (WDDM) when running on a GPU with that driver model. Running on a Tesla board in TCC mode doesn't have that timeout.
To prevent that, do less work more often in a progressive approach.
I would recommend starting by shooting 100,000+ rays, one per vertex in your model, and progressively accumulating the hit or miss count until you reach the desired count of 5000 visibility checks.
With that approach you can easily increase or decrease the number of visibility tests simply by changing the number of launches without the risk of ever running into a timeout.

Yes, OpenGL allows unsigned integer vertex attributes as well, which could hold your count and would then need to be converted to whatever coloring you like inside a GLSL vertex shader to get interpolation on the fragment side.

There is no method to use the BVH traversal information, since you do not have access to any of that internal data.
Checking front- or backface cases should be sufficient. If your model’s vertex winding is consistent, you can get a front face geometric normal from that easily.
You get that geometric normal back from the OptiX helper function intersect_triangle(), which you'd use inside the intersection program anyway.
With OptiX Prime you would need to calculate that yourself, but that’s simply a normalized cross-product of two triangle edges.
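For the OptiX Prime case, a small plain C++ sketch of that normal calculation (helper names are illustrative; this assumes counter-clockwise vertex winding for the front face):

```cpp
#include <cmath>

struct V3 { float x, y, z; };

V3 cross(const V3& a, const V3& b)
{
    return V3{ a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
}

float dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Geometric normal of triangle (v0, v1, v2): the normalized cross
// product of two triangle edges, as you would compute it yourself
// when OptiX Prime does not hand it to you.
V3 geometricNormal(const V3& v0, const V3& v1, const V3& v2)
{
    const V3 e1{ v1.x - v0.x, v1.y - v0.y, v1.z - v0.z };
    const V3 e2{ v2.x - v0.x, v2.y - v0.y, v2.z - v0.z };
    const V3 n  = cross(e1, e2);
    const float len = std::sqrt(dot(n, n));
    return V3{ n.x / len, n.y / len, n.z / len };
}
```

The dot-product front-face test below then works the same way with this normal.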

A simple statement like this when shooting from the triangle would keep rays on the front face hemisphere. (Both vectors need to be in the same coordinate space.)

if (optix::dot(ray.direction, geometric_normal) <= 0.0f) // Ray is edge-on or below geometric surface?
{
  ... // Add code to handle that to your liking.
}

Hi Detlef,

Again another huge wealth of insight & advice, I am very much obliged…

I had some simple success (which I posted on another bumped thread in the forum) with Prime coding…

I will hope to continue on this without resorting to you and the forum…

I do think I will be back in a while, though, for some basic advice on launch arrangements/considerations, but hopefully only with small issues…

Again, many many thanks for your time and feedback (when I assume you are super-busy!!)

Alan

Hi Detlef,

I hope you are well…

I have progressed to the launch-related research issues a bit quicker than I had imagined.

I have been analysing the nvpro optix_prime_baking sample via GitHub, front-to-back and back-to-front…
I don’t know if you are familiar with the specific inner workings of this program though…
Mesh instancing is incorporated in this sample, which adds another level of looping across the program, but I have tried to factor this out of my own analysis.

From what I can gather (in very simple terms), every vertex of the scene is sampled a number of times using a cosine-hemisphere ambient occlusion analysis; the samples are accumulated and normalised, and then all passed to OpenGL (to spin around!!).

Scene VertexCount * AOSamplesPerVertex = AOSampleCount (in terms of rays required)

The construction I see that is most relevant for me is in the preparation, accumulation and execution of AO sample rays across the whole scene (bake::ao_optix_prime() in bake_ao_optix_prime.cpp).
My interpretation is that there is an AO sample batch construction whereby a fixed batch size (hard-coded as 2,000,000) of rays is executed as a batch, and if the required AO sample number is larger, a loop process runs:

  • host>device copying, using cudaMemcpy
  • a double CUDA process of ray execution and then AO calculation/normalisation
  • device>host copying

and then moves on to the next batch.

The default program configuration without any command-line arguments yields 1.5 million+ samples, but with a different samples-per-vertex setting (e.g. -t 16), I have increased the total sample count to over 6.5 million samples.
I assume this would then be processed in 4 batches? Would this seem a reasonable assumption of the workings of the batches…??
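If the batching works as described, the batch count is just a ceiling division over the hard-coded 2,000,000-ray batch size mentioned above; a quick sanity check (function name hypothetical):

```cpp
// Number of fixed-size batches needed to cover totalSamples rays:
// a plain ceiling division.
unsigned long long numBatches(unsigned long long totalSamples,
                              unsigned long long batchSize)
{
    return (totalSamples + batchSize - 1) / batchSize;
}
```

By this arithmetic, 6.5 million samples would indeed be processed in 4 batches of up to 2 million rays each.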

Within the kernels.cu code, there is also a hard-coded block_size = 512, which I understand is different from this batch size and based on the core CUDA idea of threads, blocks and grids.

Also from my analysis, the preparation of AO samples is done sequentially per vertex, e.g. if there are 16 samples per vertex, then the program allocates these in the sequence of vertices encountered, e.g. vertex 1 creates samples 1-16, vertex 2 => samples 17-32, vertex 3 => samples 33-48, etc., and then they are all fired as per the batch construction above.

The second CUDA process (UpdateAOKernel<<<...>>> in bake_kernels.cu) is basically a float-based qualifier of the hit result (if 'hit' then ao_sample = 1.0f, if 'miss' then ao_sample = 0.0f).

The normalising of the AO samples is a barycentric operation (filter_mesh_area_weighted() in bake_filter.cpp) which (unless I'm mistaken) is not a CUDA operation; it's just a big for-loop on the host side, unless by reference to the optix namespace it is processed on the device by default (but I don't think so)?

I also thought it might have been an obvious decision to use CUDA in some way to normalise the AO values on each vertex, rather than using a big for-loop on the host. Might you have any opinion on this?

So after all that review, it brings me to my real point…
In your previous post, you suggested NOT running (my worst-case scenario of) 5000 visibility checks per vertex sequentially on each of the 100,000 vertices, but instead running one sample per vertex and looping around that 5000 times, to avoid a TDR timeout.
If (a big if) I followed the batching mechanism of the optix_prime_baking sample instead, could this avoid the TDR timeout, even if I went up to the very worst-case scenario of 500 million samples in total?
It would process and onload/offload 250 batches (again, very worst case?) and bypass a timeout.

The reason I ask is that if I sampled each vertex 5000 times in sequence, those results would be stored contiguously, and the accumulation of my visibility sample results would use a for-loop with a simple i++ to add up all the samples in a row and then normalise.
If I instead ran over all 100,000 vertices once and then looped that 5000 times, the stored results would be offset (like a stride), and the loop to accumulate and normalise them would require a step of the vertex count (e.g. 100,000), not just a simple i++.
Or would either loop order matter at all? (I hope you understand where I am going with this!!)
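The two accumulation orders can be sketched side by side (illustrative names): the per-vertex totals come out identical, and only the index arithmetic differs between the contiguous and the strided layout.

```cpp
#include <cstddef>
#include <vector>

// Per-vertex samples stored contiguously (vertex-major): vertex v's
// samples occupy indices [v * samplesPerVertex, (v+1) * samplesPerVertex).
std::vector<int> accumulateVertexMajor(const std::vector<int>& samples,
                                       std::size_t numVertices,
                                       std::size_t samplesPerVertex)
{
    std::vector<int> totals(numVertices, 0);
    for (std::size_t v = 0; v < numVertices; ++v)
        for (std::size_t s = 0; s < samplesPerVertex; ++s)
            totals[v] += samples[v * samplesPerVertex + s];  // simple i++ walk
    return totals;
}

// One sample per vertex per pass (pass-major): vertex v's samples sit
// at v, v + numVertices, v + 2 * numVertices, ... (stride == vertex count).
std::vector<int> accumulatePassMajor(const std::vector<int>& samples,
                                     std::size_t numVertices,
                                     std::size_t samplesPerVertex)
{
    std::vector<int> totals(numVertices, 0);
    for (std::size_t s = 0; s < samplesPerVertex; ++s)
        for (std::size_t v = 0; v < numVertices; ++v)
            totals[v] += samples[s * numVertices + v];       // strided walk
    return totals;
}
```

In the launch-per-pass scheme there is no need to store all samples at all: each launch can add its per-vertex hit/miss result straight into the totals buffer.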

Again, many thanks in advance if you get any time to reply…

Thanks,
Alan

I’m not familiar with the OptiX Prime baking example.

I would recommend not letting yourself get deterred by examples which do one specific thing (in this case, ambient occlusion baking into textures) if you're planning to do something different.

Again, the simplest approach would be the following:

for each vertex (== 100,000+)
{
  initialize the ray origin to the vertex world coordinate once; 
}
for each ray direction (== 5000 or whatever count you like)
{
  for each vertex (== 100,000+)
  {
    set the ray direction to the normalized vector of your visibility test direction;
  }
  launch query (with number of vertices, == 100,000+);
  run a custom kernel which accumulates the hit/miss results into a separate final result per vertex buffer;
}
visualize your result with the accumulated final result per vertex buffer;

If you have that running, you can still look at additional mechanisms to make things faster.
Study the OptiX SDK examples with “prime” in the name first, before looking at advanced examples like the ambient occlusion texture baking.

Accumulating multiple hit results for a single vertex in one step would be possible as well, but you should then keep the launch size at the number of vertices and use a for-loop inside the kernel, so that you don't need to use atomics.

Again, if you’re running on a WDDM device, you must stay under the 2 seconds kernel driver timeout.
You could split the above algorithm into even smaller steps (multiple batches with fewer vertices) when needed.
Just don’t shoot ridiculously small or large numbers of rays per launch. Keep it well above 64k to some millions to keep the GPU busy without hitting the timeout.

Hey Alan and Detlef, just noticed this thread. Keith and I wrote the Prime baking sample on github.

When creating batches of rays, you have a double loop over (1) sample points, and (2) a set of ray directions per sample point. You can do these loops in either order. We tried both orders for the baking sample and the performance was very similar, and scene dependent. We currently use the (1)-(2) order, so that all ray directions for a given sample point appear next to each other in a batch of rays; Detlef describes the (2)-(1) order in his code above. Both totally fine.

When developing the baking sample, we got it working for simpler cases (cow, teapot) first with no batching and no instancing. The git history should reflect this. As Detlef said, I recommend starting very simple for your application, with a small model where (num_ray_directions*num_sample_points) fits easily in memory.

Btw, the baking sample ultimately filters occlusion values onto vertices, not textures, although texture support could be added. Check the README for a rough outline of the features.

-Dylan

Whoops, just checked the code and the version on github uses the (2)-(1) order. My mistake. It assigns one ray to each sample point per launch.

Hi Detlef and Dylan,

Thank you both again for your observations and guidance…

Apologies for not replying; the forum system is not notifying me of updates to this thread (I am not sure why), so I didn't see the last three comments.

I have been working like the proverbial Mad Professor (in my own head) coding over the past number of weeks…

I have been taking inspiration from both the Prime SDK (simple and instancing) samples and the Optix Prime Baking (OPB) sample.

I have learned huge amounts in this very specific area, based on all the code supplied from both sources, they have been of immense help.

I pretty much have primeSimple and primeInstancing down, and following from that, I have been able to break down the inner workings of OPB, even though my implementation is nowhere near as complex.

(Detlef) All of your suggestions are excellent…
(Dylan) Vertex baking is exactly the idea I was looking for, and I have figured it out all the way down to the occlusion normalisation in UpdateOADevice and the passing of 'occl' (thrice!!) to the OpenGL interop.

I have obviously run into problems, but have worked through them by various means, which are probably not the 'leanest' code, but they work for me.

One last question…
Is it possible to PM either of you? There seems to be no direct method in the forum (where other forums do have this option).
My request is nothing sinister, nor is it to continue my questions on this topic…

Much obliged again,
Alan

For private messages on the forum, you just click on any user's name to reach a page with a Send Private Message link.

Hi Detlef,

Thanks for that…
Yet again, I have fallen into a ‘Not Logged In’ trap!!
The Private Message option does not appear unless logged in (obviously!!)…

Thanks
Alan