Hey, okay, so a few things to answer here.
Regarding a couple of comments you made about “Since TraceDepth is 1, tracing terminates.”, we might want to clarify what “trace depth” means. It refers to how many times you can call optixTrace() recursively, and it does not affect traversal of a single ray. If you request a closest hit, then your rays will terminate on the closest hit, so the behavior you described might be correct, as long as you have disabled your anyhit shaders. If you enable anyhit shaders, the ray will keep reporting intersections along its length, regardless of the TraceDepth setting, unless you explicitly terminate the ray in your anyhit program during traversal.
it seems that using only one ray per pair of triangles won’t be sufficient.
Right, yeah I was trying to hint at this earlier. We need to better define what the question really is. Scenario #4 shows a situation where triangle (d) partially occludes triangle (e) as seen from triangle (a). It happens to completely block the ray between the centroids of (a) and (e), which illustrates why you can’t rely on a single ray between the centroids to give you the correct answer.
If you want to know whether there is any visibility between triangles (a) and (e), then you will want to use multiple rays. Tracing a ray between the centroids gives you a very rough and biased approximation of the visibility between the triangles. This approximation might be sufficient if you have many small triangles. If you need a better approximation, you can reduce the bias of your visibility query by randomizing the rays, and you can improve the accuracy by sending more rays. For example, you could send a whole batch of rays, each starting from a uniform random location in triangle (a) and aimed at another uniform random location on triangle (e). By sampling the volume between the two triangles with multiple rays, you will be much more likely to find some unoccluded rays, and therefore know that these triangles have partial visibility. Similarly, you will also know with greater certainty that triangle (d) is partially occluding the space between (a) and (e).
I might need to execute optixLaunch multiple times, as you suggested.
Can I use a loop statement to run optixLaunch multiple times?
Yes, that’s easy and straightforward. The SDK sample called optixPathTracer does “progressive” rendering, where it uses each subsequent launch to improve the image by blending new results with old results. This is a good, simple example of looping the launch.
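To make the "blend new results with old results" idea concrete, here is a rough host-side model of it in plain C++. This is not the sample's code, just a sketch of a running average across launches. The names blendLaunch, runProgressive, and launchOnce are all made up for illustration; launchOnce stands in for whatever you do per iteration (update the launch params, call optixLaunch, read back the results).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blend one new launch result into a running average.
// launchIndex is 0 for the first launch, so the first result is kept as-is.
inline void blendLaunch(std::vector<float>& accum,
                        const std::vector<float>& fresh,
                        int launchIndex)
{
    assert(launchIndex >= 0 && accum.size() == fresh.size());
    const float a = 1.0f / static_cast<float>(launchIndex + 1);
    for (std::size_t i = 0; i < accum.size(); ++i)
        accum[i] += a * (fresh[i] - accum[i]);  // same as lerp(accum, fresh, a)
}

// Host-side loop over launches. `launchOnce` is a hypothetical callable that
// takes the launch index and returns that launch's results.
template <typename LaunchFn>
std::vector<float> runProgressive(int numLaunches, std::size_t numValues,
                                  LaunchFn launchOnce)
{
    std::vector<float> accum(numValues, 0.0f);
    for (int i = 0; i < numLaunches; ++i)
        blendLaunch(accum, launchOnce(i), i);
    return accum;
}
```

After N launches each entry of accum holds the mean of the N per-launch values, which is the behavior you want if each launch contributes an independent batch of random rays.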
Isn’t the optixGetTriangleBarycentrics() value returning the center of gravity of the triangle hit by the ray?
No, the result of the call to optixGetTriangleBarycentrics() is giving you the coordinates of the hit point of your ray, using barycentric coordinates. These coordinates tell you how to find your hit point using a weighted sum of the vertices of the triangle. The barycentric coordinates are the weights, so they will always vary across the face of a triangle that has non-zero area. You will find that a hit point that is near one of the three vertices will return barycentric coordinates with one coordinate close to 1.0, and the other two close to 0.0. The centroid of your triangle is defined by the barycentric coordinates (1/3, 1/3, 1/3). Barycentric coordinates always sum to 1.0, and you only need two of them to reconstruct a point.
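Here is a small sketch of that weighted sum in plain C++, in case it helps. Vec3 and pointFromBarycentrics are names I made up for the example. I'm assuming the usual convention where the two stored coordinates (u, v) are the weights of the second and third vertex, and the weight of the first vertex is whatever is left over.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Weighted sum of the three vertices. (u, v) are the barycentric weights of
// v1 and v2; the weight of v0 is implied because all three sum to 1.
inline Vec3 pointFromBarycentrics(Vec3 v0, Vec3 v1, Vec3 v2, float u, float v)
{
    // Sanity check that the point is inside the triangle (small tolerance).
    assert(u >= 0.0f && v >= 0.0f && u + v <= 1.0001f);
    const float w = 1.0f - u - v;
    return { w * v0.x + u * v1.x + v * v2.x,
             w * v0.y + u * v1.y + v * v2.y,
             w * v0.z + u * v1.z + v * v2.z };
}
```

Passing (1/3, 1/3) gives you the centroid, and passing (1, 0) gives you back v1 exactly, which matches the "close to a vertex" behavior described above.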
It seems that I’ll need the barycentroids and positions of each triangle to set the ray origin and direction in the raygen function. Can I pass this information as “launch parameters”? If so, I’m considering passing an array of triangle centroids and indices.
Launch parameters are rather limited in size. They live in constant memory, and the launch params buffer has a maximum size of 64 kilobytes, I believe. And OptiX might use a little bit of that IIRC. So for a large buffer of information needed for raygen to run, you should use a normal global memory buffer, i.e., something you allocate with cudaMalloc or the like.
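As a sketch of what I mean, the launch params struct would carry only pointers and a few counts, and the bulk data would sit in global memory. The struct and field names below are hypothetical, and the 64 KB figure is the assumed limit mentioned above rather than a number I have verified. A real struct would typically also hold your traversable handle; I left that out so the snippet compiles without the OptiX headers.

```cpp
#include <cstddef>

struct Vec3   { float x, y, z; };
struct Index3 { unsigned int a, b, c; };

// Hypothetical launch-params layout: pointers and counts only. The pointers
// would refer to device buffers allocated with cudaMalloc (or similar).
struct LaunchParams
{
    const Vec3*    vertices;      // device pointer to the vertex buffer
    const Index3*  indices;       // device pointer to the index buffer
    unsigned char* visibility;    // device pointer to the output matrix
    unsigned int   numTriangles;
    unsigned int   raysPerPair;
};

// What to avoid: embedding the data itself in the launch params.
struct InlineParams
{
    Vec3 centroids[10000];        // 10000 * 12 bytes = 120000 bytes
};

constexpr std::size_t kAssumedParamsLimit = 64 * 1024;  // assumed limit

static_assert(sizeof(LaunchParams) < kAssumedParamsLimit,
              "pointer-based params stay tiny no matter how big the mesh is");
static_assert(sizeof(InlineParams) > kAssumedParamsLimit,
              "an embedded array of 10000 centroids is already too big");
```

The pointer-based version stays the same size whether you have a hundred triangles or ten million, which is the reason to do it this way.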
Also if you want to use multiple rays to sample the visibility between each pair of triangles, I would recommend generating your random samples in the raygen program, rather than storing your sampling information in a buffer. It will be much faster to generate samples on the GPU during raygen.
You could do something like this, perhaps: (just an example based on what I think I understand, customize to suit, or ignore this if I’m misunderstanding what you need.)
- Store your triangles (vertices & indices) in global memory buffers.
- Pass the pointers to these buffers to OptiX via launch params.
- In raygen, have each thread be responsible for processing 1 pair of triangles
- In raygen, loop over N rays
– For each ray, generate 2 pairs of uniform unit random numbers, so 4 floats each in [0,1] (examples of GPU random number generation are in the OptiX SDK)
– Use one pair as the barycentric coordinates of your ray origin inside the “start” triangle (remember you only need any 2 out of the 3 bary coords)
– Use the other pair for the bary of your ray destination inside the “destination” triangle
– Compute the start point in object/world space using the barycentric formula. Note that you need to condition your random numbers when they sum to > 1: to make a random sample in the unit square barycentric, fold it across the diagonal, i.e. (pseudocode) u,v = rand(), rand(); if (u+v > 1) then u,v = 1-v, 1-u;
– Compute the destination point similarly
– Subtract the start point from the destination point to compute a ray direction (no need to normalize unless you want to; with an unnormalized direction, t = 1 along the ray lands exactly on the destination point).
– Call optixTrace() and test whether the ray hits the destination triangle
- When your loop is complete, record the result into your visibility matrix. You could record M if all rays were blocked, and H if any of the rays made it to the destination triangle. (Or you could record the count of ray hits on the destination triangle, or a floating point value of the percentage of hits, etc… there are multiple options. Using 1 bit for H & M could make it very compact and memory efficient, but you might find ways to use the hit count or percent.)
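To show how the per-pair loop above fits together, here is a CPU-only sketch in plain C++. It is not OptiX code: a brute-force Möller-Trumbore ray/triangle test stands in for optixTrace(), a tiny LCG stands in for the SDK's random number helpers, and all the names (Triangle, samplePoint, blocks, countVisibleRays) are made up for the example. In a real raygen program the inner occluder loop would be a single optixTrace() call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Small LCG returning a float in [0, 1). Stand-in for a GPU RNG.
static float rnd(std::uint32_t& state)
{
    state = state * 1664525u + 1013904223u;
    return static_cast<float>(state >> 8) / 16777216.0f;  // top 24 bits
}

// Uniform random point in a triangle: fold the unit square across the
// diagonal, then apply the barycentric weighted sum.
static Vec3 samplePoint(const Triangle& t, std::uint32_t& state)
{
    float u = rnd(state), v = rnd(state);
    if (u + v > 1.0f) { const float tmp = u; u = 1.0f - v; v = 1.0f - tmp; }
    const float w = 1.0f - u - v;
    return { w * t.v0.x + u * t.v1.x + v * t.v2.x,
             w * t.v0.y + u * t.v1.y + v * t.v2.y,
             w * t.v0.z + u * t.v1.z + v * t.v2.z };
}

// Moller-Trumbore test: true if origin + t * dir hits tri with tmin < t < tmax.
static bool blocks(const Triangle& tri, Vec3 o, Vec3 d, float tmin, float tmax)
{
    const Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    const Vec3 p = cross(d, e2);
    const float det = dot(e1, p);
    if (det > -1e-8f && det < 1e-8f) return false;  // ray parallel to triangle
    const float inv = 1.0f / det;
    const Vec3 s = sub(o, tri.v0);
    const float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv;
    return t > tmin && t < tmax;
}

// Count how many of numRays random rays from `from` reach `to` unblocked.
int countVisibleRays(const Triangle& from, const Triangle& to,
                     const std::vector<Triangle>& occluders,
                     int numRays, std::uint32_t seed)
{
    assert(numRays > 0);
    std::uint32_t state = seed;
    int visible = 0;
    for (int i = 0; i < numRays; ++i) {
        const Vec3 start = samplePoint(from, state);
        const Vec3 dest  = samplePoint(to, state);
        const Vec3 dir   = sub(dest, start);  // unnormalized: t = 1 is dest
        bool blocked = false;
        for (const Triangle& occ : occluders)
            if (blocks(occ, start, dir, 1e-4f, 1.0f - 1e-4f)) { blocked = true; break; }
        if (!blocked) ++visible;
    }
    return visible;
}
```

A return value of 0 would map to M and anything greater than 0 to H, or you can divide by numRays to get the visibility fraction. Limiting t to just under 1 is what keeps geometry behind the destination triangle from counting as an occluder.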
–
David.