Hello,
I was wondering if I could change the code of the paper “Understanding the Efficiency of Ray Traversal on GPUs” to double precision.
Thank you
Rafael Scatena
Which part do you mean?
There are two papers with that name: the original from around 2009 and a supplement from 2012 that covers an AABB slab intersection.
The first one describes how to traverse a BVH with specific CUDA thread mechanisms, the second describes how to handle the AABB testing.
If you plan to use the first for anything, that would mean writing your own ray tracer in native CUDA.
That wouldn’t make any sense on RTX boards, and even on boards without RT cores, OptiX has always provided the BVH traversal.
If you only need the AABB slab intersection method in double, I don’t see a problem with that as long as the double-precision implementation is IEEE conformant, that is, it handles the +inf and -inf cases the same way as in float.
I’m using such an AABB intersection method in some internal example and the code looks like this:
(Needs this vector_math.h header.)
// Compute the near and far intersections of the AABB using the slab method.
// No intersection when tNear > tFar.
__forceinline__ __device__ float2 intersectAABB(float3 origin, float3 direction, float3 minAabb, float3 maxAabb)
{
  const float3 tMin = (minAabb - origin) / direction;
  const float3 tMax = (maxAabb - origin) / direction;
  const float3 t0 = fminf(tMin, tMax);
  const float3 t1 = fmaxf(tMin, tMax);
  const float tNear = fmaxf(t0);
  const float tFar = fminf(t1);
  return make_float2(tNear, tFar);
}
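A double-precision variant could look like the sketch below. It is not from the paper or from the internal example: `Vec3d`, `Interval`, `AABB_HD`, and `intersectAABBDouble` are hypothetical names, and the small struct stands in for the vector_math.h helpers so that the block compiles on its own, either as plain C++ or as CUDA. It relies on the same IEEE behavior as the float version, so compiler options that assume finite math (for example GCC's -ffast-math) would invalidate it.

```cpp
#include <math.h>

// Compile as host C++ or as CUDA; the CUDA qualifiers are only used under nvcc.
#ifdef __CUDACC__
#define AABB_HD __forceinline__ __host__ __device__
#else
#define AABB_HD inline
#endif

// Hypothetical stand-ins for the vector types (not part of vector_math.h).
struct Vec3d { double x, y, z; };
struct Interval { double tNear, tFar; };

// Slab method in double precision. No intersection when tNear > tFar.
AABB_HD Interval intersectAABBDouble(Vec3d origin, Vec3d direction, Vec3d minAabb, Vec3d maxAabb)
{
  // Per-axis distances to the two slab planes.
  // A zero direction component yields +inf or -inf under IEEE 754.
  const double tMinX = (minAabb.x - origin.x) / direction.x;
  const double tMaxX = (maxAabb.x - origin.x) / direction.x;
  const double tMinY = (minAabb.y - origin.y) / direction.y;
  const double tMaxY = (maxAabb.y - origin.y) / direction.y;
  const double tMinZ = (minAabb.z - origin.z) / direction.z;
  const double tMaxZ = (maxAabb.z - origin.z) / direction.z;

  // Largest entry distance and smallest exit distance over the three slabs.
  const double tNear = fmax(fmax(fmin(tMinX, tMaxX), fmin(tMinY, tMaxY)), fmin(tMinZ, tMaxZ));
  const double tFar  = fmin(fmin(fmax(tMinX, tMaxX), fmax(tMinY, tMaxY)), fmax(tMinZ, tMaxZ));

  Interval result = {tNear, tFar};
  return result;
}
```

For an axis-parallel ray such as origin (0, 0, -5) with direction (0, 0, 1) against the box from (-1, -1, -1) to (1, 1, 1), the x and y slabs contribute -inf and +inf, so only the z slab determines the interval.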
Hello again,
It would be the first one. Do you mean that I should re-write all the kernels?
I am running on Quad Pro P5000 with 16384 MB.
But it uses float
You mean that I should re-write all the kernels?
Re-write what kernels?
The paper only contains pseudo code for the actual persistent threads algorithm.
It doesn’t describe how to build acceleration structures, how to traverse them most efficiently, or how to intersect with geometric primitives. In the end, all of that happens inside that trace() function.
Yes, you would need to write all of that in CUDA. Good luck with that.
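For orientation, the core of the persistent threads idea is that a fixed set of long-running workers pulls batches of rays from a global pool through an atomic counter, instead of launching one thread per ray. The sketch below is only a host-side C++ analog of that loop, not the paper's CUDA code. `RayPool`, `persistentWorker`, `traceStub`, and the batch size are hypothetical, and `traceStub` is a placeholder for the acceleration structure traversal and primitive intersection that the paper does not spell out.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical global work pool: rays are identified by index only.
struct RayPool {
  std::size_t rayCount;
  std::atomic<std::size_t> next; // index of the first ray not yet handed out

  explicit RayPool(std::size_t count) : rayCount(count), next(0) {}
};

// Placeholder for the real work (BVH traversal plus primitive intersection).
inline float traceStub(std::size_t rayIndex)
{
  return static_cast<float>(rayIndex) * 0.5f;
}

// One persistent worker: keeps fetching batches until the pool is empty.
// On a GPU this loop would run per warp; here any number of host threads may call it.
inline void persistentWorker(RayPool& pool, std::vector<float>& results, std::size_t batchSize)
{
  for (;;) {
    // Atomically claim the next batch of ray indices.
    const std::size_t begin = pool.next.fetch_add(batchSize);
    if (begin >= pool.rayCount) {
      break; // no work left
    }
    const std::size_t end = std::min(begin + batchSize, pool.rayCount);
    for (std::size_t i = begin; i < end; ++i) {
      results[i] = traceStub(i);
    }
  }
}
```

Everything that makes a ray tracer fast or slow would sit behind `traceStub` in this sketch, which is the part you would have to write yourself.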
I am running on Quad Pro P5000 with 16384 MB.
There are four newer generations of GPUs than that.
A single Quadro P5000 has 8.9 TFlops single precision.
The current RTX 5000 Ada has 65.3 TFlops single precision and RT cores and Tensor cores on top.
That means a single one of those is already almost twice as fast on paper as your whole setup with four boards, and on top of that it can do ray tracing in hardware.
(EDIT: I misread Quad Pro P5000 as four of those but you probably meant a single Quadro P5000, so the performance advantage of a current Ada GPU will be even bigger, roughly 7x in single precision on paper.)
But it uses float
That’s why I explained in detail how you could try to use OptiX for everything except the actual double-precision intersection here:
https://forums.developer.nvidia.com/t/vtk-optix-triangle-mesh-write-read-operations/293966/6
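As a rough illustration of keeping only the intersection math in double, the sketch below is a double-precision ray-triangle test using the Möller-Trumbore method. It is not taken from the linked post or from OptiX. `Vec3d`, `TriangleHit`, `TRI_HD`, the small vector helpers, and `intersectTriangleDouble` are hypothetical names, and how such a function would be called from an OptiX program is left out.

```cpp
#include <math.h>

// Compile as host C++ or as CUDA; the CUDA qualifiers are only used under nvcc.
#ifdef __CUDACC__
#define TRI_HD __forceinline__ __host__ __device__
#else
#define TRI_HD inline
#endif

// Hypothetical double-precision vector type and helpers.
struct Vec3d { double x, y, z; };

TRI_HD Vec3d subD(Vec3d a, Vec3d b)
{
  Vec3d r = {a.x - b.x, a.y - b.y, a.z - b.z};
  return r;
}

TRI_HD double dotD(Vec3d a, Vec3d b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

TRI_HD Vec3d crossD(Vec3d a, Vec3d b)
{
  Vec3d r = {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
  return r;
}

// Hit record: ray distance t and barycentric coordinates u, v.
struct TriangleHit { bool hit; double t, u, v; };

// Moeller-Trumbore ray-triangle test, evaluated entirely in double precision.
TRI_HD TriangleHit intersectTriangleDouble(Vec3d origin, Vec3d direction, Vec3d v0, Vec3d v1, Vec3d v2)
{
  const TriangleHit miss = {false, 0.0, 0.0, 0.0};

  const Vec3d e1 = subD(v1, v0);
  const Vec3d e2 = subD(v2, v0);
  const Vec3d p = crossD(direction, e2);
  const double det = dotD(e1, p);
  // Ray parallel to the triangle plane or degenerate triangle.
  // An application-specific tolerance may be preferable to an exact zero test.
  if (det == 0.0) return miss;

  const double invDet = 1.0 / det;
  const Vec3d s = subD(origin, v0);
  const double u = dotD(s, p) * invDet;
  if (u < 0.0 || u > 1.0) return miss;

  const Vec3d q = crossD(s, e1);
  const double v = dotD(direction, q) * invDet;
  if (v < 0.0 || u + v > 1.0) return miss;

  const double t = dotD(e2, q) * invDet;
  if (t < 0.0) return miss; // intersection lies behind the ray origin

  const TriangleHit result = {true, t, u, v};
  return result;
}
```

The idea would be that the float BVH only selects candidate primitives, with bounds enlarged enough to stay conservative, while a function like this computes the final hit in double.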
You said you’re “building a medical physics application” and it “is x-ray simulation.”
But why would you require double precision throughout the whole ray tracing pipeline?
What exactly needs to be so precise that you can’t express it in single (float) precision?
(I’m trying to understand whether this is maybe an XY problem.)