CUDA timeout when using KdTree accelerators

Hi to all of you,

I use OptiX for an ordinary raytracer. Until now I could fix every problem by myself or with the help of Google.
I load a scene graph and build an OptiX context accordingly:
At the root I use an RTgroup with an attached Bvh accelerator.
Below this root is an arbitrary tree of RTgroups (nodes), each with an attached Bvh accelerator.
At the leaves I use RTgeometrygroups, again each with an attached Bvh accelerator and an RTgeometry.
This setup works and I get the desired frames.

Since all the geometry consists of triangles, I want to use the triangle-specific accelerators (TriangleKdTree/KdTree) at the lowest level (RTgeometrygroup). This is the only thing I changed from the working system.
I set the vertex and index buffers as follows:

rtAccelerationCreate(m_context, &m_o_accel);
rtAccelerationSetBuilder(m_o_accel, "TriangleKdTree");
rtAccelerationSetTraverser(m_o_accel, "KdTree");
rtAccelerationSetProperty(m_o_accel, "vertex_buffer_name", "vertexBuffer");
rtAccelerationSetProperty(m_o_accel, "vertex_buffer_stride", "0");
rtAccelerationSetProperty(m_o_accel, "index_buffer_name", "vertexIndexBuffer");
rtAccelerationSetProperty(m_o_accel, "index_buffer_stride", "0");

But I get the following error:

Unknown error (Details: Function "RTresult _rtContextLaunch2D(RTcontext_api*,
 unsigned int, RTsize, RTsize)" caught exception: Encountered a CUDA error: 
Kernel launch returned (702): Launch timeout, [6619200])

If I am not mistaken, this means that the call to rtContextLaunch2D takes too long.
Therefore I tried to minimize the work:

  • 6x6 pixels
  • no recursion except toward the light sources --> no shadows, transparencies, refractions, reflections
  • 589 vertices in 5 buffers
  • 3348 vertex indices in 5 buffers
  • a call to rtContextLaunch2D(context, 0, 0) before the first real launch --> compiles in 5.5-6.5 s

The system is as follows:

  • SUSE Linux Enterprise Desktop 11 (x86_64)
  • Quadro 6000
  • NVIDIA driver 304.54
  • CUDA 5.0
  • OptiX 3.0.1

To make sure the buffers are set up correctly, I deliberately changed the buffer lengths and names, which produced validation errors as expected. They are the same buffers I use in the intersection program:

RT_PROGRAM void mesh_intersect(int primIdx){
   int v_id0 = vertexIndexBuffer[primIdx * 3 + 0];
   int v_id1 = vertexIndexBuffer[primIdx * 3 + 1];
   int v_id2 = vertexIndexBuffer[primIdx * 3 + 2];

   float3 p0 = vertexBuffer[v_id0];
   float3 p1 = vertexBuffer[v_id1];
   float3 p2 = vertexBuffer[v_id2];

   //intersections tests follow
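
Because the triangle builder reads the same vertex and index buffers directly, a host-side sanity check of the index data before upload may help rule out bad input. The following is only a sketch: it assumes tightly packed int indices, three per triangle, as the intersection program above reads them, and the function name triangle_indices_valid is hypothetical and not part of OptiX.

```c
#include <stddef.h>

/* Returns 1 if the index buffer describes whole triangles and every index
   refers to an existing vertex, 0 otherwise.
   Assumption: indices are plain ints, three consecutive entries per triangle. */
int triangle_indices_valid(const int *indices, size_t index_count,
                           size_t vertex_count)
{
    size_t i;

    /* A trailing partial triangle means the buffer length is wrong. */
    if (index_count % 3 != 0)
        return 0;

    for (i = 0; i < index_count; ++i) {
        /* Negative or too-large indices would read outside the vertex buffer. */
        if (indices[i] < 0 || (size_t)indices[i] >= vertex_count)
            return 0;
    }
    return 1;
}
```

It would be called once per geometry with the host copies of that geometry's index data and vertex count, before the data is written into the OptiX buffers.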

Is it even possible to mix the different acceleration structures for the different tiers of the tree?

I hope someone can point me in the right direction to fix this problem!

If you need further information I will gladly provide it (hopefully not a minimal working example, which would be rather hard to make because the application uses MPI to run on a distributed system).


If you’re experiencing a timeout, the first and easiest thing to do is implement a callback function. This will return control to the CPU every so often to avoid the CPU thinking that the GPU is unresponsive. See the documentation for rtContextSetTimeoutCallback.
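
A minimal sketch of such a callback in C, assuming the OptiX 3.x callback shape int (*)(void), where a non-zero return value asks OptiX to abort the running call. The time budget, the helper names and the 0.1 s polling interval are illustrative assumptions, not values from the documentation.

```c
#include <time.h>

/* Hypothetical wall-clock budget for one launch, in seconds (an assumption). */
double g_budget_seconds = 10.0;
time_t g_launch_start;

/* Call this right before rtContextLaunch2D to start the clock. */
void launch_timer_reset(void)
{
    g_launch_start = time(NULL);
}

/* Timeout callback: return 0 to let OptiX continue,
   non-zero to request that the current API call be aborted. */
int launch_timeout_callback(void)
{
    double elapsed = difftime(time(NULL), g_launch_start);
    return elapsed > g_budget_seconds ? 1 : 0;
}

/* Registration, assuming <optix.h> and a valid RTcontext named context:
 *
 *   rtContextSetTimeoutCallback(context, launch_timeout_callback, 0.1);
 *
 * The last argument is the minimum polling interval in seconds.
 */
```

Logging from inside the callback is a simple way to see during which phase (compile, acceleration build, launch) OptiX actually polls it.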

Thank you for the reply :)

I will try that, but I do not think it will solve the problem.
Since I reduced the workload to almost nothing (compared to what OptiX can handle) and the launch still takes this long, the problem appears to be somewhere else.

How about upgrading to the latest versions of OptiX (3.6) and CUDA (6.0)?
I prefer Sbvh over KdTree, you might try that as well.

I implemented the callback function. It gets called while compiling (which was expected), but not during the normal context launch. The error remains.

Sbvh works fine. If I understand the programming guide correctly, this accelerator can use the triangle buffers as well (which are still set). So I am confused that it works with Sbvh but not with the KdTree.

Updating is not an option right now, since I do not administer the GPU cluster I work on.