Dynamic Parallelism in OptiX?


Are there any plans to support Dynamic Parallelism in OptiX?

It would be nice to launch multiple rays in parallel from e.g. a hit program.

I would also appreciate such a feature!
I guess that would only be possible on Kepler-class GPUs, since CUDA dynamic parallelism requires compute capability 3.5 or higher.

Do you have examples of use cases for such a feature? What benefits do you expect to get from it?

I’m working on ultrasonic wave propagation simulation. In the anisotropic case, each ray intersection with a surface generates up to six new rays in six different directions.
Being able to trace these rays in parallel would save me from tracing them one after another.

Another case in which dynamic parallelism would really help is when I cast rays uniformly over a sphere to find out which directions reach a given surface. I need to be able to refine the sampling around the directions that hit, to get high precision in those areas and not waste computation time on the others.
Since the new kernels I need to launch depend on large amounts of data that already reside in GPU memory, I think it would be really helpful to launch these kernels directly from the GPU without having to move data through the CPU.
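
To illustrate the refinement logic I mean, here is a minimal host-side sketch in C++. Everything in it is an assumption for illustration: the names `Cell`, `Dir`, `coarse_grid` and `refine` are made up, the latitude/longitude grid is a simplification (it is not uniform in solid angle), and `hits` is a stand-in predicate for "this ray reaches the surface", which in the real application would be an actual ray trace.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

// Angular cell on the sphere: polar range [theta0, theta1], azimuth range [phi0, phi1], radians.
struct Cell { double theta0, theta1, phi0, phi1; };
struct Dir  { double x, y, z; };

// Unit direction through the center of a cell.
Dir center_direction(const Cell& c) {
    const double t = 0.5 * (c.theta0 + c.theta1);
    const double p = 0.5 * (c.phi0 + c.phi1);
    return { std::sin(t) * std::cos(p), std::sin(t) * std::sin(p), std::cos(t) };
}

// Coarse first pass: n_theta x n_phi cells covering the whole sphere.
std::vector<Cell> coarse_grid(int n_theta, int n_phi) {
    assert(n_theta > 0 && n_phi > 0);
    const double pi = std::acos(-1.0);
    std::vector<Cell> cells;
    for (int i = 0; i < n_theta; ++i)
        for (int j = 0; j < n_phi; ++j)
            cells.push_back({ pi * i / n_theta, pi * (i + 1) / n_theta,
                              2.0 * pi * j / n_phi, 2.0 * pi * (j + 1) / n_phi });
    return cells;
}

// Per level: test each cell's center ray, drop cells that miss, and split
// each hitting cell into four children. Returns the children produced by
// the last level (they have not been tested themselves yet).
std::vector<Cell> refine(std::vector<Cell> cells,
                         const std::function<bool(const Dir&)>& hits,
                         int levels) {
    for (int level = 0; level < levels; ++level) {
        std::vector<Cell> next;
        for (const Cell& c : cells) {
            if (!hits(center_direction(c))) continue;  // no work spent on misses
            const double tm = 0.5 * (c.theta0 + c.theta1);
            const double pm = 0.5 * (c.phi0 + c.phi1);
            next.push_back({ c.theta0, tm, c.phi0, pm });
            next.push_back({ c.theta0, tm, pm, c.phi1 });
            next.push_back({ tm, c.theta1, c.phi0, pm });
            next.push_back({ tm, c.theta1, pm, c.phi1 });
        }
        cells = std::move(next);
    }
    return cells;
}
```

The point is that the size of each level depends on the results of the previous one. With host-driven launches, that means one round trip per level; being able to launch the next level from the GPU would avoid it.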

I have similar use cases. One is the propagation of electromagnetic waves at diffraction edges. If an incoming ray intersects such an edge, it generates, for example, 120 new rays, each with a different direction. Currently, I have to trace them sequentially.

So is there a performance concern with tracing rays sequentially? Or is there some other problem?

If it is better performance that you are looking for, keep in mind that dynamic parallelism has overhead. These use cases seem to have a relatively small number of rays. There probably isn’t enough work there to amortize the overhead.
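
To make the amortization argument concrete, here is a rough back-of-the-envelope model in C++. It is only a sketch: the function names are made up, and any overhead, per-ray cost and lane count you feed it are placeholders, not measurements of OptiX or of CUDA dynamic parallelism.

```cpp
#include <cassert>

// Toy cost model, all times in microseconds. The inputs are hypothetical
// placeholders to be replaced with your own timings.

// Time to trace n_rays one after another in the spawning thread.
double sequential_time(int n_rays, double per_ray_us) {
    return n_rays * per_ray_us;
}

// Time to trace n_rays through a child launch: a fixed launch overhead
// plus ceil(n_rays / lanes) rounds of per-ray work.
double child_launch_time(int n_rays, double per_ray_us,
                         double overhead_us, int lanes) {
    assert(lanes > 0);
    const int rounds = (n_rays + lanes - 1) / lanes;
    return overhead_us + rounds * per_ray_us;
}

// Smallest ray count for which the child launch is strictly faster than
// sequential tracing, or -1 if there is none up to max_rays.
int break_even_rays(double per_ray_us, double overhead_us,
                    int lanes, int max_rays) {
    for (int n = 1; n <= max_rays; ++n) {
        if (child_launch_time(n, per_ray_us, overhead_us, lanes) <
            sequential_time(n, per_ray_us)) {
            return n;
        }
    }
    return -1;
}
```

With placeholder values of 1 µs per ray, 20 µs of launch overhead and 32 lanes, `break_even_rays(1.0, 20.0, 32, 1000)` returns 22: six spawned rays would be slower through a child launch, while 120 would be faster. Real overheads may be very different, so measure before drawing conclusions.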

Are the spawned rays coherent? For coherent rays there could be some potential benefit in having a construct in OptiX to spawn multiple rays. Such a construct would provide more information to the ray scheduler. This could be implemented efficiently without dynamic parallelism.

Yes, it is a performance concern. What I “guess” happens (since the ray scheduling of OptiX is a black box to me) is:

  1. A packet/warp/… of coherent rays is traced.
  2. Only one of these rays hits a “spawning object”, which triggers the creation of multiple rays.
  3. These rays are created and traced in this single ray’s thread, stalling the rest of the rays.
  4. This might get even worse if one of the newly created rays again hits another “spawning object” …
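
To put numbers on this guess, here is a small toy model in C++. It assumes plain lockstep execution, where a group of lanes finishes only when its slowest lane does; that is my assumption about the behavior, not a description of what the OptiX scheduler actually does, and the function names are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// work[i] = number of rays lane i has to trace one after another:
// 1 for an ordinary ray, 1 + spawned rays for a lane that hit a
// "spawning object". Nested spawning just makes that count larger.

// Steps the whole group is occupied: it waits for its slowest lane.
int warp_steps(const std::vector<int>& work) {
    assert(!work.empty());
    return *std::max_element(work.begin(), work.end());
}

// Fraction of lane-steps doing useful work (1.0 means nobody stalls).
double lane_utilization(const std::vector<int>& work) {
    long total = 0;
    for (int w : work) total += w;
    return static_cast<double>(total) /
           (static_cast<double>(warp_steps(work)) * static_cast<double>(work.size()));
}

// Idealized alternative: all rays are pooled and spread evenly over the lanes.
int pooled_steps(const std::vector<int>& work) {
    assert(!work.empty());
    long total = 0;
    for (int w : work) total += w;
    const long lanes = static_cast<long>(work.size());
    return static_cast<int>((total + lanes - 1) / lanes);
}
```

For 32 lanes where one lane spawns 120 extra rays (work of 121) and the other 31 trace a single ray each, `warp_steps` gives 121 while `pooled_steps` gives 5, and `lane_utilization` is about 4%. That gap is what I am hoping a multi-ray construct could recover.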

These newly created rays are coherent to some extent, so handing them over to OptiX all together could help a lot.