OptixAccelRelocationInfo data


I just wanted more of an understanding of the data stored in OptixAccelRelocationInfo.

  1. What information about the acceleration structure is held inside the OptixAccelRelocationInfo data?
  2. Are they values or are they pointers to data inside the context?


The OptixAccelRelocationInfo is an opaque object, as stated in the API documentation:

/// Used to store information related to relocation of acceleration structures.
/// \see #optixAccelGetRelocationInfo(), #optixAccelCheckRelocationCompatibility(), #optixAccelRelocate()
typedef struct OptixAccelRelocationInfo
{
    /// Opaque data, used internally, should not be modified
    unsigned long long info[4];
} OptixAccelRelocationInfo;

The data inside it cannot be used for anything other than checking whether an acceleration structure built on one device is compatible with another device, via optixAccelCheckRelocationCompatibility, and relocating an AS via optixAccelRelocate after copying it to a different device address (which must satisfy the required alignment of OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT).

Described here:

Here is an example where I’m using it to check whether geometry acceleration structures can be shared with CUDA peer-to-peer among devices. That’s mostly for demonstration purposes, since devices connected via NVLINK in that case always match in GPU architecture, but it did help prevent a bug in a heterogeneous GPU setup.

Getting the relocation info:
Checking if it’s compatible:
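For reference, the two calls referenced above look roughly like this (a sketch against the OptiX 7 API; error handling is reduced to a hypothetical OPTIX_CHECK macro, and `context`, `otherContext`, and `gasHandle` are assumed to exist already):

```cpp
// On the context that built the GAS: query its relocation info.
OptixAccelRelocationInfo relocationInfo = {};
OPTIX_CHECK( optixAccelGetRelocationInfo( context, gasHandle, &relocationInfo ) );

// On the context that should receive a copy: check compatibility.
int compatible = 0;
OPTIX_CHECK( optixAccelCheckRelocationCompatibility( otherContext, &relocationInfo, &compatible ) );
if( compatible != 1 )
{
    // The GAS cannot be relocated to this device; build it there instead.
}
```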

What I’m trying to find out is whether it would be possible to share a geometry acceleration structure between two OptiX contexts.
For example, say I had two GPUs with the same architecture.
Both have their own OptiX contexts.
Is it possible to copy the GAS from GPU1 to GPU2 and then call optixAccelRelocate using GPU2’s OptiX context and GPU1’s OptixAccelRelocationInfo?
Would this be valid, or is the OptixAccelRelocationInfo specific to GPU1’s OptiX context?

That is not really “sharing” since each device has its own copy afterwards, but yes, that is exactly one use case.

This check is also exactly what I’m doing in the two code lines I posted above, just that I don’t copy and relocate the GAS because I share it across the NVLINK bridge. In all other cases I simply build the GAS on each device.

So if you take an OptixAccelRelocationInfo from a GAS on device A, and optixAccelCheckRelocationCompatibility with that info passes on device B, you can copy the data from A to B (the destination pointer needs to be aligned to OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT). Because the copy resides at a different 64-bit address afterwards, you then need to call optixAccelRelocate on device B with the OptixAccelRelocationInfo from A and the new CUdeviceptr. That call returns a new traversable handle, which is what you use to access the copied GAS on device B.
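Put together, the whole copy-and-relocate flow could look something like this sketch (OptiX 7 API prior to the 7.6 rename to OptixRelocationInfo; OPTIX_CHECK and CUDA_CHECK are hypothetical error-checking macros, and `gasSizeInBytes`, `deviceA`/`deviceB`, `gasBufferA`, and `stream` are assumed to exist):

```cpp
// 1) Query the relocation info from the source GAS on device A.
OptixAccelRelocationInfo relocationInfo = {};
OPTIX_CHECK( optixAccelGetRelocationInfo( contextA, gasHandleA, &relocationInfo ) );

// 2) Verify that device B can accept this GAS.
int compatible = 0;
OPTIX_CHECK( optixAccelCheckRelocationCompatibility( contextB, &relocationInfo, &compatible ) );

if( compatible )
{
    // 3) Allocate a destination buffer on device B. cudaMalloc results are
    //    sufficiently aligned for OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT.
    CUdeviceptr gasBufferB = 0;
    CUDA_CHECK( cudaMalloc( reinterpret_cast<void**>( &gasBufferB ), gasSizeInBytes ) );

    // 4) Copy the GAS bytes from device A to device B.
    CUDA_CHECK( cudaMemcpyPeer( reinterpret_cast<void*>( gasBufferB ), deviceB,
                                reinterpret_cast<void*>( gasBufferA ), deviceA,
                                gasSizeInBytes ) );

    // 5) Relocate in place at the new address. The returned handle must be
    //    used for all traversal of the copied GAS on device B.
    OptixTraversableHandle gasHandleB = 0;
    OPTIX_CHECK( optixAccelRelocate( contextB, stream, &relocationInfo,
                                     0, 0,  // no instance traversables for a GAS
                                     gasBufferB, gasSizeInBytes, &gasHandleB ) );
}
```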

Mind that this relocation must always be done whenever you copy the AS somewhere other than where it was originally built, even on the same GPU.

Check the API reference on the optixAccelRelocate call which explains that:

In the end this all works because these are just 64-bit CUdeviceptr values, and the CUDA allocations are all distinct thanks to the Unified Virtual Address (UVA) space on 64-bit systems.

The question is whether this is worth it. The GAS is built on the GPU, and you could also build the AS in parallel on both devices with the same result. Just saying.

That makes sense.
Would it be possible, then, to share a GAS between two different OptiX contexts on the same GPU?
Say, passing a handle to the GAS from context A to context B via CUDA IPC. But to create an OptixTraversableHandle in context B, would we need to call optixAccelRelocate, or is there another way to create an OptixTraversableHandle to the GAS from context A?

I have no experience with CUDA inter-process communication. That’s only available under Linux.

Excerpt from the CUDA Programming Guide on IPC:

Using this API, an application can get the IPC handle for a given device memory pointer using cudaIpcGetMemHandle(), pass it to another process using standard IPC mechanisms (e.g., interprocess shared memory or files), and use cudaIpcOpenMemHandle() to retrieve a device pointer from the IPC handle that is a valid pointer within this other process. Event handles can be shared using similar entry points.
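The round trip described in that excerpt could be sketched like this (CUDA runtime API; the transport of the handle bytes between the processes, e.g. via a pipe or shared memory, is left out, and error handling is omitted):

```cpp
// Process A: export an IPC handle for an existing device allocation.
cudaIpcMemHandle_t ipcHandle;
cudaIpcGetMemHandle( &ipcHandle, devPtrA );  // devPtrA from cudaMalloc in this process
// ...send the bytes of ipcHandle to process B via any standard IPC mechanism...

// Process B: map the allocation into this process' address space.
void* devPtrB = nullptr;
cudaIpcOpenMemHandle( &devPtrB, ipcHandle, cudaIpcMemLazyEnablePeerAccess );
// Note: devPtrB is NOT guaranteed to equal devPtrA in process A.
// ...use devPtrB...
cudaIpcCloseMemHandle( devPtrB );
```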

This would make me nervous about the CUdeviceptr in both processes, and rightfully so.
Reading the cudaIpcOpenMemHandle() manual it says:

No guarantees are made about the address returned in *devPtr. In particular, multiple processes may not receive the same address for the same handle.

If it’s not the same address, it wouldn’t be usable without relocation, and optixAccelRelocate() works in place, which in turn would break the original data in the source process; so that would require a copy anyway.

What would be the real-world use case requiring this?

I actually got it working by sending an IPC handle of the GAS from one process to another. Then I just did a cast from the GAS buffer to an OptixTraversableHandle on the second application, and that seemed to work and I was able to ray trace against it. So does that mean that’s all that’s needed to convert a GAS pointer into an OptixTraversableHandle?

As for our use case… we have an application that simulates sonar, and we ray trace under a dynamic ocean, which is a GAS that updates every frame.
As for the 2nd application, we want to visualise the dynamic ocean and the environment to see what is going on inside the sonar simulation.

I know this can all be done easily inside a single application but there are benefits for us to keep it all modular in different applications.

Be careful, since this means you’re using memory in one process that is owned by another process. This opens up some potentially tough questions, like ensuring that the process creating the memory always outlives the 2nd process using it. You will also be responsible for ensuring that the 2nd process always has the security rights to access the memory. If you decide to start up some kind of server for the first process under a different user account, you might suddenly find that everything breaks. Trying to share memory between processes could easily make it much less modular. (But I fully understand the desire to not duplicate memory unnecessarily!)

And take special note of what Detlef said about cudaIpcOpenMemHandle(): “No guarantees” means what it says, even if it appears to work right now. You should not count on the pointer being the same; otherwise it will break at some unexpected time later.

Then I just did a cast from the GAS buffer to an OptixTraversableHandle on the second application, and that seemed to work and I was able to ray trace against it. So does that mean that’s all that’s needed to convert a GAS pointer into an OptixTraversableHandle?

No, you need to use optixConvertPointerToTraversableHandle(). In general, casting the GAS handle to a pointer or vice versa will crash; they are not the same value. BTW, we do not guarantee anything about the OptixTraversableHandle’s relationship to the pointer. While there may be some correspondence in the implementation right now, that may not be true on different GPUs or in the future, so don’t assume a working shortcut will continue to work.


I looked at optixConvertPointerToTraversableHandle(), but I didn’t know what OptixTraversableType to pass in for a GAS. Would passing in OPTIX_TRAVERSABLE_TYPE_STATIC_TRANSFORM work for a GAS?

I’m just wanting to explore whether all this is possible, as it would be nice not to duplicate memory. Otherwise it’s all probably easier to do in one process.

Oh, you’re right, you can’t use optixConvertPointerToTraversableHandle() to convert a GAS buffer into a handle. Sorry about that. Casting a pointer does currently work, but it isn’t guaranteed, so I think there is no sanctioned way to share GAS buffer pointers across processes. We can discuss it internally as a feature request if it’s critical and there are no other options. It is definitely complicated though, not as simple as sharing a pointer, so I wouldn’t recommend waiting for it.

One modular, multi-process architecture to consider would be a client-server type of system, where your server handles generic rendering requests in a multi-threaded, multi-stream way. This way you can have a single process “own” all the BVHs, and share memory whenever possible, but still serve different applications at the same time.