Passing an array to raygeneration program via launch parameters

Hello all,

I know this has probably been answered before, so please forgive me if this is a dumb question.

I would like to pass an array of float3 values, where each element represents a single x, y, z coordinate, to my raygeneration program on the device via the launch parameters structure. I am sure that can be done, but is there some simple code that illustrates this operation?

Thank you in advance for any help.

Did you run into any snags when you tried using an array? It should be just as easy as a single float3 if the array is constant sized. If the array is large, and you want the launch param to be a pointer, then you’ll need to allocate a CUDA buffer beforehand and then copy the pointer to the buffer into launch params, setting the appropriate type, etc. Recall that launch params are in constant memory and limited to 64KB, and OptiX uses some of that, so if your array is larger than 32KB, putting it elsewhere in memory is recommended.
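To make the two options above concrete, here is a minimal host-side sketch of both launch parameter layouts. It is not taken from the SDK: Float3 is a stand-in for CUDA's float3 so the snippet compiles without the CUDA toolkit, and the names ParamsEmbedded, ParamsPointer, kNumPoints and fitsEmbedded are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CUDA's float3 (three 4-byte floats), an assumption for this sketch.
struct Float3 { float x, y, z; };

constexpr std::size_t kConstantMemoryBytes = 64 * 1024; // launch params live in constant memory
constexpr std::size_t kSuggestedArrayLimit = 32 * 1024; // threshold suggested in the post above

// Option 1: small, fixed-size array embedded directly in the launch params.
constexpr unsigned int kNumPoints = 256; // hypothetical element count
struct ParamsEmbedded
{
    Float3       points[kNumPoints];
    unsigned int numPoints;
};

// Option 2: large or dynamically sized array kept in a separate device buffer.
// The launch params only carry the pointer and the element count.
struct ParamsPointer
{
    Float3*      points;    // device address (from cudaMalloc in a real program)
    unsigned int numPoints;
};

// Does an array of `count` float3 values stay under the suggested 32 KB limit?
constexpr bool fitsEmbedded(std::size_t count)
{
    return count * sizeof(Float3) <= kSuggestedArrayLimit;
}

static_assert(sizeof(ParamsEmbedded) < kConstantMemoryBytes,
              "embedded launch params exceed constant memory");
```

In a real program, option 2 would typically allocate the buffer with cudaMalloc, fill it with cudaMemcpy, store the resulting address in the params struct, and then copy the params struct itself to the device before passing it to optixLaunch. The raygen program would then index params.points[i] like any other pointer.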

If you’re asking about how to pass launch params in general, take a look at the optixPathTracer sample. In optixPathTracer.h is a struct called “Params”. This struct is the launch params, and you’re free to put an array in there. If you follow the references to Params, you’ll see how it’s initialized and copied to the device in optixPathTracer.cpp, and how it’s used in raygen in optixPathTracer.cu.

–
David.

Other OptiX 7 examples storing pointers to user defined data can be found in my examples:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/system_data.h#L42
That SystemData struct is my OptiX launch parameter block.

Inside that you’ll find different methods to provide device memory buffers:

1.) I use CUdeviceptr which are just unsigned 64-bit values pointing to a CUDA device address for the output buffers.
The renderer can reinterpret that either as float4 or half4 data depending on a compile-time option with which I can switch the output buffer formats.
The intro_denoiser example can render in float4 or half4 formats. The other examples are using float4.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_denoiser/shaders/raygeneration.cu#L233

2.) There are some arrays of user defined structures like the definitions for the camera, light, and material parameters:

  CameraDefinition*   cameraDefinitions; // Currently only one camera in the array. (Allows camera motion blur in the future.)
  LightDefinition*    lightDefinitions;
  MaterialDefinition* materialDefinitions;

3.) Then there are pointers to buffers of predefined types like these:

  float* envCDF_U;  // 2D, size (envWidth  + 1) * envHeight
  float* envCDF_V;  // 1D, size (envHeight + 1)

which together with the fields

  unsigned int envWidth; // The original size of the environment texture.
  unsigned int envHeight;

dynamically define the size of the data used for importance sampling of the environment texture.
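As a rough illustration of methods 1 and 3, the following host-only sketch shows an untyped 64-bit address being reinterpreted as a typed pointer, and the element counts implied by the size comments above. DevicePtr is a stand-in for CUdeviceptr and Float4 for CUDA's float4; SystemDataSketch, asFloat4, cdfUCount and cdfVCount are hypothetical names, not part of the linked examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Float4 { float x, y, z, w; }; // stand-in for CUDA's float4
using DevicePtr = std::uint64_t;     // stand-in for CUdeviceptr (unsigned 64-bit)

// Hypothetical, trimmed-down launch parameter block.
struct SystemDataSketch
{
    DevicePtr    outputBuffer; // method 1: untyped address, interpreted at use
    float*       envCDF_U;     // method 3: typed pointers to predefined types
    float*       envCDF_V;
    unsigned int envWidth;
    unsigned int envHeight;
};

// Element counts following the size comments on envCDF_U and envCDF_V.
constexpr std::size_t cdfUCount(unsigned int width, unsigned int height)
{
    return (static_cast<std::size_t>(width) + 1) * height;
}

constexpr std::size_t cdfVCount(unsigned int height)
{
    return static_cast<std::size_t>(height) + 1;
}

// Method 1 in use: reinterpret the 64-bit value as a typed pointer.
inline Float4* asFloat4(DevicePtr p)
{
    return reinterpret_cast<Float4*>(static_cast<std::uintptr_t>(p));
}
```

The typed-pointer variant is simpler when the element type never changes; the untyped address is handy when the same field has to serve several formats.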

Please look through the C++ host code of the examples to see how the various buffers are allocated and copied from or to the device.
E.g. here for the float data CDF buffers: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Texture.cpp#L1672

That means you can use either method 1 or 3 to store your pointer to a float3 buffer inside your launch parameters.
I recommend placing that pointer field at an 8-byte aligned offset inside your launch parameter structure to not have needless alignment padding added by the compiler.
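The padding effect is easy to see with two small structs. This is only a sketch assuming a 64-bit target with 8-byte pointers; Float3 is a stand-in for CUDA's float3, and the struct and field names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Float3 { float x, y, z; }; // stand-in for CUDA's float3

// Pointer placed behind a single 4-byte field: the compiler has to insert
// 4 padding bytes so that the pointer starts at an 8-byte aligned offset.
struct ParamsPadded
{
    unsigned int numPoints; // offset 0
    Float3*      points;    // offset 8 (4 padding bytes before it)
    unsigned int frame;     // offset 16, followed by 4 bytes of tail padding
};

// Same fields with the pointer first: no padding is needed.
struct ParamsPacked
{
    Float3*      points;    // offset 0
    unsigned int numPoints; // offset 8
    unsigned int frame;     // offset 12
};
```

With these layouts the padded version occupies 24 bytes and the packed one 16 bytes, although both hold exactly the same fields.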

Note that vectors with three components have no dedicated vectorized load and store instruction!
That means loading a float3 loads the data as three individual floats. That’s also why float3 has a memory alignment of 4 bytes.
Loading float4 (16-byte aligned) or float2 (8-byte aligned) vectors, on the other hand, happens with a single vectorized instruction, which is usually faster due to better cache line usage. You can see that difference when looking into the PTX code of your OptiX device programs.
So if you’re not memory constrained on the data, I recommend using float4 instead of float3 for better access performance, even when you’re not actually using the w-component.
This is especially important when writing to pinned memory buffers from multiple GPU devices.
Many of the generic CUDA performance guidelines mentioned inside the older OptiX versions’ programming guide still apply:
https://raytracing-docs.nvidia.com/optix6/guide_6_5/index.html#performance#performance-guidelines
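If the coordinates already exist as float3 on the host, padding them to float4 before the upload could look roughly like this. Float3 and Float4 are stand-ins that mimic the size and alignment of the CUDA types, and padToFloat4 is a hypothetical helper, not something from the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };                // like CUDA float3: 4-byte aligned
struct alignas(16) Float4 { float x, y, z, w; }; // like CUDA float4: 16-byte aligned

// Copy xyz coordinates into 16-byte aligned elements; w is unused and set to 0.
std::vector<Float4> padToFloat4(const std::vector<Float3>& in)
{
    std::vector<Float4> out;
    out.reserve(in.size());
    for (const Float3& p : in)
    {
        out.push_back(Float4{p.x, p.y, p.z, 0.0f});
    }
    return out;
}
```

The cost is 4 extra bytes per element (16 instead of 12), which is the memory trade-off mentioned above.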

Thank you @droettger and @dhart for the great information.

However, I meant to ask about passing an array of float3 values to the Shader Binding Table (SBT) such that the closest hit program can access the x, y, z coordinates. Sorry about that.

Then your Shader Binding Table record needs to contain additional data behind the 32-byte header and that would need to be the device pointer to your float3 data buffer.
The same declaration methods as I described above apply. It’s just in a different struct, your SBT record.

That mechanism is described here: https://raytracing-docs.nvidia.com/optix7/guide/index.html#shader_binding_table#records
though just with a single float3 element for a color instead of a device pointer to an array of float3 data.

You access that SBT record data with the OptiX device function optixGetSbtDataPointer()
https://raytracing-docs.nvidia.com/optix7/guide/index.html#shader_binding_table#sbt-record-access-on-device
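A minimal host-side sketch of such a record layout follows. The constants mirror OPTIX_SBT_RECORD_HEADER_SIZE (32) and OPTIX_SBT_RECORD_ALIGNMENT (16), but everything else is an assumption for illustration: Float3 stands in for CUDA's float3, HitGroupData and HitGroupRecord are made-up names, and sbtDataPointer only imitates on the host what optixGetSbtDataPointer() gives you on the device.

```cpp
#include <cassert>
#include <cstddef>

struct Float3 { float x, y, z; }; // stand-in for CUDA's float3

constexpr std::size_t kSbtHeaderSize = 32; // mirrors OPTIX_SBT_RECORD_HEADER_SIZE
constexpr std::size_t kSbtAlignment  = 16; // mirrors OPTIX_SBT_RECORD_ALIGNMENT

// User data stored behind the header: a pointer to the float3 buffer and its size.
struct HitGroupData
{
    const Float3* points;    // device pointer in a real program
    unsigned int  numPoints;
};

struct alignas(kSbtAlignment) HitGroupRecord
{
    char         header[kSbtHeaderSize]; // filled by optixSbtRecordPackHeader in real code
    HitGroupData data;
};

// Host-side imitation of the device lookup: skip the header, return the user data.
inline const HitGroupData* sbtDataPointer(const HitGroupRecord& record)
{
    const char* base = reinterpret_cast<const char*>(&record);
    return reinterpret_cast<const HitGroupData*>(base + kSbtHeaderSize);
}
```

In the closest hit program, the equivalent would be to cast the value returned by optixGetSbtDataPointer() to a HitGroupData pointer and read points[i] from there.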

If you search the OptiX SDK example source code for that function, you’ll find examples which are storing different data there.

In my examples I use an SBT hit record per instance and the SBT hit record data stores a pointer to a structure which contains device pointers to the interleaved vertex attribute data, the triangle indices buffer of the referenced GAS and two integers for the material and light IDs.

Data structure stored in my SBT hit records: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/system_data.h#L95
SBT record definition: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/inc/Device.h#L197
SBT record pointer usage inside the device program: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/closesthit.cu#L128

You would need a simpler variant of that using your device pointer directly (depending on what your scene structure and SBT layout is).

Also read this older thread again: https://forums.developer.nvidia.com/t/sbt-theoretical-quesions/179309
