More OptiX 7 code that stores pointers to user-defined data can be found in my examples:
That SystemData struct is my OptiX launch parameter block.
Inside that you’ll find different methods to provide device memory buffers:
1.) I use CUdeviceptr, which is just an unsigned 64-bit value holding a CUDA device address, for the output buffers.
The renderer can reinterpret that as either float4 or half4 data, depending on a compile-time option that switches the output buffer format.
The intro_denoiser example can render in float4 or half4 format. The other examples use float4.
2.) There are some arrays of user-defined structures like the definitions for the camera, light, and material parameters:
CameraDefinition* cameraDefinitions; // Currently only one camera in the array. (Allows camera motion blur in the future.)
3.) Then there are pointers to buffers of predefined types like these:
float* envCDF_U; // 2D, size (envWidth + 1) * envHeight
float* envCDF_V; // 1D, size (envHeight + 1)
which together with the fields
unsigned int envWidth; // The original size of the environment texture.
unsigned int envHeight;
dynamically define the size of the data used for importance sampling of the environment texture.
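To make the three methods concrete, here is a minimal host-side sketch of a launch parameter block that uses all of them. It is not the actual SystemData struct from the examples: the struct name LaunchParameters, the contents of CameraDefinition, and the helper functions are assumptions made for illustration, and CUdeviceptr is declared as a stand-in only so the snippet compiles without the CUDA headers. The element counts follow the comments on envCDF_U and envCDF_V above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in so this sketch compiles without the CUDA toolkit.
// In real code include <cuda.h>, which defines CUdeviceptr itself.
typedef unsigned long long CUdeviceptr;

// Hypothetical, trimmed-down camera parameters (placeholder fields).
struct CameraDefinition
{
  float position[3];
  float direction[3];
};

// Hypothetical, trimmed-down launch parameter block.
struct LaunchParameters
{
  // 1.) Untyped device address; device code reinterprets it as float4* or half4*.
  CUdeviceptr outputBuffer;

  // 2.) Array of user-defined structures.
  CameraDefinition* cameraDefinitions;

  // 3.) Pointers to buffers of built-in types, sized by the two fields below.
  float* envCDF_U; // 2D, size (envWidth + 1) * envHeight
  float* envCDF_V; // 1D, size (envHeight + 1)

  unsigned int envWidth;
  unsigned int envHeight;
};

// Number of floats in the 2D CDF buffer.
inline std::size_t envCDFUCount(unsigned int envWidth, unsigned int envHeight)
{
  return (static_cast<std::size_t>(envWidth) + 1) * envHeight;
}

// Number of floats in the 1D CDF buffer.
inline std::size_t envCDFVCount(unsigned int envHeight)
{
  return static_cast<std::size_t>(envHeight) + 1;
}

// Byte size to allocate on the device for a buffer of `count` floats.
inline std::size_t floatBufferBytes(std::size_t count)
{
  assert(count > 0); // An empty buffer is most likely a setup error.
  return count * sizeof(float);
}
```

The byte sizes returned by floatBufferBytes() are what you would pass to your device allocation and host-to-device copy calls for the two CDF buffers.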
Please look through the C++ host code of the examples to see how the various buffers are allocated and how data is copied to or from the device buffers.
E.g. here for the float data CDF buffers: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Texture.cpp#L1672
This means you can use either method 1 or 3 to store your pointer to a float3 buffer inside your launch parameters.
I recommend placing that pointer field at an 8-byte aligned offset inside your launch parameter structure so that the compiler does not add needless alignment padding.
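The effect of the field order can be checked on the host with sizeof and offsetof. The following sketch uses two hypothetical structs, ParamsPadded and ParamsPacked, which are not taken from the examples. The byte offsets in the comments assume a 64-bit target with 8-byte pointers and a 4-byte unsigned int, which is the case for the platforms OptiX runs on.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout A: a 4-byte field in front of the pointer.
struct ParamsPadded
{
  unsigned int count; // offset 0
                      // 4 bytes of compiler-inserted padding
  float*       data;  // offset 8
  unsigned int flags; // offset 16
                      // 4 bytes of tail padding, sizeof == 24
};

// Hypothetical layout B: the pointer sits at an 8-byte aligned offset.
struct ParamsPacked
{
  float*       data;  // offset 0
  unsigned int count; // offset 8
  unsigned int flags; // offset 12, sizeof == 16
};

// Sanity check that can be called once at startup.
inline void checkParameterLayouts()
{
  // The pointer-first layout is never larger than the padded one.
  assert(sizeof(ParamsPacked) <= sizeof(ParamsPadded));
  // The compiler always places the pointer at a suitably aligned offset.
  assert(offsetof(ParamsPadded, data) % alignof(float*) == 0);
}
```

Both structs hold the same fields, so the only difference in size comes from the padding the compiler has to insert in front of and behind the pointer.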
Note that vectors with three components have no dedicated vectorized load and store instructions!
This means loading a float3 will load the data as three individual floats. That’s also why float3 has a memory alignment of only 4 bytes.
In contrast, float4 (16-byte aligned) and float2 (8-byte aligned) vectors are loaded with a single vectorized instruction, which is usually faster due to better cache line usage. You can see that difference when looking into the PTX code of your OptiX device programs.
So if you’re not memory constrained on the data, I would recommend using float4 instead of float3 for better access performance, even when you’re not actually using the w-component.
This is especially important when writing to pinned memory buffers from multiple GPU devices.
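The size and alignment difference can be reproduced on the host. In the sketch below, Float3 and Float4 are stand-ins that mirror the layout of CUDA's float3 and float4 from vector_types.h, declared here only so the snippet compiles without the CUDA headers. The helper padToFloat4 is a hypothetical name for storing a three-component value in a four-component slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for CUDA's float3: no alignment attribute, so 4-byte aligned.
struct Float3
{
  float x, y, z;
};

// Stand-in for CUDA's float4, which is declared with 16-byte alignment.
struct alignas(16) Float4
{
  float x, y, z, w;
};

// Store a 3-component value in a float4 slot; the w-component stays unused.
inline Float4 padToFloat4(const Float3& v)
{
  return Float4{v.x, v.y, v.z, 0.0f};
}

// True if the address is suitable for a 16-byte vectorized access.
inline bool isAligned16(const void* p)
{
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Sanity check of the layout assumptions made above.
inline void checkVectorLayouts()
{
  assert(sizeof(Float3) == 12 && alignof(Float3) == 4);
  assert(sizeof(Float4) == 16 && alignof(Float4) == 16);
}
```

The cost of the padded layout is four extra bytes per element, which is the memory trade-off mentioned above.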
Many of the generic CUDA performance guidelines mentioned inside the older OptiX versions’ programming guide still apply: