Depends on what you’re trying to do and on which GPU.
If you’re planning to use half-precision calculations in your kernel, you need to be careful about which GPUs you’re actually running on to gain a performance benefit. I would not recommend using half-float calculations inside OptiX kernels if you target all OptiX-supported GPUs.
Using half precision input/output buffers is straightforward and only needs the two CUDA intrinsic functions __half2float() and __float2half_rn() to convert to and from 32-bit float values. These exist as overloads for unsigned short and __half formats which are both 16-bit in size.
Here is an example using the unsigned short versions, which are implemented in the CUDA device_functions.hpp header that should be present in OptiX. There is no need to include cuda_fp16.h in that case.
rtBuffer<ushort4, 2> rgba16f; // e.g. RGBA16F input-output buffer.
// Load half values contained inside the ushort bits.
const ushort4 us4 = rgba16f[theLaunchIndex];
// Convert the half values to float data
float r = __half2float(us4.x);
float g = __half2float(us4.y);
float b = __half2float(us4.z);
float a = __half2float(us4.w);
// ... Do something with the 32-bit floating point rgba data.
// Convert float values into half values with rounding and store them as RGBA16F into the ushort bits.
rgba16f[theLaunchIndex] = make_ushort4(__float2half_rn(r), __float2half_rn(g), __float2half_rn(b), __float2half_rn(a));
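If you want to check on the host what is actually stored in those ushort bits, a minimal sketch of decoding one half value in plain C++ could look like the following. Note that halfBitsToFloat is a hypothetical helper name of mine, not a CUDA or OptiX function, and it assumes the buffer holds standard IEEE 754 binary16 values (1 sign bit, 5 exponent bits, 10 mantissa bits).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical host-side helper (not part of CUDA or OptiX):
// decodes the bits of one IEEE 754 binary16 value into a 32-bit float.
float halfBitsToFloat(std::uint16_t h)
{
  const std::uint32_t sign     = (h >> 15) & 0x1u;  // 1 sign bit
  const std::uint32_t exponent = (h >> 10) & 0x1Fu; // 5 exponent bits, bias 15
  const std::uint32_t mantissa = h & 0x3FFu;        // 10 mantissa bits

  float value;
  if (exponent == 0)
  {
    // Zero or subnormal: mantissa * 2^-24.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else if (exponent == 31)
  {
    // Infinity (mantissa == 0) or NaN (mantissa != 0).
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  }
  else
  {
    // Normal number: (1024 + mantissa) * 2^(exponent - 25),
    // which equals (1 + mantissa / 1024) * 2^(exponent - 15).
    value = std::ldexp(static_cast<float>(1024u + mantissa), static_cast<int>(exponent) - 25);
  }
  return sign ? -value : value;
}
```

For example, the bit pattern 0x3C00 decodes to 1.0f and 0xC000 to -2.0f. Inside the kernel you would still use __half2float() as shown above; this is only meant for inspecting mapped buffer contents on the CPU.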
Using RGBA16F data halves the bandwidth required to transfer such image data compared to RGBA32F, because each pixel takes 8 bytes instead of 16.
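To put numbers on the bandwidth difference, here is a small sketch that computes the size of a tightly packed RGBA image for both formats. The helper name rgbaImageBytes and the 1920x1080 resolution are just assumptions for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of a tightly packed RGBA image
// with the given number of bytes per channel (4 for float, 2 for half).
constexpr std::size_t rgbaImageBytes(std::size_t width, std::size_t height, std::size_t bytesPerChannel)
{
  return width * height * 4 * bytesPerChannel; // 4 channels: R, G, B, A
}

// Example resolution, assumed for illustration only.
constexpr std::size_t bytesRGBA32F = rgbaImageBytes(1920, 1080, 4); // 33177600 bytes
constexpr std::size_t bytesRGBA16F = rgbaImageBytes(1920, 1080, 2); // 16588800 bytes
```

So a full HD frame shrinks from roughly 33.2 MB to 16.6 MB per transfer.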
If you want to upload such buffer data to an OpenGL texture, that happens the same way as with RGBA32F data.
Here’s some example code from my ray tracers which can switch the buffer format between RGBA32F and RGBA16F at compile time with the USE_FLOAT_32_BIT_OUTPUT define. It works both with OpenGL interoperability using pixel buffer objects and without:
if (m_interop)
{
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_bufferOutput->getGLBOId());
#if USE_FLOAT_32_BIT_OUTPUT
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_FLOAT, (void*) 0); // RGBA32F data from PBO byte offset 0.
#else
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_HALF_FLOAT, (void*) 0); // RGBA16F data from PBO byte offset 0.
#endif
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
else
{
  const void* data = m_bufferOutput->map();
#if USE_FLOAT_32_BIT_OUTPUT
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_FLOAT, data); // RGBA32F
#else
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_HALF_FLOAT, data); // RGBA16F
#endif
  m_bufferOutput->unmap();
}