How to use RT_FORMAT_HALF*

Hi,

What should the element type of rtBuffer<> be if the buffer is created with RT_FORMAT_HALF4? Do I need cuda_fp16.h to use it? For RT_FORMAT_FLOAT4, rtBuffer<float4> is supposed to be used, and accessing and operating on float4 is straightforward.

As there aren’t any OptiX examples or documentation about RT_FORMAT_HALF*, I would like to get help here.

Thank you very much,

Yashiz

Depends on what you’re trying to do and on which GPU.

If you’re planning to use half-precision calculations in your kernel, you need to be careful about which GPUs you’re actually running on to gain a performance benefit. I would not recommend using half-float calculations inside OptiX kernels if you target all OptiX-supported GPUs.

Using half-precision input/output buffers is straightforward and only needs the two CUDA intrinsic functions __half2float() and __float2half_rn() to convert to and from 32-bit float values. These exist as overloads for the unsigned short and __half types, which are both 16 bits in size.

Here is an example using the unsigned short variants, which are implemented in CUDA’s device_functions.hpp. That header should already be present in OptiX, so there is no need to include cuda_fp16.h.

rtBuffer<ushort4, 2> rgba16f; // e.g. RGBA16F input-output buffer.

// Load half values contained inside the ushort bits.
const ushort4 us4 = rgba16f[theLaunchIndex];

// Convert the half values to 32-bit float data.
float r = __half2float(us4.x);
float g = __half2float(us4.y);
float b = __half2float(us4.z);
float a = __half2float(us4.w);

// ... Do something with the 32-bit floating point rgba data.

// Convert float values into half values with rounding and store them as RGBA16F into the ushort bits. 
rgba16f[theLaunchIndex] = make_ushort4(__float2half_rn(r), __float2half_rn(g), __float2half_rn(b), __float2half_rn(a));

Using RGBA16F data is going to reduce the bandwidth required to transfer such image data.
If you want to upload such buffer data to an OpenGL texture, that happens the same way as with RGBA32F data.
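
To put a number on the bandwidth point: an RGBA16F pixel occupies 4 x 2 = 8 bytes, while an RGBA32F pixel occupies 4 x 4 = 16 bytes, so the half format halves the amount of data per frame. Here is a minimal C++ sketch of that arithmetic. The helper name rgbaBufferBytes and the 1920x1080 resolution are only illustrative assumptions and not part of OptiX.

```cpp
#include <cstddef>

// Hypothetical helper (not an OptiX function): size in bytes of a tightly
// packed RGBA image with the given number of bytes per channel.
inline std::size_t rgbaBufferBytes(std::size_t width, std::size_t height, std::size_t bytesPerChannel)
{
  const std::size_t channels = 4; // R, G, B, A
  return width * height * channels * bytesPerChannel;
}
```

For a 1920x1080 image, rgbaBufferBytes(1920, 1080, 2) gives 16,588,800 bytes for RGBA16F, versus 33,177,600 bytes from rgbaBufferBytes(1920, 1080, 4) for RGBA32F.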

Here’s some example code from my ray tracers, which can switch the buffer format between RGBA32F and RGBA16F at compile time via the USE_FLOAT_32_BIT_OUTPUT define, with or without OpenGL interoperability through pixel buffer objects:

if (m_interop) 
{
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_bufferOutput->getGLBOId());
#if USE_FLOAT_32_BIT_OUTPUT
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_FLOAT, (void*) 0); // RGBA32F data from PBO byte offset 0.
#else
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_HALF_FLOAT, (void*) 0); // RGBA16F data from PBO byte offset 0.
#endif
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
else
{
  const void* data = m_bufferOutput->map();
#if USE_FLOAT_32_BIT_OUTPUT
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_FLOAT, data); // RGBA32F
#else
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_HALF_FLOAT, data); // RGBA16F
#endif
  m_bufferOutput->unmap();
}
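
In the non-interop path, map() hands back the raw buffer contents. If you ever want to inspect those RGBA16F values on the CPU (for debugging, say), each channel is a 16-bit half bit pattern that has to be decoded first. Below is a minimal sketch of such a decoder in plain C++, assuming the standard IEEE 754 binary16 layout. halfBitsToFloat is a hypothetical helper name, not an OptiX or CUDA function.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical host-side helper (not part of OptiX or CUDA): decode one
// IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 mantissa bits).
inline float halfBitsToFloat(std::uint16_t h)
{
  const bool negative = (h & 0x8000u) != 0;
  const int  exponent = (h >> 10) & 0x1F;
  const int  mantissa = h & 0x3FF;

  float value;
  if (exponent == 0)
  {
    // Zero or subnormal: mantissa * 2^-24, no implicit leading one.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else if (exponent == 0x1F)
  {
    // All exponent bits set: infinity (mantissa == 0) or NaN.
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  }
  else
  {
    // Normal number: (1024 + mantissa) * 2^(exponent - 15 - 10).
    value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  }
  return negative ? -value : value;
}
```

With the mapped pointer cast to const std::uint16_t* p, the red channel of pixel i would then be halfBitsToFloat(p[4 * i + 0]). As a sanity check, the bit pattern 0x3C00 decodes to 1.0f.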

Thank you Detlef. Very helpful and clear.

Cheers :)