OptiX 7 denoiser

Hi,

I have a question about the denoiser in OptiX 7…

I implemented it based on the example from Ingo Wald, but honestly the example does not work.
It uses the “overlapWindowSizeInPixels” field from “OptixDenoiserSizes”, and it seems like this value is not set correctly, at least not in the GR 441.08 driver. It gets back a value of 3452816845, which looks uninitialized. If I set the structure to 0 before calling optixDenoiserComputeMemoryResources, it stays 0.

In the example I noticed that the image to be rendered is enlarged by that value. Is this how it should be? Do I need to render at a bigger resolution for the denoise pass? (Well, assuming overlapWindowSizeInPixels were set correctly.)

Another question: setting overlapWindowSizeInPixels to 0 works somehow, but only if the image does not have too much noise; then it works fine. For images with still a lot of noise, the denoiser returns with success, but the synchronization hangs and never returns. Is there something else missing?

Tiled denoising is not implemented yet, along with some other features. Please read the OptiX release notes and search this forum.

You don’t need to render at a bigger resolution when you want to denoise an image.
This would only come into play when trying to denoise images partially, e.g. on multiple GPUs.
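
Just to illustrate what the overlap is for, here is a conceptual sketch only (tileWidth and tileHeight are hypothetical names, and tiled denoising is not available in this version): an interior tile needs that many extra pixels of valid neighborhood on each side as context, so only the input region fed to the denoiser is enlarged, not your rendering resolution.

  // Conceptual sketch only, not working tiled-denoising code.
  // An interior tile would need 'overlap' extra pixels of valid neighborhood
  // on each side, so the region fed to the denoiser is larger than the tile
  // it produces. tileWidth and tileHeight are hypothetical.
  const unsigned int overlap     = m_sizesDenoiser.overlapWindowSizeInPixels; // Normally 64.
  const unsigned int inputWidth  = tileWidth  + 2 * overlap;
  const unsigned int inputHeight = tileHeight + 2 * overlap;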

3452816845 is 0xCDCDCDCD in hexadecimal, which is the debug memory initialization pattern used by MSVS.
That would mean you didn’t initialize the OptixDenoiserSizes structure with zero and the driver didn’t write the field.
Are you saying this worked with a previous driver (e.g. 436.30)?

The denoiser should behave the same independently of the amount of noise inside the input buffers.
OptiX launches are asynchronous, including the denoiser invocation, and care needs to be taken to synchronize around those as necessary when changing input values or reading back output data.
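
For example, a minimal sketch (using the CUDA Driver API like the code below): wait on the stream after the denoiser invocation before touching the result on the host.

  // Sketch: the denoiser invocation is asynchronous on m_cudaStream.
  // OPTIX_CHECK( m_api.optixDenoiserInvoke(...) ); // As in the render() code below.
  // Block until the denoiser has finished before reading the result on the host:
  CU_CHECK( cuStreamSynchronize(m_cudaStream) );
  // Now it's safe to read m_d_denoisedBuffer, e.g. with cuMemcpyDtoH().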

What’s your system configuration?
OS version, installed GPU(s), display driver versions (esp. the working version as well if this is a regression), OptiX SDK version (major.minor.micro), CUDA Toolkit version used to compile your input PTX code, host compiler version.

The denoiser is not going to be very effective with very few samples in general.
With, say, 4 samples or fewer there will just not be enough data to predict a good result and everything will look smudgy.
High frequency color detail from textures will get much better when adding an albedo buffer.
Lots of fine geometric detail would get better with a normal buffer, but that has not been re-implemented yet; it has been unavailable since OptiX 5.1.0.
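
As a sketch of the albedo variant (not part of the example below; m_d_albedoBuffer is a hypothetical buffer your renderer would fill), the differences would be roughly:

  // Sketch: denoiser setup with RGB + albedo instead of RGB only.
  optionsDenoiser.inputKind = OPTIX_DENOISER_INPUT_RGB_ALBEDO;

  // optixDenoiserInvoke() then takes two input layers:
  // [0] = noisy beauty image, [1] = albedo, both with matching dimensions.
  OptixImage2D inputLayers[2];
  inputLayers[0] = inputLayer;            // The noisy beauty layer as set up in render() below.
  inputLayers[1] = inputLayer;            // Copy the descriptor for matching size, strides, format ...
  inputLayers[1].data = m_d_albedoBuffer; // ... and point it at the albedo data (hypothetical buffer).

  // Pass inputLayers and numInputLayers == 2 to optixDenoiserInvoke() instead of &inputLayer and 1.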

The setup for an HDR denoiser using RGB and no albedo looks like this:

  // This code is using the CUDA Driver API.
  // Let's say there is an Application class storing global variables.

  // Denoiser:
  OptixDenoiser       m_denoiser;
  OptixDenoiserSizes  m_sizesDenoiser;
  OptixDenoiserParams m_paramsDenoiser;
  CUdeviceptr         m_d_stateDenoiser;
  CUdeviceptr         m_d_scratchDenoiser;
  CUdeviceptr         m_d_denoisedBuffer;

  // Application::Application()
  m_denoiser = nullptr;
  m_d_stateDenoiser = 0;
  m_d_scratchDenoiser = 0;
  m_d_denoisedBuffer = 0;

  // Application::~Application()
  if (!m_interop)
  {
    CU_CHECK( cuMemFree(m_d_denoisedBuffer) ); // This is only allocated when there is no OpenGL interop.
  }
  CU_CHECK( cuMemFree(m_paramsDenoiser.hdrIntensity) ); 
  CU_CHECK( cuMemFree(m_d_scratchDenoiser) );
  CU_CHECK( cuMemFree(m_d_stateDenoiser) );

  OPTIX_CHECK( m_api.optixDenoiserDestroy(m_denoiser) ); // m_api is the OptixFunctionTable.

// Application::reshape()
...
  // Update the denoiser setup.
  if (!m_interop)
  {
    CU_CHECK( cuMemFree(m_d_denoisedBuffer) );
#if USE_FP32_OUTPUT
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(float4) * m_width * m_height) );
#else
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(Half4) * m_width * m_height) );
#endif
  }

  memset(&m_sizesDenoiser, 0, sizeof(OptixDenoiserSizes));

  OPTIX_CHECK( m_api.optixDenoiserComputeMemoryResources(m_denoiser, m_width, m_height, &m_sizesDenoiser) );

  CU_CHECK( cuMemFree(m_d_stateDenoiser) );
  CU_CHECK( cuMemAlloc(&m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes) );

  CU_CHECK( cuMemFree(m_d_scratchDenoiser) );
  CU_CHECK( cuMemAlloc(&m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  OPTIX_CHECK( m_api.optixDenoiserSetup(m_denoiser, m_cudaStream, 
                                        m_width, m_height, 
                                        m_d_stateDenoiser,   m_sizesDenoiser.stateSizeInBytes,
                                        m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );
...

// One-time denoiser initialization:
void Application::initDenoiser()
{
  OptixDenoiserOptions optionsDenoiser;

  optionsDenoiser.inputKind   = OPTIX_DENOISER_INPUT_RGB;  // or OPTIX_DENOISER_INPUT_RGB_ALBEDO
#if USE_FP32_OUTPUT
  optionsDenoiser.pixelFormat = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
  optionsDenoiser.pixelFormat = OPTIX_PIXEL_FORMAT_HALF4;
#endif

  OPTIX_CHECK( m_api.optixDenoiserCreate(m_context, &optionsDenoiser, &m_denoiser) );
  
  OPTIX_CHECK( m_api.optixDenoiserSetModel(m_denoiser, OPTIX_DENOISER_MODEL_KIND_HDR, nullptr, 0) ); // Need to set the model to be able to calculate the memory requirements.
  
  memset(&m_sizesDenoiser, 0, sizeof(OptixDenoiserSizes));

  OPTIX_CHECK( m_api.optixDenoiserComputeMemoryResources(m_denoiser, m_width, m_height, &m_sizesDenoiser) );

  MY_ASSERT(m_d_stateDenoiser == 0);
  CU_CHECK( cuMemAlloc(&m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes) );
  
  MY_ASSERT(m_d_scratchDenoiser == 0);
  CU_CHECK( cuMemAlloc(&m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  OPTIX_CHECK( m_api.optixDenoiserSetup(m_denoiser, m_cudaStream, 
                                        m_width, m_height, 
                                        m_d_stateDenoiser,   m_sizesDenoiser.stateSizeInBytes,
                                        m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  m_paramsDenoiser.denoiseAlpha = 0;    // Don't denoise alpha, just copy it.
  m_paramsDenoiser.blendFactor  = 0.0f; // Show the denoised image only.
  CU_CHECK( cuMemAlloc(&m_paramsDenoiser.hdrIntensity, sizeof(float)) ); // Allocate a float on the device for the optixDenoiserComputeIntensity() result.

  if (!m_interop) // Only need to allocate the target buffer if it's not an OpenGL interop buffer (PBO or texture image). Code paths for that omitted below for brevity.
  {
#if USE_FP32_OUTPUT
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(float4) * m_width * m_height) );
#else
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(Half4) * m_width * m_height) ); // Own struct Half4 because CUDA only implements half and half2 types.
#endif
  }
}

// Application::render()
...
    // Update only the sysParameter.iterationIndex.
    m_systemParameter.iterationIndex = m_iterationIndex++;

    CU_CHECK( cuMemcpyHtoD(reinterpret_cast<CUdeviceptr>(&m_d_systemParameter->iterationIndex), &m_systemParameter.iterationIndex, sizeof(int)) );

    OPTIX_CHECK( m_api.optixLaunch(m_pipeline, m_cudaStream, (CUdeviceptr) m_d_systemParameter, sizeof(SystemParameter), &m_sbt, m_width, m_height, /* depth */ 1) );

    OptixImage2D inputLayer;

    inputLayer.data               = m_systemParameter.outputBuffer; // The noisy RGBA32F or RGBA16F image.
    inputLayer.width              = m_width;
    inputLayer.height             = m_height;
#if USE_FP32_OUTPUT
    inputLayer.rowStrideInBytes   = m_width * sizeof(float4);
    inputLayer.pixelStrideInBytes = sizeof(float4);
    inputLayer.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
    inputLayer.rowStrideInBytes   = m_width * sizeof(Half4);
    inputLayer.pixelStrideInBytes = sizeof(Half4);
    inputLayer.format             = OPTIX_PIXEL_FORMAT_HALF4;
#endif

    // Calculate the intensity on the outputBuffer data.
    OPTIX_CHECK( m_api.optixDenoiserComputeIntensity( m_denoiser, m_cudaStream, &inputLayer, m_paramsDenoiser.hdrIntensity, m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

    OptixImage2D outputLayer;
    outputLayer.data               = m_d_denoisedBuffer; // The denoised RGBA32F or RGBA16F image, where A is copied from the input.
    outputLayer.width              = m_width;
    outputLayer.height             = m_height;
#if USE_FP32_OUTPUT
    outputLayer.rowStrideInBytes   = m_width * sizeof(float4);
    outputLayer.pixelStrideInBytes = sizeof(float4);
    outputLayer.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
    outputLayer.rowStrideInBytes   = m_width * sizeof(Half4);
    outputLayer.pixelStrideInBytes = sizeof(Half4);
    outputLayer.format             = OPTIX_PIXEL_FORMAT_HALF4;
#endif

    OPTIX_CHECK( m_api.optixDenoiserInvoke(m_denoiser, m_cudaStream, &m_paramsDenoiser,
                                           m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes,
                                           &inputLayer, 1, // numInputLayers == 1, RGB only.
                                           0, 0,           // inputOffsetX, inputOffsetY.
                                           &outputLayer,
                                           m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

...

Thank you for your answer. I will compare step by step to see if there is anything I forgot.

About OptixDenoiserSizes: it wasn’t clear what this overlapWindowSizeInPixels value is for, and I saw its usage in the “example11_denoiseColorOnly” example code. If this structure needs to be zero-initialized beforehand, then that example is wrong, too. If I add the initialization it works, but without it it fails. Thank you for that information!

I found the hanging problem, too. It was related to the hdrIntensity; the value wasn’t initialized correctly. Now it’s working fine. Thank you for your example code!!
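
For reference, this is roughly how I give it a defined value now (a minimal sketch using the same member names as your example; the initial value is just a neutral placeholder):

  // Sketch: give the device float a defined value right after allocation,
  // before the first optixDenoiserComputeIntensity() overwrites it.
  CU_CHECK( cuMemAlloc(&m_paramsDenoiser.hdrIntensity, sizeof(float)) );
  const float initialIntensity = 1.0f; // Neutral default until the real intensity is computed.
  CU_CHECK( cuMemcpyHtoD(m_paramsDenoiser.hdrIntensity, &initialIntensity, sizeof(float)) );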

I filed a bug report about the missing overlapWindowSizeInPixels return value.
The 441.08 driver code is actually not writing that member.
The correct code is already present in the main branches, so this should work in future drivers.
Normally the returned value would be 64.