OptiX 7 denoiser

Hi,

I have a question about the denoiser in OptiX 7…

I implemented it based on the example from Ingo Wald, but honestly the example does not work.
It uses the “overlapWindowSizeInPixels” field from “OptixDenoiserSizes”, and it seems like this value is not set correctly, at least not in the GR 441.08 driver. It gets back a value of 3452816845, which looks uninitialized. If I set the structure to 0 before calling optixDenoiserComputeMemoryResources, it stays 0.

In the example I noticed that the image to be rendered is enlarged by that value. Is this how it should be? Do I need to render at a bigger resolution for the denoise pass? (Well, assuming overlapWindowSizeInPixels were set correctly.)

Another question: setting overlapWindowSizeInPixels to 0 works somehow, but only if the image does not have too much noise; then it works fine. For images with still a lot of noise, the denoiser returns with success, but the synchronization hangs and never returns. Is there something else missing?

Tiled denoising is not implemented yet, along with some other features. Please read the OptiX release notes and search this forum.

You don’t need to render at a bigger resolution when you want to denoise an image.
This would only come into play when trying to denoise images partially, e.g. on multiple GPUs.
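
Just to illustrate what the overlap is for, here is a conceptual sketch only (tileWidth and tileHeight are hypothetical names, and tiled denoising is not available in this version): an interior tile needs that many extra pixels of valid neighborhood on each side as context, so only the input region fed to the denoiser is enlarged, not your rendering resolution.

  // Conceptual sketch only, not working tiled-denoising code.
  // An interior tile would need 'overlap' extra pixels of valid neighborhood
  // on each side, so the region fed to the denoiser is larger than the tile
  // it produces. tileWidth and tileHeight are hypothetical.
  const unsigned int overlap     = m_sizesDenoiser.overlapWindowSizeInPixels; // Normally 64.
  const unsigned int inputWidth  = tileWidth  + 2 * overlap;
  const unsigned int inputHeight = tileHeight + 2 * overlap;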

3452816845 is 0xCDCDCDCD in hexadecimal, which is the debug memory initialization pattern used by MSVS.
That would mean you didn’t initialize the OptixDenoiserSizes structure with zero and the driver didn’t write the field.
Are you saying this worked with a previous driver (e.g. 436.30)?

The denoiser should behave the same independently of the amount of noise inside the input buffers.
OptiX launches are asynchronous, including the denoiser invocation, and care needs to be taken to synchronize around those as necessary when changing input values or reading back output data.
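
For example, a minimal sketch (using the CUDA Driver API like the code below): wait on the stream after the denoiser invocation before touching the result on the host.

  // Sketch: the denoiser invocation is asynchronous on m_cudaStream.
  // OPTIX_CHECK( m_api.optixDenoiserInvoke(...) ); // As in the render() code below.
  // Block until the denoiser has finished before reading the result on the host:
  CU_CHECK( cuStreamSynchronize(m_cudaStream) );
  // Now it's safe to read m_d_denoisedBuffer, e.g. with cuMemcpyDtoH().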

What’s your system configuration?
OS version, installed GPU(s), display driver versions (esp. the working version as well if this is a regression), OptiX SDK version (major.minor.micro), CUDA Toolkit version used to compile your input PTX code, host compiler version.

The denoiser is not going to be very effective with very few samples in general.
With, say, 4 samples or fewer there will just not be enough data to predict a good result and everything will look smudgy.
High frequency color detail from textures will get much better when adding an albedo buffer.
Lots of fine geometric detail would get better with a normal buffer, but that has not been re-implemented yet; it has been unavailable since OptiX 5.1.0.
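
As a sketch of the albedo variant (not part of the example below; m_d_albedoBuffer is a hypothetical buffer your renderer would fill), the differences would be roughly:

  // Sketch: denoiser setup with RGB + albedo instead of RGB only.
  optionsDenoiser.inputKind = OPTIX_DENOISER_INPUT_RGB_ALBEDO;

  // optixDenoiserInvoke() then takes two input layers:
  // [0] = noisy beauty image, [1] = albedo, both with matching dimensions.
  OptixImage2D inputLayers[2];
  inputLayers[0] = inputLayer;            // The noisy beauty layer as set up in render() below.
  inputLayers[1] = inputLayer;            // Copy the descriptor for matching size, strides, format ...
  inputLayers[1].data = m_d_albedoBuffer; // ... and point it at the albedo data (hypothetical buffer).

  // Pass inputLayers and numInputLayers == 2 to optixDenoiserInvoke() instead of &inputLayer and 1.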

The setup for an HDR denoiser using RGB and no albedo looks like this:

  // This code is using the CUDA Driver API.
  // Let's say there is an Application class storing global variables.

  // Denoiser:
  OptixDenoiser       m_denoiser;
  OptixDenoiserSizes  m_sizesDenoiser;
  OptixDenoiserParams m_paramsDenoiser;
  CUdeviceptr         m_d_stateDenoiser;
  CUdeviceptr         m_d_scratchDenoiser;
  CUdeviceptr         m_d_denoisedBuffer;

  // Application::Application()
  m_denoiser = nullptr;
  m_d_stateDenoiser = 0;
  m_d_scratchDenoiser = 0;
  m_d_denoisedBuffer = 0;

  // Application::~Application()
  if (!m_interop)
  {
    CU_CHECK( cuMemFree(m_d_denoisedBuffer) ); // This is only allocated when there is no OpenGL interop.
  }
  CU_CHECK( cuMemFree(m_paramsDenoiser.hdrIntensity) ); 
  CU_CHECK( cuMemFree(m_d_scratchDenoiser) );
  CU_CHECK( cuMemFree(m_d_stateDenoiser) );

  OPTIX_CHECK( m_api.optixDenoiserDestroy(m_denoiser) ); // m_api is the OptixFunctionTable.

// Application::reshape()
...
  // Update the denoiser setup.
  if (!m_interop)
  {
    CU_CHECK( cuMemFree(m_d_denoisedBuffer) );
#if USE_FP32_OUTPUT
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(float4) * m_width * m_height) );
#else
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(Half4) * m_width * m_height) );
#endif
  }

  memset(&m_sizesDenoiser, 0, sizeof(OptixDenoiserSizes));

  OPTIX_CHECK( m_api.optixDenoiserComputeMemoryResources(m_denoiser, m_width, m_height, &m_sizesDenoiser) );

  CU_CHECK( cuMemFree(m_d_stateDenoiser) );
  CU_CHECK( cuMemAlloc(&m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes) );

  CU_CHECK( cuMemFree(m_d_scratchDenoiser) );
  CU_CHECK( cuMemAlloc(&m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  OPTIX_CHECK( m_api.optixDenoiserSetup(m_denoiser, m_cudaStream, 
                                        m_width, m_height, 
                                        m_d_stateDenoiser,   m_sizesDenoiser.stateSizeInBytes,
                                        m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );
...

// One-time denoiser initialization:
void Application::initDenoiser()
{
  OptixDenoiserOptions optionsDenoiser;

  optionsDenoiser.inputKind   = OPTIX_DENOISER_INPUT_RGB;  // or OPTIX_DENOISER_INPUT_RGB_ALBEDO
#if USE_FP32_OUTPUT
  optionsDenoiser.pixelFormat = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
  optionsDenoiser.pixelFormat = OPTIX_PIXEL_FORMAT_HALF4;
#endif

  OPTIX_CHECK( m_api.optixDenoiserCreate(m_context, &optionsDenoiser, &m_denoiser) );
  
  OPTIX_CHECK( m_api.optixDenoiserSetModel(m_denoiser, OPTIX_DENOISER_MODEL_KIND_HDR, nullptr, 0) ); // Need to set the model to be able to calculate the memory requirements.
  
  memset(&m_sizesDenoiser, 0, sizeof(OptixDenoiserSizes));

  OPTIX_CHECK( m_api.optixDenoiserComputeMemoryResources(m_denoiser, m_width, m_height, &m_sizesDenoiser) );

  MY_ASSERT(m_d_stateDenoiser == 0);
  CU_CHECK( cuMemAlloc(&m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes) );
  
  MY_ASSERT(m_d_scratchDenoiser == 0);
  CU_CHECK( cuMemAlloc(&m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  OPTIX_CHECK( m_api.optixDenoiserSetup(m_denoiser, m_cudaStream, 
                                        m_width, m_height, 
                                        m_d_stateDenoiser,   m_sizesDenoiser.stateSizeInBytes,
                                        m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

  m_paramsDenoiser.denoiseAlpha = 0;    // Don't denoise alpha, just copy it.
  m_paramsDenoiser.blendFactor  = 0.0f; // Show the denoised image only.
  CU_CHECK( cuMemAlloc(&m_paramsDenoiser.hdrIntensity, sizeof(float)) ); // Allocate a float on the device for the optixDenoiserComputeIntensity() result.

  if (!m_interop) // Only need to allocate the target buffer if it's not an OpenGL interop buffer (PBO or texture image). Code paths for that omitted below for brevity.
  {
#if USE_FP32_OUTPUT
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(float4) * m_width * m_height) );
#else
    CU_CHECK( cuMemAlloc(&m_d_denoisedBuffer, sizeof(Half4) * m_width * m_height) ); // Own struct Half4 because CUDA only implements half and half2 types.
#endif
  }
}

// Application::render()
...
    // Update only the sysParameter.iterationIndex.
    m_systemParameter.iterationIndex = m_iterationIndex++;

    CU_CHECK( cuMemcpyHtoD(reinterpret_cast<CUdeviceptr>(&m_d_systemParameter->iterationIndex), &m_systemParameter.iterationIndex, sizeof(int)) );

    OPTIX_CHECK( m_api.optixLaunch(m_pipeline, m_cudaStream, (CUdeviceptr) m_d_systemParameter, sizeof(SystemParameter), &m_sbt, m_width, m_height, /* depth */ 1) );

    OptixImage2D inputLayer;

    inputLayer.data               = m_systemParameter.outputBuffer; // The noisy RGBA32F or RGBA16F image.
    inputLayer.width              = m_width;
    inputLayer.height             = m_height;
#if USE_FP32_OUTPUT
    inputLayer.rowStrideInBytes   = m_width * sizeof(float4);
    inputLayer.pixelStrideInBytes = sizeof(float4);
    inputLayer.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
    inputLayer.rowStrideInBytes   = m_width * sizeof(Half4);
    inputLayer.pixelStrideInBytes = sizeof(Half4);
    inputLayer.format             = OPTIX_PIXEL_FORMAT_HALF4;
#endif

    // Calculate the intensity on the outputBuffer data.
    OPTIX_CHECK( m_api.optixDenoiserComputeIntensity( m_denoiser, m_cudaStream, &inputLayer, m_paramsDenoiser.hdrIntensity, m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

    OptixImage2D outputLayer;
    outputLayer.data               = m_d_denoisedBuffer; // The denoised RGBA32F or RGBA16F image, where A is copied from the input.
    outputLayer.width              = m_width;
    outputLayer.height             = m_height;
#if USE_FP32_OUTPUT
    outputLayer.rowStrideInBytes   = m_width * sizeof(float4);
    outputLayer.pixelStrideInBytes = sizeof(float4);
    outputLayer.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
#else
    outputLayer.rowStrideInBytes   = m_width * sizeof(Half4);
    outputLayer.pixelStrideInBytes = sizeof(Half4);
    outputLayer.format             = OPTIX_PIXEL_FORMAT_HALF4;
#endif

    OPTIX_CHECK( m_api.optixDenoiserInvoke(m_denoiser, m_cudaStream, &m_paramsDenoiser,
                                           m_d_stateDenoiser, m_sizesDenoiser.stateSizeInBytes,
                                           &inputLayer, 1, // numInputLayers == 1, RGB only.
                                           0, 0,           // inputOffsetX, inputOffsetY.
                                           &outputLayer,
                                           m_d_scratchDenoiser, m_sizesDenoiser.recommendedScratchSizeInBytes) );

...

Thank you for your answer. I will compare step by step to see if there is anything I forgot.

About OptixDenoiserSizes: it wasn’t clear what this overlapWindowSizeInPixels value is for, and I saw its usage in the “example11_denoiseColorOnly” example code. If this structure needs to be zero-initialized beforehand, then that example is wrong, too. If I add the initialization it works, but without it it fails. Thank you for that information!

I found the hanging problem, too. It was related to the hdrIntensity; the value wasn’t initialized correctly. Now it’s working fine. Thank you for your example code!!
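
For reference, this is roughly how I give it a defined value now (a minimal sketch using the same member names as your example; the initial value is just a neutral placeholder):

  // Sketch: give the device float a defined value right after allocation,
  // before the first optixDenoiserComputeIntensity() overwrites it.
  CU_CHECK( cuMemAlloc(&m_paramsDenoiser.hdrIntensity, sizeof(float)) );
  const float initialIntensity = 1.0f; // Neutral default until the real intensity is computed.
  CU_CHECK( cuMemcpyHtoD(m_paramsDenoiser.hdrIntensity, &initialIntensity, sizeof(float)) );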

I filed a bug report about the missing overlapWindowSizeInPixels return value.
The 441.08 driver code is actually not writing that member.
The correct code is already present in the main branches, so this should work in future drivers.
Normally the returned value would be 64.