I was also using modelKind OPTIX_DENOISER_MODEL_KIND_HDR and pixel format OPTIX_PIXEL_FORMAT_FLOAT3
Is that “also” in addition to the other system configuration options, or do you mean you tried other denoiser input formats as well?
Asking because the OptiX denoisers use half-precision formats internally, so using float formats as input might be slower than it needs to be.
There can also be a difference between 3- and 4-component inputs due to hardware-vectorized loads for 4-component data, so maybe try OPTIX_PIXEL_FORMAT_HALF4 instead.
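For reference, here is a minimal sketch of how the noisy input layer could be described as 4-component half data. The field names follow the OptixImage2D struct from the OptiX 7/8 headers; d_input, width and height are placeholders for your own buffer and resolution:

```cpp
// Sketch: describe the noisy beauty input as RGBA half data (alpha can be unused).
// Assumes d_input is a CUdeviceptr to width * height * 4 half values filled by the renderer.
OptixImage2D inputLayer = {};
inputLayer.data               = d_input;
inputLayer.width              = width;
inputLayer.height             = height;
inputLayer.pixelStrideInBytes = 4 * sizeof(unsigned short); // 4 x 16-bit half = 8 bytes
inputLayer.rowStrideInBytes   = width * inputLayer.pixelStrideInBytes;
inputLayer.format             = OPTIX_PIXEL_FORMAT_HALF4;
```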
Please read this thread about which denoiser models are recommended today:
https://forums.developer.nvidia.com/t/optix-8-0-denoiser-camera-space-vs-world-space/262875/4
The AOV denoiser models have received continuous quality and performance improvements over driver versions and should definitely be tested instead of the LDR and HDR models, which didn’t.
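Switching the creation call to the AOV model kind would look roughly like this (OptiX 7.3+ style API; the exact OptixDenoiserOptions fields depend on the SDK version, so check your headers, and OPTIX_CHECK stands for the usual error checking as in the SDK samples):

```cpp
// Sketch: create an AOV denoiser instead of the HDR one.
OptixDenoiserOptions options = {};
options.guideAlbedo = 1; // assumption: an albedo guide layer is provided
options.guideNormal = 1; // assumption: a normal guide layer is provided

OptixDenoiser denoiser = nullptr;
OPTIX_CHECK(optixDenoiserCreate(context,                       // your OptixDeviceContext
                                OPTIX_DENOISER_MODEL_KIND_AOV, // instead of _HDR
                                &options,
                                &denoiser));
```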
The denoiser implementation is part of the display driver, like the OptiX 7/8 implementation itself.
That means improvements in denoiser performance can be expected from changing display drivers and denoiser models rather than OptiX SDK versions.
If possible, I would still recommend always using the newest available OptiX SDK version though.
The OptiX Denoiser API has changed between SDK versions and some small application code adjustments might be necessary. Always read the OptiX Release Notes when switching OptiX SDK versions.
The optixDenoiser example inside the OptiX SDK releases shows the usage of the different denoiser models on loaded image data.
In principle the optixDenoiserInvoke calls should run fully asynchronously to the CPU since they take a CUDA stream argument.
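To illustrate (a sketch only, assuming the denoiser state, scratch, guide layer and layer setup from the optixDenoiser SDK sample; OPTIX_CHECK/CUDA_CHECK are the usual error-check macros):

```cpp
// The call only enqueues work on 'stream' and returns to the CPU immediately.
OPTIX_CHECK(optixDenoiserInvoke(denoiser, stream, &params,
                                d_state, stateSizeInBytes,
                                &guideLayer,
                                &layer, 1,  // one input/output layer
                                0, 0,       // inputOffsetX/Y (no tiling)
                                d_scratch, scratchSizeInBytes));

// The CPU is free to do other work here.
// Only synchronize when the denoised result is actually needed.
CUDA_CHECK(cudaStreamSynchronize(stream));
```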
Your observation that sizes below 256x256 aren’t scaling well is mainly due to not saturating the GPU with such small workloads, which depends on the underlying installed hardware resources.
For such cases, running multiple denoiser invocations in separate CUDA streams can actually scale, but don’t overdo it. Using 100 CUDA streams is unlikely to help, and switching between them isn’t free either. I would use a maximum of maybe 8 or 16. Benchmark that.
I would also recommend not using the CUDA default stream for that, since it might have different synchronization behavior. (When using the CUDA Driver API you have full control over that.)
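A hedged sketch of such a small stream pool, assuming one denoiser plus state/scratch buffers per stream so concurrent invocations don’t share mutable memory (denoisers, states, scratches and tiles are placeholders for your own per-stream resources and per-image inputs; with the CUDA Driver API the equivalent stream creation would be cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING)):

```cpp
constexpr int kNumStreams = 8; // benchmark 8 vs. 16 on your hardware

std::vector<cudaStream_t> streams(kNumStreams);
for (auto& s : streams)
    CUDA_CHECK(cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking)); // avoid the default stream

for (size_t i = 0; i < tiles.size(); ++i)
{
    const int s = static_cast<int>(i % kNumStreams); // round-robin over the stream pool
    OPTIX_CHECK(optixDenoiserInvoke(denoisers[s], streams[s], &params,
                                    states[s].ptr, states[s].size,
                                    &tiles[i].guideLayer,
                                    &tiles[i].layer, 1,
                                    0, 0,
                                    scratches[s].ptr, scratches[s].size));
}

for (auto& s : streams)
    CUDA_CHECK(cudaStreamSynchronize(s));
```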
I wouldn’t be surprised if there is still some fixed overhead in the denoising invocation which would become visible with many small inputs. Whether that is still the case with the AOV denoisers on workloads which saturate the GPU would need to be investigated.