OptiX denoiser is broken (?) after recent driver updates

Hi everyone.
I’m shipping a lightmapper equipped with the OptiX 5, 6 and 7 denoisers. They all worked well (apart from 5 not running on 3XXX cards, which is, I guess, by design), but after recent driver updates I started getting many complaints about 6 and 7 producing weird patterns instead of denoised output. 5 seems unaffected (apparently because it keeps its data in a huge local DLL, not in the driver).
Patterns look like this:

From what I’ve learned so far, it definitely happens on driver version 471.41 and definitely doesn’t happen on 460.79.
Are there any driver changes that could be related? Are pre-7.3 denoisers no longer supported by the driver, and should I switch to 7.3 now?

Hi @Mr_F,

I’m not immediately aware of such issues, but I am forwarding your report to the denoising team for comment.

Can you share the raw inputs to the denoiser for one of these examples? (Meaning the HDR or LDR beauty layer, along with any other data being passed, such as albedo, normals, AOVs, etc.) It would also help if you could collect the parameter inputs as well as the results of any denoising prep calls, for example the modelKind, the average color and/or hdrIntensity, the scratch and state sizes, and the image sizes. Are you using tiling, temporal denoising, or AOVs?
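If it helps to collect those, something along these lines could dump the prep values (a rough, untested sketch against the pre-7.3 denoiser API; it assumes the denoiser, the input OptixImage2D, and a scratch buffer already exist, and the OptixDenoiserSizes field names vary slightly between 7.x minor versions):

```cpp
#include <optix.h>
#include <optix_stubs.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>

// Untested sketch: print the denoiser prep values (sizes, hdrIntensity).
// Assumes the denoiser, the noisy input image and a scratch buffer were created elsewhere.
void logDenoiserPrepValues(OptixDenoiser denoiser, CUstream stream,
                           const OptixImage2D& input,
                           CUdeviceptr scratch, size_t scratchSize)
{
    OptixDenoiserSizes sizes = {};
    optixDenoiserComputeMemoryResources(denoiser, input.width, input.height, &sizes);
    printf("state: %zu bytes, scratch: %zu bytes, overlap: %u px\n",
           sizes.stateSizeInBytes,
           sizes.withOverlapScratchSizeInBytes,  // named recommendedScratchSizeInBytes in older 7.x headers
           (unsigned)sizes.overlapWindowSizeInPixels);

    // hdrIntensity is a single float the denoiser writes to device memory.
    CUdeviceptr d_intensity = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_intensity), sizeof(float));
    optixDenoiserComputeIntensity(denoiser, stream, &input, d_intensity, scratch, scratchSize);

    float hdrIntensity = 0.0f;
    cudaMemcpy(&hdrIntensity, reinterpret_cast<void*>(d_intensity), sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("hdrIntensity: %f\n", hdrIntensity);
    cudaFree(reinterpret_cast<void*>(d_intensity));
}
```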


David.

Hi @dhart ,

Here is the input (half-float DDS): https://drive.google.com/file/d/12dq2c8Shs8GtSzhSLLPqG70xE7Fc9zJ3/view?usp=sharing

When using OptiX 7.2, what I do is:
Internally convert half to float via CUDA (the conversion itself is unaffected, looks good)
optixInit
optixDeviceContextCreate
createOptixImage with OPTIX_PIXEL_FORMAT_FLOAT4 (twice, for input and output)
optixDenoiserCreate with OPTIX_DENOISER_INPUT_RGB
optixDenoiserSetModel with OPTIX_DENOISER_MODEL_KIND_HDR
optixDenoiserComputeMemoryResources
optixDenoiserSetup
optixDenoiserInvoke

The data passed is a color lightmap; there is no additional data. I’m not using tiling or temporal denoising, so it’s pretty simple.
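In code, the sequence above boils down to something like this (a paraphrased, untested sketch rather than the exact code, using the pre-7.3 denoiser API; error checks and the half-to-float kernel are omitted, buffer names are placeholders):

```cpp
#include <optix.h>
#include <optix_stubs.h>
#include <optix_function_table_definition.h>  // must appear in exactly one translation unit
#include <cuda.h>
#include <cuda_runtime.h>

// Untested sketch of the sequence above (pre-7.3 denoiser API, error checks omitted).
// d_input/d_output are assumed to be device buffers of width*height FLOAT4 pixels.
void denoiseLightmap(CUdeviceptr d_input, CUdeviceptr d_output,
                     unsigned int width, unsigned int height)
{
    optixInit();

    cudaFree(0);  // make sure a CUDA context exists
    OptixDeviceContext context = nullptr;
    optixDeviceContextCreate(nullptr /* current CUDA context */, nullptr, &context);

    OptixImage2D input = {};
    input.data               = d_input;
    input.width              = width;
    input.height             = height;
    input.pixelStrideInBytes = 16u;           // 4 float channels
    input.rowStrideInBytes   = width * 16u;
    input.format             = OPTIX_PIXEL_FORMAT_FLOAT4;

    OptixImage2D output = input;
    output.data = d_output;

    OptixDenoiserOptions options = {};
    options.inputKind = OPTIX_DENOISER_INPUT_RGB;   // color only, no albedo/normals
    OptixDenoiser denoiser = nullptr;
    optixDenoiserCreate(context, &options, &denoiser);
    optixDenoiserSetModel(denoiser, OPTIX_DENOISER_MODEL_KIND_HDR, nullptr, 0);

    OptixDenoiserSizes sizes = {};
    optixDenoiserComputeMemoryResources(denoiser, width, height, &sizes);
    size_t scratchSize = sizes.withOverlapScratchSizeInBytes;  // recommendedScratchSizeInBytes in older 7.x headers

    CUdeviceptr d_state = 0, d_scratch = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_state),   sizes.stateSizeInBytes);
    cudaMalloc(reinterpret_cast<void**>(&d_scratch), scratchSize);

    optixDenoiserSetup(denoiser, 0 /* stream */, width, height,
                       d_state, sizes.stateSizeInBytes, d_scratch, scratchSize);

    OptixDenoiserParams params = {};  // hdrIntensity left unset in this sketch
    optixDenoiserInvoke(denoiser, 0 /* stream */, &params,
                        d_state, sizes.stateSizeInBytes,
                        &input, 1, 0 /* inputOffsetX */, 0 /* inputOffsetY */,
                        &output, d_scratch, scratchSize);
}
```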

The result is what’s shown in the first post, and it worked in the past, which is weird.

Can fetch avg. intensity and other prep data in a few days (not at my PC atm).

Also reproduced it on an RTX 3090 with studio driver v471.68.

Okay, thank you for the repro data. Just so you know what to expect, our denoiser engineer is out of the office for a couple of weeks so we might not get this properly looked at until they’re back, but we will use your data to try to reproduce here, and if so file & fix the issue.

You answered a couple of questions I had but didn’t ask yet. :) A couple of other questions: do the results change if you use larger or smaller resolutions? For example, if you scale your inputs to one quarter resolution, does the bug still occur? Also, it might be worth checking what you get if you run the optixDenoiser SDK sample on your input images - are the same artifacts visible in that case? (This is the first thing I’ll try myself in the next day or so.)


David.

Please note that the half-to-float conversion is not necessary for the denoiser; it actually prefers half input data.
That means your denoising should be faster overall when you keep the data in half format, because it halves the memory size (and therefore the required memory bandwidth) and saves the conversion time.
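For example, the input image can describe the half data directly, along these lines (untested sketch; the buffer name is just a placeholder):

```cpp
#include <optix.h>
#include <cuda.h>

// Sketch: hand the half-precision lightmap to the denoiser as-is, skipping the
// half-to-float conversion. Assumes an RGBA16F buffer, i.e. 8 bytes per pixel.
OptixImage2D makeHalf4Image(CUdeviceptr d_halfPixels, unsigned int width, unsigned int height)
{
    OptixImage2D img = {};
    img.data               = d_halfPixels;
    img.width              = width;
    img.height             = height;
    img.pixelStrideInBytes = 8u;                        // 4 half channels * 2 bytes each
    img.rowStrideInBytes   = width * 8u;
    img.format             = OPTIX_PIXEL_FORMAT_HALF4;  // OPTIX_PIXEL_FORMAT_HALF3 exists for tightly packed RGB
    return img;
}
```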

Another thing to try is switching from the HDR denoiser to the AOV denoiser with just the noisy beauty image.
The programming guide explains the differences (the use of the optixDenoiserComputeAverageColor function and the hdrAverageColor field).
https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#nvidia-ai-denoiser
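Roughly, the switch looks like this (untested sketch against the pre-7.3 API; the context, stream, scratch buffer and params struct are assumed to exist already):

```cpp
#include <optix.h>
#include <optix_stubs.h>
#include <cuda.h>
#include <cuda_runtime.h>

// Sketch: use the AOV model instead of the HDR model, still with only the noisy
// beauty image, and feed hdrAverageColor instead of hdrIntensity.
void createAovDenoiser(OptixDeviceContext context, CUstream stream,
                       const OptixImage2D& noisyBeauty,
                       CUdeviceptr scratch, size_t scratchSize,
                       OptixDenoiser* denoiser, OptixDenoiserParams* params)
{
    OptixDenoiserOptions options = {};
    options.inputKind = OPTIX_DENOISER_INPUT_RGB;          // color only, no albedo/normals
    optixDenoiserCreate(context, &options, denoiser);
    optixDenoiserSetModel(*denoiser, OPTIX_DENOISER_MODEL_KIND_AOV, nullptr, 0);

    // The AOV model takes a 3-float average color rather than a single intensity value.
    CUdeviceptr d_avgColor = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_avgColor), 3 * sizeof(float));
    optixDenoiserComputeAverageColor(*denoiser, stream, &noisyBeauty,
                                     d_avgColor, scratch, scratchSize);

    params->hdrAverageColor = d_avgColor;  // read by optixDenoiserInvoke for the AOV model
}
```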

Interestingly, the issue already goes away at 1/2 size: it occurs at 4096x4096 and higher, but doesn’t occur at 2048x2048.

Technically I can split the image into 2048x2048 tiles as a workaround, but it would be nice if it worked out of the box, or at least failed with an error code. My implementation already falls back to splitting the image if the denoiser fails.

Hey, this is good to know; I’m adding it to the report, thanks! Glad you have a workaround for the time being, but I agree it’s not great to require a tiling fallback, since it takes longer. As long as you have all the output and scratch memory allocated, and there aren’t any OptiX or CUDA status errors before you denoise, then I don’t think it should be failing, and we agree that if there is a reason for the failure, it should be returned as an error code.

Hey, I didn’t quite understand from the original image snippets what exactly the expected output should look like versus what the problem looks like. I haven’t inspected or converted the DDS file either. (By the way, what tools can handle DDS files?) Would it be possible to post a JPG version of an input image, along with the denoised output both with and without the problem (once at full size with the bug, and once with tiling enabled and without the bug)? If they’re all cropped and exposed identically, it would be easier and quicker for the team to get a sense of the problem and the expected output.


David.

Personally I just treat them as raw pixel buffers with tiny headers: height and width start at byte offset 12, and the pixel data starts at offset 128 (when using a non-DX10-style header). Modern versions of Visual Studio can also open/convert DDS, and Photoshop can read it with the NVIDIA/Intel plugins. So all my tools tend to store images in DDS, since the files are super easy to construct (if you don’t need all the features) and still open in conventional tools.
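A minimal reader for that layout looks something like this (sketch only, no validation, simple non-DX10 headers only):

```cpp
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: read a simple DDS file as magic + 124-byte header + raw pixels.
// Height is at byte offset 12, width at offset 16, pixel data at offset 128.
bool readSimpleDds(const char* path, uint32_t& width, uint32_t& height,
                   std::vector<uint8_t>& pixels)
{
    FILE* f = fopen(path, "rb");
    if (!f) return false;

    uint8_t header[128];
    if (fread(header, 1, sizeof(header), f) != sizeof(header)) { fclose(f); return false; }
    memcpy(&height, header + 12, sizeof(uint32_t));
    memcpy(&width,  header + 16, sizeof(uint32_t));

    // Everything after the header; for a half-float RGBA lightmap that is
    // width * height * 8 bytes of pixel data.
    fseek(f, 0, SEEK_END);
    const long fileSize = ftell(f);
    fseek(f, 128, SEEK_SET);
    pixels.resize(static_cast<size_t>(fileSize) - 128);
    const size_t got = fread(pixels.data(), 1, pixels.size(), f);
    fclose(f);
    return got == pixels.size();
}
```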

Maybe an .exr or an .hdr, as we’re testing HDR denoising?

Okay, thanks for the DDS info. Yes, EXR or HDR would be totally fine. It would still be nice to have exposed versions as well for reference, if it’s not too much trouble, since the expected exposure and gamma can sometimes get lost in the shuffle when passing around HDR formats.


David.

Attaching exr and jpg: https://drive.google.com/file/d/1yTfNQKIheeVJS8d6XZBx9l_UrDKHwTOf/view?usp=sharing

…although it seems to be independent of the content; only the high resolution is the problem. And I’m checking every CUDA/OptiX call for a non-zero return code, and they all look good…
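(For reference, the checks amount to wrapping every call in macros along these lines; untested sketch, relying on both OptixResult and cudaError_t using 0 for success:)

```cpp
#include <optix.h>
#include <cuda_runtime.h>
#include <cstdio>

// Sketch of the "!= 0" checks: report any non-success status with its location.
#define CHECK_OPTIX(call)                                                               \
    do {                                                                                \
        OptixResult res_ = (call);                                                      \
        if (res_ != OPTIX_SUCCESS)                                                      \
            fprintf(stderr, "OptiX error %d at %s:%d\n", (int)res_, __FILE__, __LINE__); \
    } while (0)

#define CHECK_CUDA(call)                                                                \
    do {                                                                                \
        cudaError_t err_ = (call);                                                      \
        if (err_ != cudaSuccess)                                                        \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);                      \
    } while (0)
```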

Thanks! So this is the input image, correct? I was hoping to also see two versions of the output image, one with the bug and one without. Is that possible?


David.

Yes.
OK, here is an actual input/output pair of EXR+JPG images:
https://drive.google.com/file/d/1HEjNR9ArSV4-8V6FteJK9_s3IJiG-QRW/view?usp=sharing

Hi @Mr_F, thanks again for the input data! The denoiser team was able to look at your data and reproduce the problem. You’ve identified a real bug, and it will be fixed in an upcoming driver; however, it may take a couple of months for the fix to land and then percolate through our QA and release process.

In the meantime, you are still able to use tiling to work around the issue, right? Hopefully you can continue to do that and the tiling option is not terribly slow; we would expect the overhead of tiling at the scale of 4k+ images to be small compared to untiled denoising (and hopefully the memory consumption is reduced as a side benefit). Another option the team identified: you could use the OptiX 7 denoiser and enable the AOV mode, which uses a different convolution kernel that is not subject to the issue you bumped into. If you have memory to spare, it sounds like the OptiX 6 denoiser also has an option to raise its memory limit high enough to trigger automatic tiling; that memory limit should be somewhere in the neighborhood of 200 MB.

If you were already thinking of moving to OptiX 7 at some point, maybe this is a decent excuse to try it. If that’s not reasonable at the moment, we are expecting the fix for this issue to appear in drivers numbered 495 and higher, a couple of months from now. Apologies for the bump in the road, and thank you kindly for reporting it and sharing repro data. Let me know if you have any questions.


David.

Thanks for confirming! Workaround performance is OK and I’ll continue using it for now then.

Sounds good. So, one last tidbit: you are manually splitting your image into tiles, is that correct? Just in case this applies to you, one thing to be aware of is that correct tiling with the denoiser needs an overlap region between tiles; otherwise you might get seam artifacts at the tile edges. Currently the overlap region is 64 pixels, but the OptiX 7 API has explicit calls to determine the overlap size should it change in the future, and the OptiX 6 internal tiling handles this overlap region for you.

If your tiles are just separate light maps whose borders are also the tile edges, then you don’t really need to worry about any overlap regions. But if your tile seam goes through the middle of one or more of your light maps or other inputs to the denoiser, then we recommend incorporating the overlap regions into your image splitting & re-merging code.
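For reference, the overlap can be queried rather than hard-coded; a minimal sketch (assuming an already-created denoiser) might look like this, with optixDenoiserInvoke’s inputOffsetX/inputOffsetY then pointing at the tile’s top-left corner inside the padded input window:

```cpp
#include <optix.h>
#include <optix_stubs.h>

// Sketch: ask the denoiser for the required tile overlap instead of assuming 64 px.
// Each tile is then denoised from an input window extended by `overlap` pixels on
// every side that is not an image border.
unsigned int queryTileOverlap(OptixDenoiser denoiser,
                              unsigned int tileWidth, unsigned int tileHeight)
{
    OptixDenoiserSizes sizes = {};
    optixDenoiserComputeMemoryResources(denoiser, tileWidth, tileHeight, &sizes);
    return static_cast<unsigned int>(sizes.overlapWindowSizeInPixels);
}
```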


David.

@Mr_F, an update: the denoiser team should have the fix for this in the next driver release, expected sometime next week.


David.
