[OptiX 7] Tiled Denoiser

Hi,

I’m now implementing the OptiX AI denoiser with tiling in my OptiX library:

I could find an official denoiser sample at:

but it does not seem to implement tiled denoising.
So I implemented a tiled denoiser based on some guesswork.
Now it seems to work:

but I’m not confident that my implementation is correct.

Therefore, I have several questions:

  1. Do the results from the tiled and non-tiled denoiser match?
  2. Is it correct to pass the tile width/height (without overlap) to optixDenoiserComputeMemoryResources?
  3. Is the value overlapWindowSizeInPixels in OptixDenoiserSizes the sum of the left and right (or top and bottom) overlap widths? So if I pass 32x32 to optixDenoiserComputeMemoryResources as the tile size and get overlapWindowSizeInPixels == 64, will the width/height of the input layers be 96x96 (with the actual output size being 32x32 for most of the region)?
  4. If Q.2 is true, I’m confused by the description:
    'outputWidth' and 'outputHeight' must be greater than or equal to the dimensions passed to optixDenoiserSetup.
    in the comments for optixDenoiserComputeMemoryResources.
    Is this description correct? Because the description at optixDenoiserSetup says:
    'inputWidth' and 'inputHeight' must include overlap on both sides of the image if tiling is being used.
    and since I receive overlapWindowSizeInPixels == 64 with a tile size of 32x32, this seems impossible.

Thanks,

In addition to the questions, I noticed that the OptiX 7.1.0 SDK actually has a (non-tiled) denoiser sample, but the project was not added to the root “OptiX SDK 7.1.0\SDK\CMakeLists.txt”.

First, I would recommend using OptiX 7.1.0 and R450 drivers.
The denoiser interface has been changed slightly and the actual implementation has been improved.

I adjusted my OptiX 7 advanced examples last week to compile with either OptiX 7.0.0 or 7.1.0. The default is still 7.0.0, and switching to 7.1.0 only requires changing the respective find_package(OptiX7 REQUIRED) to find_package(OptiX71 REQUIRED) inside the CMakeLists.txt files.
Look for the added OPTIX_VERSION compile-time checks in the intro_denoiser example to see where the API has been changed between the two OptiX 7 versions.
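
For illustration, a minimal sketch of such a compile-time switch (the branch bodies are placeholders; OPTIX_VERSION is encoded as major*10000 + minor*100 + micro, so 7.1.0 is 70100):

```cpp
#include <optix.h>  // defines OPTIX_VERSION, e.g. 70000 for OptiX 7.0.0 and 70100 for 7.1.0

#if (OPTIX_VERSION >= 70100)
// OptiX 7.1.0 code path, e.g. using the changed OptixDenoiserSizes fields.
#else
// OptiX 7.0.0 code path.
#endif
```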

  1. Do the results from the tiled and non-tiled denoiser match?

I cannot say whether the result is pixel-identical; I never tried it myself. It’s running the same denoiser, and the overlap size in pixels is meant to cover the internal kernel size so that there are no seams at inner tile borders.

  1. Is it correct to pass the tile width/height (without overlap) to optixDenoiserComputeMemoryResources?

Yes, as documented: https://raytracing-docs.nvidia.com/optix7/api/html/group__optix__host__api__denoiser.html#ga2020ac0b7346bc7f1b256f8fea1cf140

Note that OptiX 7.1.0 changed the OptixDenoiserSizes structure so that the scratch space sizes and the overlap in pixels can be retrieved in a single call, whereas OptiX 7.0.0 would have required two calls: one to get the overlap size first and another to get the required scratch spaces.
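
As a minimal sketch of the single 7.1.0 query, assuming an already created denoiser handle and the tile size without overlap (the field names are from the 7.1.0 header as I remember them, so please verify against optix_7_types.h; error checking omitted):

```cpp
OptixDenoiserSizes sizes = {};
optixDenoiserComputeMemoryResources(denoiser, tileWidth, tileHeight, &sizes);

// sizes.stateSizeInBytes              : size of the denoiser state buffer
// sizes.withOverlapScratchSizeInBytes : scratch size when denoising with tiling (overlap)
// sizes.overlapWindowSizeInPixels     : required overlap to adjacent data per inner tile edge
```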

  1. Is the value overlapWindowSizeInPixels in OptixDenoiserSizes the sum of the left and right (or top and bottom) overlap widths? So if I pass 32x32 to optixDenoiserComputeMemoryResources as the tile size and get overlapWindowSizeInPixels == 64, will the width/height of the input layers be 96x96 (with the actual output size being 32x32 for most of the region)?

It’s the single overlap to adjacent data, meaning a 64-pixel wide border around the center tile.
That means in your case, for inner tiles it would be 64 + 32 + 64, and for tiles at the edges it’s 32 + 64.
That isn’t going to work, because you wouldn’t have enough data for the second and second-to-last row or column of your full image.
I would not recommend doing this at all, see below.

  1. If Q.2 is true, I’m confused by the description:
    'outputWidth' and 'outputHeight' must be greater than or equal to the dimensions passed to optixDenoiserSetup.
    in the comments for optixDenoiserComputeMemoryResources.
    Is this description correct? Because the description at optixDenoiserSetup says:
    'inputWidth' and 'inputHeight' must include overlap on both sides of the image if tiling is being used.
    and since I receive overlapWindowSizeInPixels == 64 with a tile size of 32x32, this seems impossible.

The overlap size in pixels is the overlap between two adjacent tiles at each edge.
That means if you have 32x32 tiles, you need at least two rows of these tiles around your center tile to fulfill the required 64-pixel overlap around it.
Tiles at the borders of the full image do not have overlap outside the full image; the overlap only applies to the inner edges.
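
Purely as an illustration of that clamping, here is a hypothetical helper (computeInputWindow and TileWindow are my own names, not SDK functions) which extends a tile by the overlap on each side that has a neighbor and clamps at the image borders:

```cpp
struct TileWindow { unsigned int x, y, width, height; };

TileWindow computeInputWindow(unsigned int tileX,  unsigned int tileY,   // tile origin in pixels
                              unsigned int tileW,  unsigned int tileH,   // tile size without overlap
                              unsigned int overlap,                      // overlapWindowSizeInPixels
                              unsigned int imageW, unsigned int imageH)  // full image size
{
    // Extend by 'overlap' pixels per side, but never outside the full image:
    // edge tiles simply have no overlap beyond the image border.
    const unsigned int x0 = (tileX > overlap) ? tileX - overlap : 0u;
    const unsigned int y0 = (tileY > overlap) ? tileY - overlap : 0u;
    const unsigned int x1 = (tileX + tileW + overlap < imageW) ? tileX + tileW + overlap : imageW;
    const unsigned int y1 = (tileY + tileH + overlap < imageH) ? tileY + tileH + overlap : imageH;
    return { x0, y0, x1 - x0, y1 - y0 };
}
```

For a 32x32 inner tile with overlap 64 this yields a 160x160 input window, and 96x96 for a corner tile, matching the 64 + 32 + 64 and 32 + 64 figures above.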

I don’t see these quotes inside the OptiX 7.1.0 online documentation.
EDIT: Found it inside the API reference. That’s one of the things that changed between OptiX 7.0.0 and 7.1.0.

That actually sounds confusing, but optixDenoiserComputeMemoryResources is used to calculate the required memory, which needs to be the same size as or bigger than the input image in optixDenoiserSetup. That’s why the docs say the output sizes in optixDenoiserComputeMemoryResources need to be the same as or bigger than the input sizes in optixDenoiserSetup; otherwise you would not allocate enough memory.

If you normally set up your denoising on full images, you simply give the full input image size to the setup; you don’t have more data after all. You can then do tiled denoising on that full input image by setting up the proper OptixImage2D for the tiles with overlap, calculating the start address and size as required. The row stride takes care of accessing the proper input pixels of the full image. In that case the denoiser invocation is on a smaller size than that input size.
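
Here is a minimal sketch of that full-image approach, assuming a single FLOAT4 color buffer on the device; the names (makeTileView, d_colorFull, fullWidth, and so on) are mine and not from the SDK:

```cpp
#include <cuda.h>
#include <optix.h>

// Builds an OptixImage2D that views one tile (including its overlap) inside the full image.
// 'x'/'y' is the first pixel of the tile window, 'width'/'height' its size including overlap.
// Only the data pointer is offset; the row stride stays that of the FULL image,
// so the denoiser reads the correct pixels from the surrounding rows.
OptixImage2D makeTileView(CUdeviceptr d_colorFull,
                          unsigned int fullWidth,
                          unsigned int x, unsigned int y,
                          unsigned int width, unsigned int height)
{
    const unsigned int pixelSize = 4 * sizeof(float); // OPTIX_PIXEL_FORMAT_FLOAT4

    OptixImage2D img = {};
    img.data               = d_colorFull + (static_cast<size_t>(y) * fullWidth + x) * pixelSize;
    img.width              = width;
    img.height             = height;
    img.rowStrideInBytes   = fullWidth * pixelSize;
    img.pixelStrideInBytes = pixelSize;
    img.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
    return img;
}
```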

Please read this OptiX 7.1.0 chapter and look carefully at the diagrams and code listing there:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#using-image-tiles-with-the-denoiser

Now if you are planning to allocate only as much memory as a single tile requires, the documentation says that size should include the overlap, because in that case your individual denoiser invocation will happen on at least that input size for an inner tile. The tiles at the edges would require less memory. The issue with that approach is that you need to be able to fill the data of this single tile with all its overlap pixels, and you still need to set up the OptixImage2D differently for tiles at the edge of the image and for inner tiles with full border pixels.
This would be an even bigger hassle than doing tiled denoising on a full image with the proper offsets and strides to pick out individual tiles. You would also need to copy the denoised tile to its final location.
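
For completeness, a sketch of that single-tile sizing under the assumptions above (7.1.0 field names as before, d_state/d_scratch are previously allocated device buffers, error checking omitted):

```cpp
// Query the memory requirements for the maximum output tile size (without overlap) ...
OptixDenoiserSizes sizes = {};
optixDenoiserComputeMemoryResources(denoiser, tileWidth, tileHeight, &sizes);

// ... but set up the denoiser for the maximum input size of an inner tile,
// which includes the overlap on both sides as the documentation requires.
optixDenoiserSetup(denoiser, stream,
                   tileWidth  + 2 * sizes.overlapWindowSizeInPixels,
                   tileHeight + 2 * sizes.overlapWindowSizeInPixels,
                   d_state,   sizes.stateSizeInBytes,
                   d_scratch, sizes.withOverlapScratchSizeInBytes);
```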

In addition to the questions, I noticed that the OptiX 7.1.0 SDK actually has a (non-tiled) denoiser sample, but the project was not added to the root “OptiX SDK 7.1.0\SDK\CMakeLists.txt”.

Right, that’s known. https://forums.developer.nvidia.com/t/optixdenoiserinvoke-pixel-format/139854/4

Generally it doesn’t make sense to use tiled denoising unless it’s for memory consumption reasons.
There was a time when the denoiser didn’t work on huge images, e.g. 8192x8192 was over the limit of 2 giga-elements inside cuDNN. That doesn’t apply to the denoiser in R450 drivers anymore.
It’s faster to denoise a whole image, so even with a tiled renderer I would render the full image first and denoise once at the end.
For that it’s recommended to use the minimum number of tiles in the denoiser, e.g. when 8192x8192 didn’t work in the past, two tiles covering the top and bottom halves of the image with 8192x(4096+64) and 8192x(64+4096) denoiser tiles would have been sufficient to overcome the previous limitation.
If it’s for memory limitations, still use the biggest denoiser tile size you can afford.

I also wouldn’t use 32x32 tiles for a GPU ray tracer unless each tile renders all samples per pixel at once. The number of threads should be well above 64k; otherwise it is much too small to saturate a recent GPU.

Thank you for your quick reply.
It clarifies my understanding.

I don’t see these quotes inside the OptiX 7.1.0 online documentation.

I can find:

The dimensions passed to optixDenoiserComputeMemoryResources and optixDenoiserSetup seem to be tileSize (32x32) and tileSize + 2 * overlapWindowSizeInPixels respectively according to your clarification, but the former is obviously smaller than the latter.

For that it’s recommended to use the minimum number of tiles in the denoiser

I also wouldn’t use 32x32 tiles for a GPU ray tracer

Thanks for the advice, I will follow it.

I just found them and edited my answer above.

Thanks.

BTW, you wrote:

That isn’t going to work, because you wouldn’t have enough data for the second and second-to-last row or column of your full image.

According to Fig. 13.5:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#using-image-tiles-with-the-denoiser
The first tile’s output size is 32 + 64, and the offset to the second tile is then 96 (the input window size is always 32 + 64 * 2), so I think this still works.

Nope, this is all about the input data to the denoiser. The output size should always be the tile size without overlap. I think the listing there is doing it wrong.

Let’s use pixel coordinates with a lower-left origin. If the tile in the bottom-left corner is 32x32 at pixel coordinates (0,0) to (31,31), then you need to invoke the denoiser with 64 pixels of overlap on its right and top. That means the first invocation is with 96x96 pixels, of which you only need the lower-left 32x32 in the output. => Corner and edge tiles get special handling.

Now step one 32x32 tile to the right, which lies at pixel coordinates (32,0) to (63,31). Do you spot the problem?

Since this tile has adjacent tiles to the left, right, and top, you would need to invoke the denoiser with 64 pixels of overlap on all three edges, which would be a denoiser invocation with an input width of 64+32+64 and an input height of 32+64 pixels. That’s obviously not possible, because you only have 32 pixels of data to the left, since the tile starts at coordinate 32.
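
To make that concrete with the illustrative computeInputWindow() sketch from further up (a 1024x1024 image is assumed, numbers only):

```cpp
// Second tile of the bottom row: origin (32, 0), 32x32 tile, overlap 64.
// Desired left context: 64 pixels. Available: only 32, because the tile starts at x = 32.
TileWindow w = computeInputWindow(32, 0, 32, 32, 64, 1024, 1024);
// w.x == 0, w.width  == 128 -> 32 + 32 + 64 instead of the desired 64 + 32 + 64 = 160.
// w.y == 0, w.height == 96  -> fine, the bottom edge has no neighbor below the image.
```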

That means while the denoiser invocations themselves would work for your 32x32 case on properly sized OptixImage2D, I would expect that the first inner edges between the tiles of your full image do not have the correct result but show a visible seam, because 32 pixels of data were missing to fill up the necessary data in the AI network.

This would all just work with at least 64x64-sized tiles. But again, unless you’re running into any memory constraints, there is no benefit over a full-image denoiser invocation…

Do you mean that what the online doc explains (Fig. 13.4, Fig. 13.5, Listing 13.13, and “optix_denoiser_tiling.h” in the SDK) is all wrong?
Fig. 13.4 and 13.5: the output sizes (the sizes of the red rectangles) at the lower-left corner and on the left and lower edges are bigger than those of the inner tiles.
Listing 13.13: the output size is obviously variable at corners and edges.

If the doc is wrong and the output size is actually always fixed, that seems more natural to me.

Yes, if the doc is wrong, your explanation totally makes sense.

Yes, I mean that both pieces of code calculate an output size of tile size + overlap for the tiles on the left and lower edges of the full image, which I wouldn’t have done.
To me that looks incorrect, and that’s why I also filed a bug report to hear what the engineer who implemented the denoiser and that tiling function has to say about it.
Again, I haven’t tried it. If that was done for a reason, then there needs to be a lot more documentation about it. ;-)

While looking at that, I also noticed that the green square in Figure 13.3 has the wrong label. It needs to be in.width, not out.width, which belongs to the inner red square.

OK, understood.
I will wait for an update from you and the next release while playing around on my side.

Thanks!

Huh? That tiling code is inside the OptiX SDK source. There is no need to wait for an update to change it.

Well, looking again, I don’t quite see why the function does what it does. Especially the part about the input sizes is completely different from what I would have expected and tried to explain above, and I don’t know what is going on there (it should work, but it looks like a performance issue), so let’s wait for the final word on it. That can take a while due to vacations.