[OptiX 7.2] Tiled denoiser errors in output

I’m adding tiling to a denoiser example for a graphics engine, but the denoised output image has bars that shouldn’t be there.
This is the output with the bars:

Those bars are the same width as the overlap region. In the image above (which is 3840×2160 px), the tile size is set to 1024×1024 and the overlap to 64 px. The vertical bars appear to be copied from the left, while the horizontal ones are completely messed up.

This is the code I use to tile the image:

unsigned int get_pixel_size(OptixPixelFormat pixelFormat)
{
    switch (pixelFormat) {
    case OPTIX_PIXEL_FORMAT_HALF3:
        return 3 * sizeof(unsigned short);
    case OPTIX_PIXEL_FORMAT_HALF4:
        return 4 * sizeof(unsigned short);
    case OPTIX_PIXEL_FORMAT_FLOAT3:
        return 3 * sizeof(float);
    case OPTIX_PIXEL_FORMAT_FLOAT4:
        return 4 * sizeof(float);
    default:
        return 0;
    }
}

void IDenoiser::createTilesForDenoising(
    CUdeviceptr        inputBuffer,
    CUdeviceptr        outputBuffer,
    size_t             inputWidth,
    size_t             inputHeight,
    OptixPixelFormat   pixelFormat,
    size_t             overlap,
    size_t             tileWidth,
    size_t             tileHeight,
    std::vector<Tile>& tiles)
{
    int pixelSize = get_pixel_size(pixelFormat);
    int rowStride = inputWidth * pixelSize;

    int pos_y = 0;
    do {
        int inputOffsetY = pos_y == 0 ? 0 : overlap;
        auto availableHeight = inputHeight - pos_y;
        int actualInputTileHeight = std::min(availableHeight, overlap + tileHeight) + inputOffsetY;

        int pos_x = 0;
        do {
            int inputOffsetX = pos_x == 0 ? 0 : overlap;
            auto availableWidth = inputWidth - pos_x;
            int actualInputTileWidth = std::min(availableWidth, overlap + tileWidth) + inputOffsetX;

            Tile tile{};
            auto in_posx = pos_x - inputOffsetX;
            auto in_posy = pos_y - inputOffsetY;
            tile.input.data = inputBuffer
                    + in_posy * rowStride
                    + in_posx * pixelSize;
            tile.input.width = actualInputTileWidth;
            tile.input.height = actualInputTileHeight;
            tile.input.rowStrideInBytes = rowStride;
            tile.input.format = pixelFormat;
            tile.output.data = outputBuffer
                    + pos_y * rowStride
                    + pos_x * pixelSize;
            tile.output.width = std::min(availableWidth, tileWidth);
            tile.output.height = std::min(availableHeight, tileHeight);
            tile.output.rowStrideInBytes = rowStride;
            tile.output.format = pixelFormat;
            tile.inputOffsetX = inputOffsetX;
            tile.inputOffsetY = inputOffsetY;
            tiles.push_back(tile);

            pos_x += tileWidth;
        } while (pos_x < inputWidth);
        pos_y += tileHeight;
    } while (pos_y < inputHeight);
}

Do you have any idea what the cause may be, and how to get rid of those bars?

Please always include your system configuration information when asking about OptiX issues to reduce the turnaround time and to allow potential reproducers:
OS version, installed GPU(s), VRAM amount, display driver version, OptiX version (major.minor.micro), CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

Is your algorithm in any way different than the OptiX helper function optixUtilDenoiserSplitImage() in OptiX SDK 7.2.0\include\optix_denoiser_tiling.h?

Have you tried using that instead?

Do you really need to tile the image for denoising? That will only slow down the denoiser.

That tiling mechanism runs on full-size input and output images. If you're trying to save memory by working on individual image tiles only, the tile data would need to look very different.

Sorry about the lack of system info, first time on the NVIDIA forum.
OS: Win 10 Pro 10.0.18363
GPU: GTX 1060 6 GB, driver 457.09
CUDA version installed: 11.0.2
OptiX: 7.2.0
Host compiler: MSVC. I use the driver API, not the runtime API; all symbols are loaded from the DLL.

Thanks for pointing out that optixUtilDenoiserSplitImage() exists; I had totally missed it. I switched to optixUtilDenoiserInvokeTiled(), which tiles the image and invokes the denoiser. This is the result.

Tile resolution and overlap size are the same as in the image in the previous post.

As for whether tiling is necessary: it is. When trying to denoise images of 8K resolution and greater, my 6 GB of VRAM is not enough.

Ok, thanks. I'll file a bug report tomorrow.

What’s the exact pixel format in the failing case?

Would you be able to provide a minimal but complete reproducer in failing state which demonstrates the issue?
Source code appreciated.
If confidentiality is required, you can send it to OptiX-Help (at) nvidia.com (max. 10 MB, no *.zip extension; rename it to *.zi_, or *.7z should do). Google Drive should work as well. Other file sharing sites will not work! Or send it via a private message with an attachment or link to me.

In main.cpp, the code that invokes the denoiser is around line 800.
The format is OPTIX_PIXEL_FORMAT_HALF3, and there are 3 input layers: RGB, albedo, and normals.

A bit of an update,
I have prepared a .zip with an exe file to showcase this problem, as well as another version with debug info, since attempting to launch the exe with an outdated OptiX version results in crashes. If that happens, please extract the debug files to /bin, then attach the program to Visual Studio.

And here is the debug version.