OptiX 7 Denoiser with pitched buffers

Hi,

How do I use pitched buffers with the OptiX 7 denoiser?
I used cudaMallocPitch to allocate the input and output OptixImage2D buffers, but the results come out stretched/blurred.
(It works with cudaMalloc.)

The relevant code is shown below:

OptixImage2D inputLayer;
inputLayer.data = (CUdeviceptr)inputBuffer.ptr();
inputLayer.width = inputBuffer.width();
inputLayer.height = inputBuffer.height();
inputLayer.rowStrideInBytes = inputBuffer.pitch();      // padded row size from cudaMallocPitch
inputLayer.pixelStrideInBytes = inputBuffer.stride();   // sizeof(float4) = 16
inputLayer.format = OPTIX_PIXEL_FORMAT_FLOAT4;

OptixImage2D outputLayer;
outputLayer.data = (CUdeviceptr)_denoisedBuffer.ptr();
outputLayer.width = _denoisedBuffer.width();
outputLayer.height = _denoisedBuffer.height();
outputLayer.rowStrideInBytes = _denoisedBuffer.pitch();      // padded row size from cudaMallocPitch
outputLayer.pixelStrideInBytes = _denoisedBuffer.stride();   // sizeof(float4) = 16
outputLayer.format = OPTIX_PIXEL_FORMAT_FLOAT4;

OPTIX_CHECK(optixDenoiserInvoke(
    _denoiser,
    /*stream*/0,
    &denoiserParams,
    (CUdeviceptr)_denoiserState.ptr(),
    _denoiserState.size(),
    &inputLayer, 1,
    /*inputOffsetX*/0,
    /*inputOffsetY*/0,
    &outputLayer,
    (CUdeviceptr)_denoiserScratch.ptr(),
    _denoiserScratch.size()));
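
For context, a minimal sketch of how a pitched allocation maps onto an OptixImage2D; makePitchedFloat4Image is a hypothetical helper standing in for the buffer wrapper used above, with error handling omitted:

#include <cuda_runtime.h>
#include <optix.h>

// Hypothetical helper: allocate a pitched float4 image and fill in the
// OptixImage2D fields the denoiser expects. Error checks omitted.
static OptixImage2D makePitchedFloat4Image(unsigned int width, unsigned int height)
{
    void*  ptr   = nullptr;
    size_t pitch = 0;
    // cudaMallocPitch pads each row to the device's alignment; the
    // returned pitch is the actual row size in bytes.
    cudaMallocPitch(&ptr, &pitch, sizeof(float4) * width, height);

    OptixImage2D image = {};
    image.data               = (CUdeviceptr)ptr;
    image.width              = width;
    image.height             = height;
    image.rowStrideInBytes   = (unsigned int)pitch;   // padded row size
    image.pixelStrideInBytes = sizeof(float4);        // pixels packed within a row
    image.format             = OPTIX_PIXEL_FORMAT_FLOAT4;
    return image;
}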

Thanks for the help

What’s your GPU and display driver version?

Could you please provide all values of both the inputLayer and outputLayer structs and the arguments to your cudaMallocPitch() call in the failing case?

It is an RTX 2080 Ti with driver 436.30.

In fact, I tried the latest driver version 441.87, and it was worse: even cudaMalloc didn't work. The output looked more like corrupted geometry than just stretched pixels.

The inputLayer and outputLayer values:

inputLayer   {data=51868860416 width=1898 height=983 ...}   OptixImage2D
    data                 51868860416                        unsigned __int64
    width                1898                               unsigned int
    height               983                                unsigned int
    rowStrideInBytes     30720                              unsigned int
    pixelStrideInBytes   16                                 unsigned int
    format               OPTIX_PIXEL_FORMAT_FLOAT4 (8708)   OptixPixelFormat

outputLayer  {data=51589939200 width=1898 height=983 ...}   OptixImage2D
    data                 51589939200                        unsigned __int64
    width                1898                               unsigned int
    height               983                                unsigned int
    rowStrideInBytes     30720                              unsigned int
    pixelStrideInBytes   16                                 unsigned int
    format               OPTIX_PIXEL_FORMAT_FLOAT4 (8708)   OptixPixelFormat

The cudaMallocPitch call used for both inputBuffer and _denoisedBuffer:

CUDA_CHECK(cudaMallocPitch((void**)&_cuBuffer, &_pitch, sizeof(float4) * _width, _height));

inputBuffer and _denoisedBuffer both have:
width = 1898, height = 983, pitch = 30720
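
Those numbers are self-consistent: a packed row is 1898 × 16 = 30368 bytes, and 30720 is that value rounded up to the device's 512-byte pitch alignment (60 × 512). A quick standalone check of the reported values (a sketch, not from the original code):

#include <cassert>

int main()
{
    // Values reported above for both buffers.
    const unsigned int width              = 1898;
    const unsigned int pixelStrideInBytes = 16;     // sizeof(float4)
    const unsigned int rowStrideInBytes   = 30720;  // pitch from cudaMallocPitch

    // The pitch must cover at least one packed row: 1898 * 16 = 30368.
    assert(rowStrideInBytes >= width * pixelStrideInBytes);  // 30720 >= 30368, holds
    return 0;
}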

[ DENOISER]: using cuda device "GeForce RTX 2080 Ti" (7.5), buffers: fp16, cuDNN v7500, rt v10010

Would you be able to provide a minimal, self-contained reproducer in the failing state for this issue?