OptiX denoiser exceptions at certain buffer sizes

I’m having problems with denoising buffers of certain sizes.

If I use the denoiser with an input buffer anywhere from 4000x3001 to 4000x3013, it crashes on OptiX 5.0.0 and throws an exception on OptiX 5.1.0.

My tests show the problem in these ranges: 4000x3001 -> 4000x3013, 4000x3015 -> 4000x3046, 4000x3048 -> 4000x3127.

There are many other sizes that cause the same issue.

Is there a max/min buffer size that the denoiser can handle reliably?

This is the simple test program I used:

#include <cstdlib>   // atoi
#include <iostream>

#include "optix.h"
#include "optixu/optixpp_namespace.h"

const char* EMPTY_PROGRAM_PTX = {
	"//\r\n"
	"// Generated by NVIDIA NVVM Compiler\r\n"
	"//\r\n"
	"// Compiler Build ID: CL-23083092\r\n"
	"// Cuda compilation tools, release 9.1, V9.1.85\r\n"
	"// Based on LLVM 3.4svn\r\n"
	"//\r\n"
	"\r\n"
	".version 6.1\r\n"
	".target sm_30\r\n"
	".address_size 64\r\n"
	"\r\n"
	"	// .globl	_Z12emptyProgramv\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register20reg_bitness_detectorE;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail0E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail1E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail2E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail3E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail4E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail5E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail6E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail7E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail8E;\r\n"
	".global .align 8 .u64 _ZN21rti_internal_register24reg_exception_64_detail9E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail0E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail1E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail2E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail3E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail4E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail5E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail6E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail7E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail8E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register21reg_exception_detail9E;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register14reg_rayIndex_xE;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register14reg_rayIndex_yE;\r\n"
	".global .align 4 .u32 _ZN21rti_internal_register14reg_rayIndex_zE;\r\n"
	"\r\n"
	".visible .entry _Z12emptyProgramv(\r\n"
	"\r\n"
	")\r\n"
	"{\r\n"
	"\r\n"
	"\r\n"
	"\r\n"
	"	ret;\r\n"
	"}\r\n"
};

int main( int argc, char* argv[] )
{
	// Get width and height from input params
	if ( argc < 3 ) {
		std::cout << "Parameters : denoisetest width height" << std::endl;

		return 1;
	}

	RTsize		width = atoi( argv[1] );
	RTsize		height = atoi( argv[2] );

	try {
		optix::Context		context = optix::Context::create();

		context->setRayTypeCount( 1 );
		context->setEntryPointCount( 1 );
		context->setStackSize( 1800 );

		// Create empty program
		optix::Program emptyProgram = context->createProgramFromPTXString( EMPTY_PROGRAM_PTX, "emptyProgram" );
		context->setRayGenerationProgram( 0, emptyProgram );

		optix::PostprocessingStage denoiserStage = context->createBuiltinPostProcessingStage( "DLDenoiser" );
		denoiserStage->declareVariable( "blend" )->setFloat( 0.f );

		// Create input and output buffers
		optix::Buffer inputBuffer = context->createBuffer( RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT4, width, height );
		optix::Buffer outputBuffer = context->createBuffer( RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT4, width, height );

		// Set into denoiser
		denoiserStage->declareVariable( "input_buffer" )->set( inputBuffer );
		denoiserStage->declareVariable( "output_buffer" )->set( outputBuffer );

		// Fill input buffer with black values 
		size_t		numPixels = width * height;
		float		*dst = (float*)inputBuffer->map( 0, RT_BUFFER_MAP_WRITE_DISCARD );

		for ( size_t i = 0; i < numPixels; ++i ) {
			*dst++ = 0.f;
			*dst++ = 0.f;
			*dst++ = 0.f;
			*dst++ = 1.f;
		}

		inputBuffer->unmap();

		// Create command list. The initial 1x1 launch works around a bug in OptiX 5.0 when running the denoiser
		optix::CommandList commandList = context->createCommandList();
		commandList->appendLaunch( 0, 1, 1 );
		commandList->appendPostprocessingStage( denoiserStage, width, height );
		commandList->finalize();

		commandList->execute();

		std::cout << width << "x" << height << " Denoise ok" << std::endl;

		return 0;
	}
	catch ( const optix::Exception& ex ) {
		std::cout << width << "x" << height << " Nvidia exception " << ex.getErrorString() << ", " << ex.getErrorCode() << std::endl;
	}
	catch ( ... ) {
		std::cout << width << "x" << height << " Exception" << std::endl;
	}

	return 1;
}

This only seems to be a problem on my GeForce GTX 1080 Ti (driver 24.21.13.9793). I tried the same test on a GT 750M and it works fine.

I would not recommend using OptiX 5.0.0 anymore now that 5.0.1 and even 5.1.0 are available.

What’s the exact error reported in the exception?
It might be that the memory allocation is failing for these big sizes. OptiX 5.0.0 actually had a bug with small sizes, hence the above recommendation.

The GPU architectures of your two boards are different and they are going to take different code paths, which could indicate a problem in the Pascal code path.

OptiX 5.1.0 has added HDR denoising and a variable named “maxmem” to the denoiser stage to limit the amount of memory. It also doesn’t need that dummy launch anymore.

I wrote two tutorials about the DL Denoiser: one shows how to apply the OptiX denoiser in 5.0.x (optixIntro_09) and one targets OptiX 5.1.0 (optixIntro_10) using the new and highly recommended HDR mode.
You can find the source code on github.com in the OptiX Advanced Samples repository.
Link in this sticky post: https://devtalk.nvidia.com/default/topic/998546/optix/optix-advanced-samples-on-github/

If you look at the optixIntro_10 code and search for the “maxmem” variable, there are three code lines currently commented out which can be used to limit the memory for the denoiser.
OptiX 5.1.0 will currently not use an exactly minimal tile size, but it will at least limit the working sizes to horizontal stripes of 160 pixels in height.

Please try, one at a time with OptiX 5.1.0, whether 1.) removing the dummy launch, 2.) using the HDR denoiser, and 3.) setting the “maxmem” variable to some small value like 10 MB solves the exception.

I tried removing the dummy launch and using the HDR denoiser. It throws an exception (Details: Function “_rtCommandListExecute” caught exception: Failed to launch DLDenoiser post-processing stage. DLDenoiser run method failed with error -4711.) if I try any height within the range of 3001 to 3127. Anything outside this height range works fine.

When I added the “maxmem” setting everything works great.

Is there a recommended memory size for this setting? The sample shows 512 MB.

You mean higher values than 3127 work as well? That’d be strange and would need to be investigated.

With your image sizes OptiX 5.1.0 will limit the tile size to a minimum of 4000 * 160, and that’ll take over 200 MB when using beauty and albedo buffers, no matter how small a maxmem value you use. It’s basically linear with the width. I expect future OptiX versions to handle that in a more fine-grained way.
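To make that arithmetic concrete, here is a rough sketch. The 160-row minimum tile and the roughly 200 MB figure for a width of 4000 come from the post above; the linear scaling function and all names are assumptions for illustration only, not values taken from OptiX itself.

```cpp
#include <cstddef>

// Minimum tile height used by OptiX 5.1.0, as stated in the post above.
constexpr std::size_t kMinTileHeight = 160;

// Number of pixels in the smallest tile for a given image width.
constexpr std::size_t minTilePixels(std::size_t width)
{
    return width * kMinTileHeight;
}

// Raw size in bytes of one RT_FORMAT_FLOAT4 tile (4 floats per pixel).
// This is only the buffer data, not the denoiser's internal working memory.
constexpr std::size_t float4TileBytes(std::size_t width)
{
    return minTilePixels(width) * 4 * sizeof(float);
}

// ASSUMPTION: scales the "over 200 MB at width 4000" figure linearly with
// the width to get a rough lower bound in MB. Not an official formula.
constexpr double estimatedMinDenoiserMB(std::size_t width)
{
    return 200.0 * static_cast<double>(width) / 4000.0;
}
```

For a width of 4000 the smallest tile is 640000 pixels, so a single float4 tile is only about 10 MB of raw data, while the denoiser as a whole is reported to need over 200 MB. Under the linear assumption, an 8000 pixel wide image would need at least about 400 MB regardless of a smaller maxmem.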

Higher values work fine. I first spotted it when it errored at 4000x3063 but worked fine at 4000x4000. I then wrote the test app to check the limits of the exceptions. I’ve gone as big as 10000x10000 with no problems.
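A sweep like that can be sketched as follows. This is a minimal illustration, not the actual test app: `findFailingRanges` and the `succeeds` callback are hypothetical names, and in a real run the callback would launch the denoiser test for the given height (for example as a separate process, so a crash does not stop the sweep).

```cpp
#include <functional>
#include <utility>
#include <vector>

// Inclusive [first, last] range of heights for which the probe failed.
using HeightRange = std::pair<unsigned, unsigned>;

// Scans every height in [minHeight, maxHeight] and groups consecutive
// failures into ranges. `succeeds` is a hypothetical callback that returns
// true when the denoiser ran without an exception at that height.
std::vector<HeightRange> findFailingRanges(unsigned minHeight, unsigned maxHeight,
                                           const std::function<bool(unsigned)>& succeeds)
{
    std::vector<HeightRange> ranges;
    bool inRange = false;

    for (unsigned h = minHeight; h <= maxHeight; ++h) {
        if (succeeds(h)) {
            inRange = false;            // a passing height closes the current range
        } else if (inRange) {
            ranges.back().second = h;   // extend the current failing range
        } else {
            ranges.push_back({h, h});   // start a new failing range
            inRange = true;
        }
    }
    return ranges;
}
```

Calling it with a width fixed at 4000 and heights from, say, 2990 to 3140 would yield one entry per failing block, which is how ranges such as 3001 -> 3013 can be reported.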

My quick fix was to detect a height > 3000 && height < 3128, pad the height to 3128, and replicate the last line until the image is 3128 pixels high.
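Roughly, the padding looks like the sketch below. The function names are made up for illustration, the pixel data is assumed to be a row-major float array with 4 channels per pixel, and the 3000/3128 thresholds are only what I observed at a width of 4000 on my GPU.

```cpp
#include <cstddef>
#include <vector>

// Height to hand to the denoiser: heights in the failing range
// (observed for width 4000) are bumped up to 3128.
std::size_t safeDenoiserHeight(std::size_t height)
{
    return (height > 3000 && height < 3128) ? 3128 : height;
}

// Returns a copy of `pixels` (row-major, `channels` floats per pixel) padded
// to `paddedHeight` rows by repeating the last row. The input is returned
// unchanged if no padding is needed or the sizes are inconsistent.
std::vector<float> padHeightByReplication(const std::vector<float>& pixels,
                                          std::size_t width, std::size_t height,
                                          std::size_t paddedHeight,
                                          std::size_t channels = 4)
{
    const std::size_t rowSize = width * channels;
    if (rowSize == 0 || height == 0 || paddedHeight <= height ||
        pixels.size() != rowSize * height) {
        return pixels;
    }

    std::vector<float> padded;
    padded.reserve(rowSize * paddedHeight);
    padded.insert(padded.end(), pixels.begin(), pixels.end());

    // Append copies of the last row until the padded height is reached.
    const auto lastRow = pixels.end() - static_cast<std::ptrdiff_t>(rowSize);
    for (std::size_t y = height; y < paddedHeight; ++y) {
        padded.insert(padded.end(), lastRow, pixels.end());
    }
    return padded;
}
```

The padded data goes into input and output buffers created with the padded height, and after denoising only the first `height` rows of the output are kept.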

Will probably go to a tiling method if I need much larger buffers.

Ok, thanks for the clarification. That’s unexpected. I’ll let our denoiser expert know.

Thanks a lot for the report!
That has been identified as an OptiX bug and will be fixed in future versions.
Please keep using the maxmem variable with OptiX 5.1.0 to limit allocations as a workaround.