Implementing image filter kernels at runtime

roman.vlk18 · February 18, 2019, 3:32pm

I’m just making my way through image kernels and I was wondering if there’s a way to apply a post processing filter at runtime?

I realize the filtered image would need to be saved in a temporary buffer before being swapped into the final image, but I’m quite confused as to how to implement it… Is there any way I can check if all threads have finished so I have a clear image to work with?

Regardless, I’ve written the following as an implementation, but it gives me an error (obviously).
I’ve added the error after the code segment.

/*
Code runs after the loop

for(;;)
{
 . . .
 rtTrace()
 . . .
}

and after the tone mapping
*/

if (frame > 1)
{
    // Filter size and edge distance
    int _size = 3;
    int _edge = floor(_size / 2.f);

    size_t2 screen = output_buffer.size();

    // Make sure I'm inside the frame
    if ((launch_index.x - _edge) > 0 && (launch_index.y - _edge) > 0 &&
        ((launch_index.x + _edge) < screen.x) && ((launch_index.y + _edge) < screen.y))
    {

        // Define a Gaussian filter
        float _filter[3][3] = {
            {0, 1, 0},
            {1, 2, 1},
            {0, 1, 0}
        };


        // Initialize convolved and filter sums
        float4 _sum = make_float4(0);
        float _fsum = 0;

        for (int i = -_edge; i < _edge + 1; i++)
            for (int j = -_edge; j < _edge + 1; j++)
            {
                // Get new index with offset
                uint2 new_index = make_uint2(launch_index.x + i, launch_index.y + j);

                // Get color from buffer
                float4 _from_buffer = accum_buffer[new_index];

                // Get filter value
                float _fvalue = _filter[i + _edge][j + _edge];
                _fsum += _fvalue;

                // Apply filter
                _sum += _from_buffer * _fvalue;
            }

        // Normalize colors
        _sum /= _fsum;

        // Update color
        val = _sum;
    }
}

output_buffer[launch_index] = make_color(make_float3(val));
accum_buffer[launch_index] = acc_val;

The error that’s raised looks like this:

Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

droettger · February 18, 2019, 4:32pm

You cannot access neighbouring launch indices inside a single launch for reading.

Scheduling is not under your control, which means you don’t know when the neighbouring launch indices contain valid data. In multi-GPU configurations they might not even be on the same device.

Your filter kernel must be a separate launch that reads the previously written data as its input and writes into another buffer.

Have a look at the tonemapper implementation in this example code. That is a ray generation program which is not shooting rays:
https://github.com/nvpro-samples/optix_advanced_samples/blob/master/src/optixIntroduction/optixIntro_09/shaders/raygeneration.cu#L277

(This is an exception in this single example. The tonemapper in all other OptiX Introduction examples is implemented as post-process inside a GLSL shader which does the final texture blit to screen.)

While this is only reading and writing on the same launch indices, it could also implement a filter by reading multiple cells from the input buffer, named sysOutputBuffer in that kernel because that was the output from the renderer.

roman.vlk18 · February 20, 2019, 10:14am

Hi, thanks for the help! I managed to create a separate ray generation program, but it’s still not working correctly…

I created a new temporary buffer called image_buffer, to which I pass the values from the tracer. The idea is to read the data from image_buffer, which is a float4 buffer, perform the operations, then cast the result to a uchar4 and write it to the output_buffer.

This is my code so far:

RT_PROGRAM void raygeneration_filter()
{
    float4 _color = make_float4(0);

    if (!frame)
        _color = image_buffer[launch_index];
    else
    {
        int _size = 3;
        int _edge = 1;// floor(_size / 2.f);

        size_t2 screen = output_buffer.size();

        if ((launch_index.x - _edge) > 0 && (launch_index.y - _edge) > 0 &&
            ((launch_index.x + _edge) < screen.x) && ((launch_index.y + _edge) < screen.y))
        {
            float _filter[3][3] = { { 1, 1, 1 },
                                    { 1, 1, 1 },
                                    { 1, 1, 1 }
            };

            float4 _sum = make_float4(0);
            float _fsum = 0;

            for (int i = 0; i < _size; i++)
                for (int j = 0; j < _size; j++)
                {
                    float _fval = _filter[i][j];
                    uint2 new_index = make_uint2(launch_index.x + i - _edge, launch_index.y + j - _edge);
                    float4 from_buffer = image_buffer[new_index];

                    _color += from_buffer * _fval;
                }
        }
    }

    output_buffer[launch_index] = make_color(make_float3(_color));
}

And it produces the following error:

Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

but if I change line 32 from _color += from_buffer * _fval; to _color = from_buffer * _fval; (remove the plus), it produces an image and supposedly works fine…

Why does this error occur?

droettger · February 20, 2019, 11:33am

That could be a compiler error.

What’s your system configuration?
OS version, installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA toolkit version used to compile CUDA code to PTX, host compiler version.

BTW, if that is not going to become a programmable filter size with weights inside a buffer, then it’s more efficient to unroll the two loops and simply write the nine filter kernel operations manually.
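
Independently of any compiler issue, one thing worth ruling out in the filter program is the bounds check itself. The sketch below assumes that `launch_index` is declared as `uint2` (the declaration is not shown in the posts) and uses a hypothetical `UInt2` struct as a stand-in for it in plain C++. With unsigned operands, `launch_index.x - _edge` wraps around at `x == 0` instead of going negative, so the `> 0` test passes and the loop would read at a huge index. That might also be consistent with the `=` variant appearing to work, since a compiler may keep only the last assignment, which reads at `(x + 1, y + 1)`. Treat this as a possibility to check, not a confirmed diagnosis.

```cpp
#include <cstdint>

// Stand-in for CUDA's uint2 (assumption: launch_index is declared as uint2).
struct UInt2 {
    std::uint32_t x;
    std::uint32_t y;
};

// Mirrors the check from the post: the subtraction happens in unsigned
// arithmetic, so 0 - 1 becomes 4294967295 and still compares greater than 0.
bool insideUnsigned(UInt2 idx, int edge, UInt2 size) {
    return (idx.x - edge) > 0 && (idx.y - edge) > 0 &&
           (idx.x + edge) < size.x && (idx.y + edge) < size.y;
}

// Same intent with signed 64-bit arithmetic. Note the >= comparison, so that
// the pixel at index `edge` (whose neighbourhood starts at 0) is accepted.
bool insideSigned(UInt2 idx, int edge, UInt2 size) {
    const std::int64_t x = idx.x;
    const std::int64_t y = idx.y;
    return x - edge >= 0 && y - edge >= 0 &&
           x + edge < static_cast<std::int64_t>(size.x) &&
           y + edge < static_cast<std::int64_t>(size.y);
}
```

For a 640x480 launch, `insideUnsigned` accepts index `(0, 0)` with `edge == 1`, while `insideSigned` rejects it.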

roman.vlk18 · February 20, 2019, 12:22pm

System parameters are as follows

MSI laptop, Core i7
Windows 10
NVIDIA GeForce GTX 1060
Display driver version 25.21.14.1891
CUDA 8.0
Optix 4.1.1
Visual Studio Community 2015 version 14.0.25431.01 Update 3

And thanks for the good advice, but I intend to generalize the filter for different sizes.

droettger · February 20, 2019, 12:41pm

OptiX 4 is not really what you should be using today.
Please try to use at least OptiX 5.1.0 and CUDA 9.0 (not 9.1 or 9.2).
Or even try updating to OptiX 6.0.0 and use CUDA 9.0 or 10.0.

roman.vlk18 · February 20, 2019, 1:10pm

I tried to install the latest version of CUDA back when it was CUDA 9, but I had compatibility issues with the GPU and MS visual studio… It resulted in a major system crash to the point Windows was constantly rebooting, so I had to format my laptop…

Can my GeForce GTX 1060 run CUDA 9 or 10?

droettger · February 20, 2019, 1:36pm

That’s a Pascal board and should have no problems with it.
The display driver 418.91 you have installed already contains the CUDA 10.0 driver.
The CUDA_Installation_Guide_Windows documents for CUDA 9.0 and 10.0 both list Visual Studio Community 2015 as supported for native x86_64 development.

What you should NOT do is install any display driver components from CUDA Toolkits!
The driver inside the toolkit is outdated almost immediately, and you never need it unless you install the toolkit on the first day it comes out. It also won’t know newer boards, which spawns the same CUDA installation questions over and over again. As soon as there are official drivers supporting that CUDA version, there is no real need to use it anymore.

Use the custom installation path in the CUDA toolkit installer and disable all display driver features.
Then install just the software and documentation.
In case your NVIDIA Control Panel vanished because of a bug in the CUDA Toolkit installer, install your current or the newest display driver for your configuration again afterwards.

Note that you can have all these CUDA toolkits installed side by side. I have 8.0, 9.0 and 10.0 installed and switch between them with the CUDA_PATH environment variable which lets CMake find it in my OptiX projects.
If I find that the newest one works for all my use cases, I stay with that.

Always look into the OptiX Release Notes before setting up a development environment for OptiX.

roman.vlk18 · February 21, 2019, 8:39am

I guess that’s what I was unaware of. I have now successfully migrated to CUDA 10 and OptiX 6.0.0! Thank you :)

I recreated the project with CMake, and it compiled successfully (I suppose it also created the necessary PTX files), but now it crashes on this:

Program prg;
prg = context->createProgramFromPTXFile(ptxPath("disney.cu"), "Sample");
brdfSample[0] = prg->getId();

With the error:

Invalid context (Details: Function "_rtProgramGetId" caught exception: Validation error: _Z6SampleR17MaterialParameterR5StateR19PerRayData_radiance function with semantic type BINDLESS_CALLABLE_PROGRAM accesses the rtCurrentRay semantic variable.)

From what I can see, it’s probably referring to this function:

RT_CALLABLE_PROGRAM void Sample(MaterialParameter &mat, State &state, PerRayData_radiance &prd)
{
	float3 N = state.ffnormal;
	float3 V = -ray.direction;
	prd.origin = state.fhp;

	float3 dir;
	
	
	float probability = rnd(prd.seed);
	float diffuseRatio = 0.5f * (1.0f - mat.metallic);

	float r1 = rnd(prd.seed);
	float r2 = rnd(prd.seed);

	optix::Onb onb( N ); // basis

	if (probability < diffuseRatio) // sample diffuse
	{
		cosine_sample_hemisphere(r1, r2, dir);
		onb.inverse_transform(dir);
	}
	else
	{
		float a = max(0.001f, mat.roughness);

		float phi = r1 * 2.0f * M_PIf;
        
		float cosTheta = sqrtf((1.0f - r2) / (1.0f + (a*a-1.0f) *r2));      
		float sinTheta = sqrtf(1.0f - (cosTheta * cosTheta));
		float sinPhi = sinf(phi);
		float cosPhi = cosf(phi);

		float3 half = make_float3(sinTheta*cosPhi, sinTheta*sinPhi, cosTheta);
		onb.inverse_transform(half);

		dir = 2.0f*dot(V, half)*half - V; //reflection vector
	}
	prd.direction = dir;
}

I have a declaration of rtCurrentRay at the top of the file:

rtDeclareVariable(Ray, ray, rtCurrentRay, );

What seems to be the problem?

droettger · February 21, 2019, 10:00am

The problem is exactly what the error message says. It’s not allowed to access the rtCurrentRay variable inside a callable program.
You’re doing that in line 4: float3 V = -ray.direction;
Move the ray direction into your State or PerRayData_radiance or a separate argument to the function instead and access it from there.
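
A minimal sketch of that change, written as plain C++ so it can be compiled outside OptiX. `Vec3`, the `rayDirection` member, and `reflectAboutHalf` are hypothetical names for illustration; the original `State` struct is not shown in this thread, so its real layout is an assumption. The point is only that the function reads the direction from its arguments, not from a variable bound to `rtCurrentRay`.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator-(Vec3 v) { return {-v.x, -v.y, -v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical per-hit state: the caller copies the ray direction in here
// (in OptiX terms, inside the closest-hit program, where rtCurrentRay is valid).
struct State {
    Vec3 rayDirection;
};

// Stand-in for the specular branch of the callable program: it uses only its
// arguments and no ray global.
Vec3 reflectAboutHalf(const State& state, Vec3 half) {
    const Vec3 V = -state.rayDirection;      // was: -ray.direction
    return 2.0f * dot(V, half) * half - V;   // reflection vector, as in Sample()
}
```

The caller would fill `state.rayDirection` before invoking the callable program, in the same place where it already fills fields such as `ffnormal` and `fhp`.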

roman.vlk18 · February 21, 2019, 10:09am

That’s so weird… Is it something that was changed after Optix 4? Cause this .cu file was taken from Optix examples and it used to work with no errors…

Also where can I read all the changes that occurred after version 4? All I found was high level stuff like additional support for newer GPUs and AI notes

droettger · February 21, 2019, 10:32am

That it worked before was unintentional and OptiX 6.0.0 fixed it and enforced the correct behaviour.

The OptiX Release Notes are the first thing to check for new additions and changes.
There won’t be a list of all individual bug fixes and changes. The new execution strategy inside OptiX 6.0.0 changed too much over the previous version.
The OptiX API Reference lists which functions have been added in each version.
The OptiX 6.0.0 Programming Guide is lagging behind and will be updated accordingly as soon as possible.
http://raytracing-docs.nvidia.com/optix_6.0/index.html
Other than that, reading this forum will often explain more intricacies than are handled in the documentation.

roman.vlk18 · February 21, 2019, 2:10pm

Alright, correct behavior was enforced in my code as well!

I managed to get everything working, but the original problem from comment #3 remains, except that instead of returned (700): Illegal address it now says returned (719): Launch failed, and if I remove the plus (just like I did in comment #3) it works fine…

What is going on?

roman.vlk18 · February 24, 2019, 11:02am

I just encountered the same problem in another place

float3 new_point = (state.fhp - prd.origin) * rnd(prd.seed);
optix::Ray _ray(new_point, -sun, 0, scene_epsilon);
rtTrace(top_object, _ray, prd);

This does work, but when I change line 1 to

float3 new_point =  prd.origin + (state.fhp - prd.origin) * rnd(prd.seed);

There’s an error: returned (719): Launch failed

droettger · February 25, 2019, 9:36am

I still guess that’s an error inside the compiler or the driver’s PTX assembler or microcode generator.

Would you be able to provide a minimal reproducer in failing state to be able to file a bug report?
You could send that via e-mail to OptiX-Help(at)nvidia.com.
Attachments with *.zip extension need to be renamed or they get blocked. *.zi_ will do.

roman.vlk18 · February 27, 2019, 11:25am

I managed to reproduce it in the optixPathTracer project provided in the Optix 6.0.0 SDK and sent it via email.

I’ll keep this post updated once I get a reply, for future reference.

Thanks a lot for your help!

roman.vlk18 · March 10, 2019, 1:59pm

Is there any way to check what’s going on with my report? Am I going to get a response to the email, or am I waiting in vain…?

droettger · March 11, 2019, 11:33am

No, that database is NVIDIA internal.
Don’t hold your breath though. Depending on which module needs to be fixed (e.g. driver, compiler, SDK), it can take months between a new bug report and a fix becoming available to end customers in the respective module.
