Implementing image filter kernels at runtime

I’m just making my way through image kernels and I was wondering if there’s a way to apply a post-processing filter at runtime?

I realize the filtered image would need to be saved in a temporary buffer before being swapped onto the final image, but I’m quite confused as to how to implement it… Is there any way I can check whether all threads have finished, so I have a complete image to work with?

Regardless, I’ve written the following as an implementation, but it gives me an error (obviously).
I’ve added the error after the code segment.

Code runs after the loop

 . . .
 . . .

and after the tone mapping

if (frame > 1)
{
    // Filter size and edge distance
    int _size = 3;
    int _edge = floor(_size / 2.f);
    size_t2 screen = output_buffer.size();
    // Make sure I'm inside the frame
    if ((launch_index.x - _edge) > 0 && (launch_index.y - _edge) > 0 &&
        ((launch_index.x + _edge) < screen.x) && ((launch_index.y + _edge) < screen.y))
    {
        // Define a Gaussian filter
        float _filter[3][3] = {
            {0, 1, 0},
            {1, 2, 1},
            {0, 1, 0}
        };

        // Initialize convolved and filter sums
        float4 _sum = make_float4(0);
        float _fsum = 0;
        for (int i = -_edge; i < _edge + 1; i++)
        {
            for (int j = -_edge; j < _edge + 1; j++)
            {
                // Get new index with offset
                uint2 new_index = make_uint2(launch_index.x + i, launch_index.y + j);

                // Get color from buffer
                float4 _from_buffer = accum_buffer[new_index];

                // Get filter value
                float _fvalue = _filter[i + _edge][j + _edge];
                _fsum += _fvalue;
                // Apply filter
                _sum += _from_buffer * _fvalue;
            }
        }
        // Normalize colors
        _sum /= _fsum;

        // Update color
        val = _sum;
    }
}

output_buffer[launch_index] = make_color(make_float3(val));
accum_buffer[launch_index] = acc_val;

The error it raises is:

Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

You cannot read from neighbouring launch indices inside a single launch.

The scheduling is not under your control, which means you don’t know when the neighbouring launch indices contain data. In multi-GPU configurations they might not even be on the same device.

Your filter kernel must be a separate launch which reads the previously written data as input and writes into another buffer.

Have a look into the tonemapper implementation in this example code. That is a ray generation program which is not shooting rays:

(This is an exception in this single example. The tonemapper in all other OptiX Introduction examples is implemented as post-process inside a GLSL shader which does the final texture blit to screen.)

While this is only reading and writing on the same launch indices, it could also implement a filter by reading multiple cells from the input buffer, named sysOutputBuffer in that kernel because that was the output from the renderer.
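
The separate-pass idea described above can be sketched on the CPU as follows. This is only an illustration under assumptions: `Image` and `filterPass` are hypothetical names, not OptiX API, and the image is taken to be a row-major array with one float per pixel. The point is that the pass reads only from the source buffer and writes only to a different destination buffer, so no pixel ever reads a neighbour that the same pass has already modified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a second launch: reads `src`, writes `dst`.
// Neither name is OptiX API; they only illustrate the two-buffer pattern.
struct Image {
    int width;
    int height;
    std::vector<float> pixels; // row-major, one float per pixel
};

// Applies a 3x3 kernel, normalized by the sum of its weights.
// Border pixels are copied unfiltered, which mirrors the
// "inside the frame" check in the posted code.
inline Image filterPass(const Image& src, const float kernel[3][3]) {
    Image dst{src.width, src.height, src.pixels}; // separate output buffer
    for (int y = 1; y + 1 < src.height; ++y) {
        for (int x = 1; x + 1 < src.width; ++x) {
            float sum = 0.0f;
            float weightSum = 0.0f;
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    const float w = kernel[j + 1][i + 1];
                    const std::size_t idx =
                        static_cast<std::size_t>(y + j) * src.width + (x + i);
                    sum += w * src.pixels[idx];
                    weightSum += w;
                }
            }
            if (weightSum != 0.0f) {
                dst.pixels[static_cast<std::size_t>(y) * src.width + x] = sum / weightSum;
            }
        }
    }
    return dst;
}
```

In an OptiX setting the two buffers would correspond to the renderer's output and a second buffer written by the filter launch; how those are declared depends on the project.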

Hi, thanks for the help! I managed to create a separate ray generation program, but it’s still not working correctly…

I created a new temporary buffer called image_buffer, to which I pass the values from the tracer. The idea is to read the data from image_buffer, which is a float4 buffer, perform the filter operations, then cast the result to uchar4 and write it to output_buffer.

This is my code so far

RT_PROGRAM void raygeneration_filter()
{
    float4 _color = make_float4(0);
    if (!frame)
    {
        _color = image_buffer[launch_index];
        int _size = 3;
        int _edge = 1; // floor(_size / 2.f);

        size_t2 screen = output_buffer.size();

        if ((launch_index.x - _edge) > 0 && (launch_index.y - _edge) > 0 &&
            ((launch_index.x + _edge) < screen.x) && ((launch_index.y + _edge) < screen.y))
        {
            float _filter[3][3] = { { 1, 1, 1 },
                                    { 1, 1, 1 },
                                    { 1, 1, 1 } };

            float4 _sum = make_float4(0);
            float _fsum = 0;

            for (int i = 0; i < _size; i++)
            {
                for (int j = 0; j < _size; j++)
                {
                    float _fval = _filter[i][j];
                    uint2 new_index = make_uint2(launch_index.x + i - _edge, launch_index.y + j - _edge);
                    float4 from_buffer = image_buffer[new_index];

                    _color += from_buffer * _fval;
                }
            }
        }
    }

    output_buffer[launch_index] = make_color(make_float3(_color));
}

And it produces the following error:

Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

but if I change the line _color += from_buffer * _fval; to _color = from_buffer * _fval; (removing the plus), it produces an image and supposedly works fine…

Why does this error occur?
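
Separate from any compiler issue, one thing that may be worth ruling out is unsigned wraparound in the bounds check. The sketch below assumes the launch index components are unsigned, as they would be if launch_index is declared as uint2 (the declaration is not shown in the post, so this is an assumption). In that case launch_index.x - _edge is evaluated in unsigned arithmetic, and at x = 0 the result wraps to a very large value that passes the > 0 test. The function names are hypothetical.

```cpp
#include <cstdint>

// Mirrors the check from the post for one axis, assuming an unsigned index.
// `index - edge` is computed as unsigned, so 0 - 1 wraps around to a huge
// positive value and the `> 0` test passes for the first row or column.
inline bool insideFrameAsPosted(std::uint32_t index, int edge, std::uint32_t size) {
    return (index - edge) > 0 && (index + edge) < size;
}

// Wrap-free alternative: compare instead of subtracting from the unsigned index.
// Note that it also accepts index == edge, which the posted check rejects.
inline bool insideFrameSafe(std::uint32_t index, int edge, std::uint32_t size) {
    const std::uint32_t e = static_cast<std::uint32_t>(edge);
    return index >= e && index + e < size;
}
```

With a 640-pixel-wide frame, insideFrameAsPosted(0, 1, 640) evaluates to true, so a pixel in the first column would go on to read index 0 - 1, which is far outside the buffer. If the variant with = ends up reading only the last (+1, +1) neighbour, which the check does keep in bounds, that would be consistent with it appearing to work, though this is a guess and not a confirmed diagnosis.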

That could be a compiler error.

What’s your system configuration?
OS version, installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA toolkit version used to compile CUDA code to PTX, host compiler version.

BTW, if that is not going to become a programmable filter size with weights inside a buffer, then it’s more efficient to unroll the two loops and simply write the nine filter kernel operations manually.
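
For a fixed 3x3 box filter, the manual unrolling suggested above might look like the following CPU-side sketch. The helper name and the row-major float image layout are assumptions made for illustration; this is not the code from the thread.

```cpp
#include <cstddef>

// Hypothetical helper: box-filters the pixel at (x, y) of a row-major image
// with the nine taps written out by hand instead of two nested loops.
// The caller must guarantee 1 <= x < width - 1 and 1 <= y < height - 1.
inline float boxFilter3x3Unrolled(const float* img, std::size_t width,
                                  std::size_t x, std::size_t y) {
    const float* r0 = img + (y - 1) * width + x; // row above
    const float* r1 = img + y * width + x;       // current row
    const float* r2 = img + (y + 1) * width + x; // row below
    const float sum = r0[-1] + r0[0] + r0[1]
                    + r1[-1] + r1[0] + r1[1]
                    + r2[-1] + r2[0] + r2[1];
    return sum / 9.0f;
}
```

The trade-off is the one stated above: this form is tied to one kernel size, so it does not fit a filter whose size and weights come from a buffer.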

System parameters are as follows

MSI laptop, Core i7
Windows 10
NVIDIA GeForce GTX 1060
Display driver version
CUDA 8.0
Optix 4.1.1
Visual Studio Community 2015 version 14.0.25431.01 Update 3

And thanks for the good advice but I intend to generalize the filter for different sizes

OptiX 4 is not really what you should be using today.
Please try to use at least OptiX 5.1.0 and CUDA 9.0 (not 9.1 or 9.2).
Or even try updating to OptiX 6.0.0 and use CUDA 9.0 or 10.0.

I tried to install the latest version of CUDA back when it was CUDA 9, but I had compatibility issues with the GPU and MS visual studio… It resulted in a major system crash to the point Windows was constantly rebooting, so I had to format my laptop…

Can my GeForce GTX 1060 run CUDA 9 or 10?

That’s a Pascal board and should have no problems with it.
The display drivers 418.91 you have installed contain CUDA 10.0 drivers already.
The CUDA_installation_Guide_Windows document for CUDA 9.0 and 10.0 lists Visual Studio Community 2015 as supported for native x86_64 development.

What you should NOT do is install any display driver components from CUDA Toolkits!
There is an immediately outdated driver inside it which you never need unless you install it on the first day it comes out. It also won’t know newer boards, which spawns the same CUDA installation questions over and over again. As soon as there are official drivers supporting that CUDA version, there is no real need to use that anymore.

Use the custom installation path in the CUDA toolkit installer and disable all display driver features.
Then install just the software and documentation.
In case your NVIDIA Control Panel vanished because of a bug in the CUDA Toolkit installer, install your current or the newest display driver for your configuration again afterwards.

Note that you can have all these CUDA toolkits installed side by side. I have 8.0, 9.0 and 10.0 installed and switch between them with the CUDA_PATH environment variable which lets CMake find it in my OptiX projects.
If I find that the newest one works for all my use cases, I stay with that.

Always look into the OptiX Release Notes before setting up a development environment for OptiX.

I guess that’s what I was unaware of. I have now successfully migrated to CUDA 10 and OptiX 6.0.0! Thank you :)

I recreated the project with CMake and it compiled successfully (I suppose it also created the necessary PTX files), but now it crashes here:

Program prg;
prg = context->createProgramFromPTXFile(ptxPath(""), "Sample");
brdfSample[0] = prg->getId();

With the error:

Invalid context (Details: Function “_rtProgramGetId” caught exception: Validation error: _Z6SampleR17MaterialParameterR5StateR19PerRayData_radiance function with semantic type BINDLESS_CALLABLE_PROGRAM accesses the rtCurrentRay semantic variable.)

From what I can see, it’s probably referring to this function:

RT_CALLABLE_PROGRAM void Sample(MaterialParameter &mat, State &state, PerRayData_radiance &prd)
{
    float3 N = state.ffnormal;
    float3 V = -ray.direction;
    prd.origin = state.fhp;

    float3 dir;
    float probability = rnd(prd.seed);
    float diffuseRatio = 0.5f * (1.0f - mat.metallic);

    float r1 = rnd(prd.seed);
    float r2 = rnd(prd.seed);

    optix::Onb onb( N ); // basis

    if (probability < diffuseRatio) // sample diffuse
    {
        cosine_sample_hemisphere(r1, r2, dir);
    }
    else
    {
        float a = max(0.001f, mat.roughness);

        float phi = r1 * 2.0f * M_PIf;
        float cosTheta = sqrtf((1.0f - r2) / (1.0f + (a*a-1.0f) *r2));
        float sinTheta = sqrtf(1.0f - (cosTheta * cosTheta));
        float sinPhi = sinf(phi);
        float cosPhi = cosf(phi);

        float3 half = make_float3(sinTheta*cosPhi, sinTheta*sinPhi, cosTheta);

        dir = 2.0f*dot(V, half)*half - V; // reflection vector
    }
    prd.direction = dir;
}

I have a declaration of rtCurrentRay at the top of the file:

rtDeclareVariable(Ray, ray, rtCurrentRay, );

What seems to be the problem?

The problem is exactly what the error message says. It’s not allowed to access the rtCurrentRay variable inside a callable program.
You’re doing that in this line: float3 V = -ray.direction;
Move the ray direction into your State or PerRayData_radiance or a separate argument to the function instead and access it from there.
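
A minimal sketch of that change, written as plain C++ so it compiles on its own. The float3 and State types here are simplified stand-ins for the real ones, and the rayDirection field and viewVector helper are hypothetical names introduced for illustration.

```cpp
// Minimal stand-ins so the sketch compiles on its own; the real types come
// from the OptiX headers and the poster's project.
struct float3 { float x, y, z; };
inline float3 operator-(const float3& v) { return {-v.x, -v.y, -v.z}; }

struct State {
    float3 ffnormal;
    float3 rayDirection; // hypothetical new field, filled in by the caller
};

// Before: `float3 V = -ray.direction;` read the rtCurrentRay variable.
// After: the direction arrives through the State argument instead.
inline float3 viewVector(const State& state) {
    return -state.rayDirection;
}
```

The program that calls Sample and is allowed to read the current ray would copy ray.direction into the new field before the call, so the callable program itself no longer touches rtCurrentRay.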

That’s so weird… Is it something that was changed after OptiX 4? I ask because this .cu file was taken from the OptiX examples and it used to work with no errors…

Also where can I read all the changes that occurred after version 4? All I found was high level stuff like additional support for newer GPUs and AI notes

That it worked before was unintentional and OptiX 6.0.0 fixed it and enforced the correct behaviour.

The OptiX Release Notes are the first thing to check for new additions and changes.
There won’t be a list of all individual bug fixes and changes. The new execution strategy inside OptiX 6.0.0 changed too much over the previous version.
The OptiX API Reference lists which functions have been added in each version.
The OptiX 6.0.0 Programming Guide is lagging behind and will be updated accordingly as soon as possible.
Other than that, reading this forum will often explain more of the intricacies than the documentation covers.

Alright, correct behavior was enforced in my code as well!

I managed to get everything working, but the original problem from comment #3 remains, except that instead of
returned (700): Illegal address, it now says returned (719): Launch failed, and if I remove the plus (just like I did in comment #3) it works fine…

What is going on?

I just encountered the same problem in another place

float3 new_point = (state.fhp - prd.origin) * rnd(prd.seed);
optix::Ray _ray(new_point, -sun, 0, scene_epsilon);
rtTrace(top_object, _ray, prd);

This does work, but when I change line 1 to

float3 new_point =  prd.origin + (state.fhp - prd.origin) * rnd(prd.seed);

There’s an error: returned (719): Launch failed

I still guess that’s an error inside the compiler or the driver’s PTX assembler or microcode generator.

Would you be able to provide a minimal reproducer in failing state to be able to file a bug report?
You could send that via e-mail to OptiX-Help(at)
Attachments with *.zip extension need to be renamed or they get blocked. *.zi_ will do.

I managed to reproduce it in the optixPathTracer project provided in the Optix 6.0.0 SDK and sent it via email.

I’ll keep this post updated once I get a reply, for future reference.

Thanks a lot for your help!

Is there any way to check what’s going on with my report? Am I going to get a response to the email, or am I waiting in vain…?

No, that database is NVIDIA internal.
Don’t hold your breath though. Depending on which module needs to be fixed (e.g. driver, compiler, SDK), it can take months between a new bug report and a fix becoming available in the respective module for end customers.