[SOLVED] Exception after rtContextLaunch2D failure

EDIT: this post was originally called “Shadow Issue After Denoising render result of Photon Mapper Sample”.
That part is SOLVED (the usage of “appendLaunch” was wrong in my code, see post #7 of this thread).
The thread has now been renamed because a related exception is still present.
SOLVED: using one TOP GROUP instead of a group hierarchy solved it! Thanks Detlef.

I tried 3 different settings:

Test 1. adding the Denoiser+ToneMapper stages (from the Denoiser sample in the OptiX 5.0.0 SDK) to the Photon Mapper of the
OptiX Advanced Samples (see result in picture1.jpg).
As you can see, a gamma issue occurs and no visible shadows are present.
Using sutil::displayBufferGL(getTonemappedBuffer(), BUFFER_PIXEL_FORMAT_DEFAULT, true);
made it even worse: see picture2.jpg

Test 2. pure Denoiser (see picture3.jpg): I simply removed the tone mapper stage

   (ignoring the 2.2 gamma issue with the training data described in “6.4.1 Deep-learning based Denoiser” in the docs)
   => the color seems to be OK, but shadows are still removed.

Test 3. a) tone mapping with gamma 2.2 (as in the original Denoiser sample)
b) denoiser (as in the original Denoiser sample)
c) a second tone mapping with gamma 1.0 / 2.2

    => nearly the same result as test 2; a very slight shadow is present (picture4.jpg)

I tried out 0.0, 0.5 and 1.0 for Variable(denoiserStage->queryVariable("blend"))->setFloat(denoiseBlend);
with the exposure always at 1.0,
but it made no difference.

The photon mapper always finishes before denoising starts. I use:

int PhotonMappingFrames = 5; 
isEarlyFrame = (accumulation_frame <= (unsigned int)(numNonDenoisedFrames+PhotonMappingFrames) );
skipDenoising = (accumulation_frame <= (unsigned int)(PhotonMappingFrames));

In all cases the output is smooth, but all the shadows have disappeared almost completely.
Is there an option to configure the denoiser to avoid this?
The docs do not mention this as a limitation. Is it a limitation?

Would you be able to provide the full changes to the optixProgressivePhotonMap.cpp file instead of just the code excerpts above?
Then I could drop that into the advanced OptiX sample locally and try to reproduce this.
That would speed up the turnaround time and avoid code differences when guessing about the rest of the necessary changes.

Please always provide the system configuration when reporting issues:
OS version, installed GPU(s), display driver version, OptiX version, CUDA toolkit version used to produce the input PTX code.

Thank you for your answer, Detlef.

attachment removed

my current system info:

Device: GTX 1050 Driver: 388.71
OptiX 5.0 with CUDA Toolkit 9.1.85 on Visual Studio 2017 Community 15.5.2 (toolset v140 of VS2015)
on Windows10 PRO 64bit (still Anniversary Update version 1607 build 14393.1593)
Win SDK Target Platform 10.0.15063.0 (Creators Update SDK) however, same on 10.0.14393.0

OptiX 5.0.0 is installed in C:\ProgramData\NVIDIA Corporation\OptiX SDK 5.0.0
CUDA 9.1.85 is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1
Windows 10 Kits are installed in C:\Program Files (x86)\Windows Kits\10\Include\10.0.15063.0
C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0
C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0

Thanks for providing the complete project. I downloaded and removed it from the post.

In the future, if you need things to stay confidential, you can either attach it to a private message, look into the OptiX release notes for the OptiX-Help e-mail address if attachments are smaller than 10 MB, or, for bigger data, ask for a temporary FTP account I can set up and send to your registered e-mail address.

Unfortunately your project setup is a little too hardcoded to work with on different systems.
We’d really need a minimal reproducer which is neither limited to a specific hard-drive location, Visual Studio version, or even operating system.

That’s why I asked if you’d be able to reproduce the problem by adding just your post-processing code the same way into the original progressive photon mapper sources, because the CMake-based solutions there allow working with other system setups (including Linux).

It might simply be that the post-processing is not working on the photon mapper nicely because that produces low frequency noise while the denoiser was trained on path tracing images which contain high frequency noise. That’s why I’d like to keep the necessary reproduction effort low.

I managed to create such a modified version of the optixProgressivePhotonMap.cpp file for you.
In addition to that file, the include file “Denoising.h” must be present in the original “Progressive Photon Mapping OptiX Advanced Sample”.

Thank you very much!

Next time I will use that email address for submitting projects. Thank you.

However, here are 2 issues which do not affect the denoiser/tonemapper, but which I wondered about while testing the samples:

  1. a very strange exception, see:
    TestApp0.zip\TestApp0\bin\Debug\CU\optixMeshViewer\pinhole_camera.cu line 150
    Expressions containing a variable of type “rtIntersectionDistance” in a ray generation program can cause an OS freeze/crash, leaving a process that cannot be closed:
    OptiX Error: ‘Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)’

  2. I just wondered about this in the “Photon Mapper” sample:

in original file optixProgressivePhotonMap.cpp in line 243:
context->setMissProgram( rtpass, context->createProgramFromPTXFile( ptx_path, "rtpass_miss" ) );

“rtpass” is an “entry_point_index” (enum “ProgramEnum”).
BUT: the function void setMissProgram(unsigned int ray_type_index, Program program); (in file optixpp_namespace.h in the optixu subfolder: C:\ProgramData\NVIDIA Corporation\OptiX SDK 5.0.0\include\optixu) expects a “ray_type_index”.
rtpass is no “ray_type_index”; however, it’s 0 anyway, so within this sample it makes no difference whether it’s an “entry point index” or a “ray type index”.

in optix_host.h:
[…] rtContextSetMissProgram sets a context’s miss program associated with ray type.[…]
ray_type_index : The ray type the program will be associated with[…]

So using rtpass as a “ray_type_index” is not 100% logical, is it?

I’ve pruned your changes down to get an isolated implementation of post-processing stages applied to the photon mapper example explicitly.

The two main problems in your code were:
1.) You must not change the original number of entry points and ray types of the photon mapper algorithm.
2.) The post-processing CommandLists were set up with the incorrect entry point for the photon mapper.
That means in the command list setup, the append() of the OptiX launches needs to be done with the “gather” entry point (== 2), which does the photon map rendering.
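As an illustration, a sketch of the corrected command list wiring, assuming the stage objects and the variable names of the Denoiser sample; gather is the photon mapper’s final-image entry point, the other names are illustrative:

```cpp
// Sketch only: assumes tonemapStage/denoiserStage were created as in the
// Denoiser sample and "gather" is the photon mapper's gather entry point.
optix::CommandList commandList = context->createCommandList();
commandList->appendLaunch( gather, width, height ); // launch the "gather" entry point (== 2), not rtpass
commandList->appendPostprocessingStage( tonemapStage,  width, height );
commandList->appendPostprocessingStage( denoiserStage, width, height );
commandList->finalize();

// Per frame, after the photon tracing passes have run:
commandList->execute();
```

The point is that the command list must launch the entry point which fills the input buffer of the first post-processing stage; launching rtpass there only traces camera rays and never produces the gathered image the denoiser is supposed to see.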

Please note that resizing is broken with those changes and fails with an illegal address error. Possibly because of the hardcoded width and height defines. I didn’t look into that.

You’re right, the miss program is per ray type, and the proper index in the photon mapper example code should have been “rtpass_ray_type” (== 0). It happened to work because the entry point index “rtpass” was 0 as well.
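Sketched out, the corrected call in the sample would then read (same line as before, only the index changed):

```cpp
// Miss programs are indexed by ray type, not by entry point:
context->setMissProgram( rtpass_ray_type, context->createProgramFromPTXFile( ptx_path, "rtpass_miss" ) );
```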

If you have generic CUDA launch issues, the first step is to check whether they also happen with newer drivers. Then follow the recommendations in my earlier forum posts, where I described debugging by using the exception program with enabled exceptions, rtThrow with user-defined codes, and then rtPrintf, to see whether OptiX can already detect anything before CUDA reports a failure.
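A sketch of that debugging setup, assuming an exception program named “exception” exists in the PTX file; entry_point_index and the program name are illustrative:

```cpp
// Host side: enable printing and all exceptions, and attach an exception
// program per entry point so OptiX can report errors before CUDA does.
context->setPrintEnabled( true );
context->setExceptionEnabled( RT_EXCEPTION_ALL, true );
context->setExceptionProgram( entry_point_index,
                              context->createProgramFromPTXFile( ptx_path, "exception" ) );

// Device side (CUDA): signal suspicious states yourself with
//   rtThrow( RT_EXCEPTION_USER + 0 );
// and dump details inside the exception program with rtPrintExceptionDetails().
```

Mind that enabling all exceptions has a noticeable performance cost, so this belongs in debug builds only.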
Denoising.h (17.5 KB)
optixProgressivePhotonMap.cpp (41.8 KB)

Detlef, thank you very much!
Great answer. Now it works!


Let’s try some dry analysis based on the attached images alone.

My guess is that the washed-out result of the minimal code example with the tonemapper is most likely just gamma related.
Either the tonemapper doesn’t need to apply the gamma of 2.2, or the display should (or should not) be done with sRGB enabled.
Experiments to isolate that would be to keep the exposure at 1.0 and test what effect different gamma values have while checking what the final display routines do.

The frayed edges in the CombinedRenderRayTracerAndPhotonMapping.jpg look like depth fighting of intersecting surfaces to me. Hard to tell from a still image.
If that’s the case, that isn’t noise from a Monte Carlo sampling but a geometric artifact which would also be present in your final rendering and nothing the denoiser is meant to handle.
The necessary changes would need to happen to the scene geometry instead until that depth fighting is resolved.
Also check the scene_epsilon, which is set differently among the individual OptiX Advanced Samples. It’s scene-size dependent and is meant to prevent self-intersections, especially for shadow test rays. Set it to the smallest possible value which no longer shows shadow artifacts from self-intersections.
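For reference, scene_epsilon is just a context variable in those samples; a sketch of adjusting it, where the value is only an example and the right magnitude depends on the scene size:

```cpp
// Example value only: decrease the exponent until shadow artifacts from
// self-intersections disappear, and no further.
context[ "scene_epsilon" ]->setFloat( 1.e-3f );
```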

Mind that no notification gets sent by the forum when you edit posts. That only happens when posting something new.




Finally I found a somewhat similar result when denoising the photon mapper.

But what am I doing wrong when launching additional kernels?
Buffer handling? Payload conflicts?
I want to read from and write to several buffers for my depth handling.

Sorry, there are too many hardcoded assumptions inside your test program code again. There are absolute paths to the OptiX and CUDA SDKs and some more. For example, the *.cu files didn’t load from the executable’s working directory at runtime with the given getCuStringFromFile() edits and your instructions, and accumulation never progresses. I only get a black window after adjusting the hardcoded assumptions.

I cannot exclude potential bugs inside the denoiser implementation, but I won’t be able to look at any of your projects if they are not minimal and complete reproducers for just the failing case which live inside the OptiX SDK samples or OptiX Advanced Samples frameworks by using the same CMakeList.txt mechanism there.

The other option to provide a reproducing test case would be an OptiX API Capture (OAC).
Please see this post for instructions on how to produce that:
Again, the minimal failing case is all we need. The smaller the better.