Hi,
in order to get SceneNet-RGBD https://robotvault.bitbucket.io/scenenet-rgbd.html to work with my RTX 3090, I have changed the project's default OptiX version from 4.1.1 to 6.5.0. Now my renderings are showing weird artefacts, and I would like to ask you for ideas on where to start debugging this problem.
As you can see below, the render looks normal at the beginning, but then these artefacts accumulate over time.
On my other machine with a GTX 1060 and OptiX 4.1.1 everything is working well.
My system configuration is:
CUDA: 10.1
Driver: 465.19.01
OS: Ubuntu 20.04.2 (Kernel 5.8.0-50-generic)
Thanks in advance!
This normally happens when the program throws exceptions inside the OptiX device code and has an exception program which writes some debug color.
There are three exception programs in the code; one of them writes a color.
One of them uses a bad_color variable which is never set. That’s a bug if this is used.
Another declares an exceptionErrorColor which is never used.
But all of them print the exception code.
That means: enable all exceptions and the print functionality in OptiX and see if any exceptions are raised.
If yes, isolate one to a single pixel and try to see where it comes from.
Actually, two of the exception programs use CUDA native printf calls, so there is no need to enable the print functionality in OptiX for those, but limiting the prints to a specific pixel will then need manual code changes.
Magenta would actually be a nice color for debugging because that doesn’t appear often in the real world.
OptiX 6.5.0 changed the API for setting the stack size! It is no longer specified in bytes but as recursion depths.
https://raytracing-docs.nvidia.com/optix6/guide_6_5/index.html#host#3129
Check if there are any stack overflow exception issues and if yes, adjust the OptiX stack size.
Well, the application isn’t even setting any stack size, so that is bad as well and needs to be fixed.
Another good debugging methodology is to add checks for INF, NAN, and negative radiance values into the ray generation program before it writes them to the output buffer.
Example code here: https://github.com/nvpro-samples/optix_advanced_samples/blob/master/src/optixIntroduction/optixIntro_07/shaders/raygeneration.cu#L168
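To illustrate the kind of check meant above, here is a minimal host-side C++ sketch of the logic. It is not the linked sample and not SceneNet-RGBD code: Color3, checkRadiance, writePixel and the three debug colors are hypothetical names and values chosen for this example. In a real ray generation program the same tests would run on the float3 result right before the output buffer write.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the float3 radiance value used in device code.
struct Color3 { float r, g, b; };

// Debug colors are assumptions made for this sketch.
constexpr Color3 kNanColor{1.0f, 0.0f, 1.0f}; // magenta: at least one NaN component
constexpr Color3 kInfColor{1.0f, 0.0f, 0.0f}; // red: at least one infinite component
constexpr Color3 kNegColor{0.0f, 0.0f, 1.0f}; // blue: at least one negative component

// Returns the radiance unchanged if it is valid, otherwise a debug color
// that identifies which kind of invalid value was found.
inline Color3 checkRadiance(const Color3& c)
{
    if (std::isnan(c.r) || std::isnan(c.g) || std::isnan(c.b)) return kNanColor;
    if (std::isinf(c.r) || std::isinf(c.g) || std::isinf(c.b)) return kInfColor;
    if (c.r < 0.0f || c.g < 0.0f || c.b < 0.0f) return kNegColor;
    return c;
}

// Writes one pixel, running the validity check first.
inline void writePixel(std::vector<Color3>& outputBuffer, std::size_t index, const Color3& radiance)
{
    assert(index < outputBuffer.size());
    outputBuffer[index] = checkRadiance(radiance);
}
```

Using a distinct color per failure type makes it possible to tell from the image alone whether a pixel went wrong because of a NaN, an overflow, or a sign error.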
Hard to say what the garbled geometry artefacts are.
It could be an algorithm that incorrectly counts things inside the anyhit program while the BVH has been built with primitive splitting, so that a single primitive can invoke the anyhit program multiple times.
Looking through the code, the anyHitRadiance() program in scenenetrgb-d\renderer\src\Renderer\PhotonMapping\VolumetricPhotonSphereRadiance.cu does exactly that!
The code is setting geometry_group->setAcceleration(context->createAcceleration("Sbvh", "Bvh"));
and Sbvh is a Splitting-BVH, so this is generally wrong in the application.
Try setting that to Bvh once any exception errors have been solved.
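The over-counting effect can be shown with a small host-side C++ simulation. This is only a sketch and does not use the OptiX API: LeafRef, countPerInvocation and countUniquePrimitives are hypothetical names, and the list of references stands in for the anyhit invocations along one ray, assuming a splitting builder has stored the same primitive in more than one leaf.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// One leaf reference visited along a ray. With a splitting builder,
// several references may carry the same primitiveId.
struct LeafRef { int primitiveId; };

// Naive counter: one increment per (simulated) anyhit invocation.
// A primitive that was split is counted once per reference.
inline int countPerInvocation(const std::vector<LeafRef>& hitsAlongRay)
{
    int count = 0;
    for (const LeafRef& ref : hitsAlongRay) {
        assert(ref.primitiveId >= 0);
        ++count;
    }
    return count;
}

// Counts each primitive once, regardless of how many references point to it.
inline int countUniquePrimitives(const std::vector<LeafRef>& hitsAlongRay)
{
    std::unordered_set<int> seen;
    for (const LeafRef& ref : hitsAlongRay) {
        seen.insert(ref.primitiveId);
    }
    return static_cast<int>(seen.size());
}
```

For a ray that visits references {7, 7, 3}, the naive counter reports three hits while only two distinct primitives were intersected. Keeping a per-ray set like this is usually impractical in device code, which is why switching the builder to Bvh, so that each primitive is referenced only once, is the simpler fix here.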
@droettger Thank you so much for your extensive answer.
I definitely do have a problem with the stack size. When my resolution is 320 x 240, the problems are gone. If I increase it to 640 x 480, the rendering issues immediately return, but I can fix them by increasing the stack size by a factor of 4 to 12000. As you have already mentioned, version 6.5 has a new way of setting the stack size correctly, but I am still studying the documentation and trying to understand the old and the new way.
As you suggested, I have tried to enable exception printing. While reading the docs, I came across this and added it to the code:
m_context->setPrintEnabled( true );
m_context->setPrintBufferSize( 4096 );
As well as:
m_context["bad_color"]->setFloat( 1000000.0f, 0.0f, 1000000.0f );
m_context["exceptionErrorColor"]->setFloat( 1000000.0f, 0.0f, 1000000.0f );
Your advice to change from “Sbvh” to “Bvh” seems to fix another issue. (Impressive that you saw it instantly.)
geometry_group->setAcceleration(context->createAcceleration("Bvh", "Bvh"));
optix::Group gro = context->createGroup();
gro->setChildCount(1);
gro->setChild(0, geometry_group);
optix::Acceleration acceleration = context->createAcceleration("Sbvh", "Bvh");
gro->setAcceleration(acceleration);
Now I can see another issue with light artefacts in my renderings. I have not managed to investigate that problem yet.
EDIT:
So far I do not see any exceptions, but I am also not sure whether I have enabled all possible debug options in OptiX.
Sorry, I did not understand the stack size correctly. Although I thought I had fixed the issue by increasing context->setStackSize(), I had actually not changed anything.
So, first of all, I have now set the stack size as described in the docs:
m_context->setMaxTraceDepth(20);
m_context->setMaxCallableProgramDepth(20);
I will investigate how far I can reduce those numbers.
Secondly, I have removed the CUDA “-arch=compute_61 -code=sm_61” flag from the CMakeLists.txt. I had totally forgotten about this.
Now all my issues seem to be solved. It looks grainier than before, but I guess I just have to fiddle a bit with the settings.
@droettger Thanks a lot!