[bugreport] OptiX AI Denoiser Normal Input makes no sense

The Programming Guide for the AI Denoiser
https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#using-the-denoiser

States the following:

Two different terms are used, “camera space” and “screen space”, both of which I assume mean “view space” (there’s not really much use in having projection-space normals, right?).

Still, one would think it common knowledge that you cannot drop the Z component of a normal, even in view space: with a perspective camera we can see surfaces whose normals have a positive Z component (pointing along the camera forward vector).
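To make the claim concrete, here is a minimal sketch. It assumes the camera sits at the origin with +Z as its forward vector, and the helper names (`isFrontFacing`, `exampleRayDir`, `exampleNormal`) are made up for illustration. It shows a normal with a positive Z component that is still front-facing for an off-center primary ray.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// A surface is front-facing for a ray when its normal opposes the ray direction.
bool isFrontFacing(const Vec3& normal, const Vec3& rayDir) {
    return dot(normal, rayDir) < 0.0f;
}

// Assumed convention: camera at the origin, +Z is the camera forward vector.
// A primary ray through a pixel to the right of the image center.
Vec3 exampleRayDir() { return normalize({1.0f, 0.0f, 1.0f}); }

// A normal tilted towards the camera position but with a positive Z component.
Vec3 exampleNormal() { return normalize({-1.0f, 0.0f, 0.5f}); }
```

With these values `isFrontFacing(exampleNormal(), exampleRayDir())` is true although `exampleNormal().z` is positive. For a ray parallel to the forward vector, which is the only kind an orthographic camera produces, the same normal would be back-facing.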

NOTE: […]Normal buffer inputs to the denoiser are still not supported since OptiX 5.1.0. The OptiX 7 API documentation about the denoiser is unfortunately not mentioning this. It’s only listed inside the OptiX 7 release notes.[…]
see: https://devtalk.nvidia.com/default/topic/1063569/optix/building-the-ai-denoiser-dll-from-source/post/5387265/#5387265

Hi devsh,

You’re right that we should define those terms, and that it may technically be possible to see normals with ambiguous Z sign, depending on your choice of projection.

The names of various spaces are overloaded, so view space does technically work, if you’re talking about OpenGL’s definition. But we don’t want to use “view space” for OptiX, because OptiX is not a raster engine and the camera we’re talking about is not necessarily a linear-perspective transform. You can use any camera projection you want, linear or non-linear, so “camera space” is the more appropriate choice here, and it happens to be common in ray tracing circles, e.g. https://pharr.org/matt/blog/2018/03/02/rendering-in-camera-space.html

Screen space generally refers to a 2D space, so it doesn’t usually mean the same thing as either camera space or view space, but the OptiX documentation authors may be referring to camera space without the Z coordinate. I will request that this section be reviewed.

For the denoiser, the important part is that the normals you use roughly match the training set, which means that you should transform your world space normals so that Z is along the view direction, X is right, and Y is up. If you follow that loosely, the denoiser with a normal buffer should work for you.

As for whether we can or cannot drop the Z component of the normal: the sign ambiguity is indeed known, it just doesn’t matter. These normals are not being used for shading, they are being used in a black-box neural network. Intuitively, you can see that the normals that point along the camera forward vector are all near the edge of an object, so they are not very likely to affect the denoiser’s choices. The broader answer is that as long as the processing of normal data used for inference matches the processing of normal data used for training, the neural network will learn to use whatever data it has, and there are no constraints on what channels we use or drop, and no notion of correct or incorrect.

That said, if you’re denoising and worried about having pixels with +Z normals, you could certainly render some ray-space normals where the -Z direction always points directly at the center of projection, and that way you’d never get normals with any +Z component. I’m pretty sure that would still result in roughly the same denoiser output as camera space normals.
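One way such ray-space normals could be computed is sketched below. It assumes a per-ray orthonormal basis whose Z axis is the ray direction and whose X and Y axes are derived from a camera up vector; `RayBasis`, `makeRayBasis` and `worldToRayNormal` are illustrative names, not OptiX functions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Hypothetical per-ray basis: w follows the ray, u is "right", v is "up".
struct RayBasis { Vec3 u, v, w; };

// Build the basis from the ray direction and the camera up vector.
// Degenerate when rayDir is parallel to camUp; a real renderer would need a fallback.
RayBasis makeRayBasis(const Vec3& rayDir, const Vec3& camUp) {
    Vec3 w = normalize(rayDir);
    Vec3 u = normalize(cross(camUp, w));
    Vec3 v = cross(w, u);
    return { u, v, w };
}

// Express a world-space normal in the ray's basis. The Z component equals
// dot(n, rayDir), which is negative for every front-facing hit.
Vec3 worldToRayNormal(const Vec3& n, const RayBasis& b) {
    return { dot(n, b.u), dot(n, b.v), dot(n, b.w) };
}
```

For the central ray, with the camera axes aligned to the world axes, this basis coincides with the camera basis, so the two kinds of normals agree there and drift apart only towards the image borders.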


David.

That’s not just possible, it happens with every single kind of perspective projection; only ortho and tele lenses would not suffer (because they are 100% planar).

So the denoiser was trained using ray-space with -Z being the forward direction of the ray? Or was it camera space?

I’m just trying to make stuff match up.

I’d presume that, even though it’s a black box, the normal controls which pixels “connect” for the smearing, so surfaces with view-space normal Z > 0 would get grouped together with surfaces with view-space normal Z < 0 when the magnitude |Z| is similar.

That information about unsupported normal buffers is outdated. The denoiser ships with the display driver, and current R440 drivers support normals.
I’d recommend using the 442.50 driver under Windows, which also produces better denoising results due to updated training data.

The denoiser was trained with camera space normals.


David.