Algorithm behind converting a diffuse texture to a normal texture?

I want to build a tool in Python that converts RGB images into normal textures.
I know there are plugins and tools that do this, but I don’t know how they do it.
Is there some documentation on what the process looks like behind the scenes?
The closest information I have found is that the tools take the luminance values of the pixels in the original image and use them to scale vectors in, for example, tangent space.
Is there more information on the subject available?

Hi emjp1821!

The technique we use in the Texture Tools Exporter has two steps:

  1. Convert the RGBA texture to a heightmap.
  2. Compute normals by using the slope of the heightmap at each pixel.

Step 1 is the least well-defined. There are a few techniques that we currently have in the Texture Tools Exporter (there’s a bit more information on each one in the app if you hover over the Height Source checkboxes):

  • Using the alpha, red, green, or blue channel as a heightmap: these are useful if you have a texture where one channel correlates well with height, or, in the best case, a texture where the RGB channels represent color and the A channel contains displacement.
  • Averaging the red, green, and blue channels: this is sort of similar to computing grayscale luminance. (Technical note: it’s a bit different, since computing luminance should usually be done in linear space and should weight the green channel more than the red and blue channels, while this weights all channels the same and is done in sRGB space. But we won’t get an exact heightmap anyway, so that isn’t where most of the error comes from.)
  • Taking the max of the red, green and blue channels.
  • “Screen blending” the red, green, and blue channels, using the formula height = 1 - (1 - red)*(1 - green)*(1 - blue). (There’s a Python sketch of these height sources right below this list.)
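
In Python, those height-source options might look something like this. This is a minimal NumPy sketch, not the exporter’s actual code; the (H, W, 4) float layout with values in [0, 1] is an assumption:

```python
import numpy as np

def height_from_rgba(rgba: np.ndarray, mode: str = "average") -> np.ndarray:
    """Collapse an (H, W, 4) float RGBA image in [0, 1] to an (H, W) heightmap."""
    r, g, b, a = rgba[..., 0], rgba[..., 1], rgba[..., 2], rgba[..., 3]
    if mode == "alpha":    # use a single channel directly as the heightmap
        return a
    if mode == "average":  # plain mean of R, G, B (not a true luminance weighting)
        return (r + g + b) / 3.0
    if mode == "max":      # brightest of the three channels
        return np.maximum(np.maximum(r, g), b)
    if mode == "screen":   # screen blend: 1 - (1 - R)(1 - G)(1 - B)
        return 1.0 - (1.0 - r) * (1.0 - g) * (1.0 - b)
    raise ValueError(f"unknown height source: {mode}")
```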

More recently, there’s been a lot of work in using machine learning to predict height maps from color information! These machine learning algorithms can also sometimes predict other material properties, such as roughness and metalness.

Step 2 is better-defined: we have a 2D image h containing the height at each pixel, and we want to compute the normal at each pixel. If we can compute the slopes along the x and y axes (i.e. the partial derivatives dh/dx and dh/dy) at each pixel, the normal is given by (n_x, n_y, n_z) = normalize(-dh/dx, -dh/dy, 1)!
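
As a quick sketch of that last formula in Python (assuming dh/dx and dh/dy are already available as (H, W) arrays):

```python
import numpy as np

def normals_from_slopes(dhdx: np.ndarray, dhdy: np.ndarray) -> np.ndarray:
    """Turn per-pixel slopes into unit normals: normalize(-dh/dx, -dh/dy, 1)."""
    n = np.stack([-dhdx, -dhdy, np.ones_like(dhdx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)  # shape (H, W, 3)
```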

We have a few derivative kernels the user can choose from, like the 2-sample approximation dh/dx(i, j) ≈ (h(i+1, j) - h(i-1, j))/2, Sobel filters, and more. Each of these generally amounts to a 2D convolution, and there are a couple of methods for designing these kernels (see e.g. this article and books on numerical analysis). There’s a bit more detail on the kernels we use in the app’s tooltips.
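
For example, here’s one way to get those slopes in Python: np.gradient gives the 2-sample central difference, and scipy.ndimage.sobel a Sobel filter. (The sign conventions, the 1/8 Sobel scaling, and whether +y points up or down in your engine are assumptions you may need to flip.)

```python
import numpy as np
from scipy import ndimage

def slopes(height: np.ndarray, kernel: str = "central"):
    """Estimate dh/dx and dh/dy for an (H, W) heightmap."""
    if kernel == "central":
        # dh/dx(i, j) ≈ (h(i+1, j) - h(i-1, j)) / 2, and likewise for y
        dhdy, dhdx = np.gradient(height)  # axis 0 = rows (y), axis 1 = columns (x)
    else:
        # Sobel: [1, 2, 1] smoothing along one axis, [-1, 0, 1] difference along the other;
        # dividing by 8 keeps the result on roughly the same scale as the central difference
        dhdx = ndimage.sobel(height, axis=1) / 8.0
        dhdy = ndimage.sobel(height, axis=0) / 8.0
    return dhdx, dhdy
```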

Finally, we have the normal (n_x, n_y, n_z)! Since n_x and n_y can be negative, normal maps usually store the color (0.5 + 0.5 * n_x, 0.5 + 0.5 * n_y, n_z). However, there are lots of ways to improve precision and compression here, so usually at this last step you’ll need to take your engine’s desired format and compression into account! For instance, one common approach when using BC3 compression is to instead store the RGBA color (1, n_y, 0, n_x) and reconstruct the normal from the green and alpha channels in the shader. (This isn’t as necessary with BC7, which has a mode that can swap channels around for better compression). Another good resource here is Crytek’s 2010 Advanced Real-Time Rendering presentation, which mentions things like using 16-bit textures rather than 8-bit textures.
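
If you just want the simple (0.5 + 0.5 * n_x, 0.5 + 0.5 * n_y, n_z) encoding written out to an 8-bit image in Python, a sketch might look like this (the BC3-style swizzle and 16-bit formats are left out, and Pillow is assumed for the file output):

```python
import numpy as np
from PIL import Image

def save_normal_map(normals: np.ndarray, path: str) -> None:
    """Write (H, W, 3) unit normals as an 8-bit RGB normal map."""
    rgb = normals.copy()
    rgb[..., 0] = 0.5 + 0.5 * normals[..., 0]  # x -> red, remapped from [-1, 1] to [0, 1]
    rgb[..., 1] = 0.5 + 0.5 * normals[..., 1]  # y -> green, remapped from [-1, 1] to [0, 1]
    # z is already in [0, 1] for normals built from a heightmap, so store it directly
    rgb = np.clip(rgb, 0.0, 1.0)
    Image.fromarray((rgb * 255.0 + 0.5).astype(np.uint8), mode="RGB").save(path)
```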

Hope this helps!

Wow thank you so much!
