Problem with writing direct lighting to a texture (texture baking)

Hey there,

I am using OptiX to try to bake my direct lighting into a UV map. Here is how my algorithm works, in pseudocode:

  1. I start from the area light source, dividing each light source into cells (stratified sampling) and sampling accordingly. For each light source I store an origin in world coordinates, a du vector (parallel to the width of the source), a dv vector, the width and height of the source, and its normal. The light source data is passed to the device programs via the launch parameters. I can then calculate the world position of each sample point from the light source origin, using the du and dv vectors to compute the offset in world space.
  2. The ray direction is randomized within the upper hemisphere that lies on the same side as the area light source normal.
  3. The ray is sent into the scene. I use a 2D vector as the PRD (per-ray data), with the intention of saving the texture coordinate of the geometry that is hit first. In my closest hit shader I therefore calculate the hit point's UV coordinate as follows:
        const int   primID = optixGetPrimitiveIndex();
        const glm::ivec3 index = sbtData.index[primID];
        const float u = optixGetTriangleBarycentrics().x;
        const float v = optixGetTriangleBarycentrics().y;

        // Barycentrically interpolated texture coordinates
        const glm::vec2 tc =
             (1.f - u - v) * sbtData.texcoord[index.x]
            + u * sbtData.texcoord[index.y]
            + v * sbtData.texcoord[index.z];

I then simply write it to my PRD.

  4. Then, at the end, in my raygen program, I take the UV coordinate from the PRD and use it to calculate my pixel index into the color buffer (which is a uint32_t*):

        const uint32_t uvIndex =
              int(rayTexCoordPRD.x * optixLaunchParams.directLightingTexture.size)
            + int((rayTexCoordPRD.y * optixLaunchParams.directLightingTexture.size)
                * optixLaunchParams.directLightingTexture.size);

     I let the light contribute to that pixel by adding a weighted gray value (each channel has the same contribution). A sketch of the whole device-side flow follows this list.
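
To make this concrete, here is a minimal sketch of steps 1-4 as one raygen program. LightSource, random2D(), sampleHemisphere(), and traceRay() are placeholders for my actual types and helpers, not the real code:

        // Minimal sketch of steps 1-4 (placeholder helper names, not the real code).
        extern "C" __global__ void __raygen__bakeDirectLight()
        {
            const uint3 idx = optixGetLaunchIndex();        // (light, cellX, cellY)
            const uint3 dim = optixGetLaunchDimensions();
            const LightSource& light = optixLaunchParams.lights[idx.x];

            // Step 1: jittered sample position inside the stratified cell, in world space.
            const glm::vec2 xi = random2D();                // uniform in [0,1)^2
            const float su = (idx.y + xi.x) / float(dim.y);
            const float sv = (idx.z + xi.y) / float(dim.z);
            const glm::vec3 origin = light.origin
                + su * light.width  * light.du
                + sv * light.height * light.dv;

            // Step 2: random direction in the hemisphere around the light normal.
            const glm::vec3 dir = sampleHemisphere(light.normal);

            // Step 3: trace; the closest hit program writes the texcoord into the PRD,
            // and the miss program leaves it negative (my convention here).
            glm::vec2 rayTexCoordPRD(-1.f, -1.f);
            traceRay(origin, dir, rayTexCoordPRD);

            // Step 4: convert the texcoord to a row-major pixel index and contribute.
            // (Plain write shown; concurrent hits to the same texel would need
            // atomic accumulation, as discussed in the replies below.)
            if (rayTexCoordPRD.x >= 0.f)
            {
                const int size = optixLaunchParams.directLightingTexture.size;
                const int px = min(int(rayTexCoordPRD.x * size), size - 1);
                const int py = min(int(rayTexCoordPRD.y * size), size - 1);
                const uint32_t g = 128u;  // placeholder for the weighted gray value
                optixLaunchParams.directLightingTexture.colorBuffer[py * size + px] =
                    0xff000000 | (g << 0) | (g << 8) | (g << 16);
            }
        }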

In my host program, I download the color buffer from the GPU and try to write it to an image using stb_image_write, but the output remains a fully black image with some fairly random-looking colored pixels in the first few rows. I tried writing hardcoded values to hardcoded indices to see how that would change the result, but the result remains exactly the same.

I think I am overlooking something in the buffer management or in optixLaunch's parameters. Here is my launch call:

        // Launch direct lighting pipeline
        OPTIX_CHECK(optixLaunch(
            directLightPipeline->pipeline, stream,
            directLightPipeline->launchParamsBuffer.d_pointer(),
            directLightPipeline->launchParamsBuffer.sizeInBytes,
            &directLightPipeline->sbt,
            scene.amountLights(),   // dimension X: the light we are currently sampling
            STRATIFIED_X_SIZE,      // dimension Y: the number of cells of our stratified sample grid in the X direction (on the light)
            STRATIFIED_Y_SIZE       // dimension Z: the number of cells of our stratified sample grid in the Y direction (on the light)
            // dimension X * dimension Y * dimension Z CUDA threads will be spawned 
        ));

A device pointer to the color buffer itself is passed to the device programs via the launch parameters. I first allocate memory for the buffer, namely textureSize * textureSize * sizeof(uint32_t) bytes. I followed pretty much the same steps for this buffer as for the color buffer in Ingo Wald's OptiX course, which traces rays from a camera POV. The only thing that seems different here is that my launch size is not necessarily equal to the size of my baked light texture (I launch amountLights * STRATIFIED_X_SIZE * STRATIFIED_Y_SIZE threads). Is there anything else I might be overlooking?

UPDATE: I realised that I forgot to initialize the values in the color buffer (in Ingo Wald's example this is not necessary, since they are guaranteed to be overwritten anyway). That explains the random colors in the first few rows. However, I have now initialized my color buffer with zeroes and hardcoded the device program to write the value 255 to the first two rows, but the resulting images stay fully black. It seems like the overwriting is not happening. I also call cudaDeviceSynchronize() before downloading the pixels, to make sure the GPU is done rendering. Is there anything else that can explain this behaviour?
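
For reference, the allocation and zero-initialization now look roughly like this (d_colorBuffer and textureSize are placeholder names; CUDA_CHECK is the error-check macro from the course framework):

        // Allocate the color buffer and clear it to zero before launching.
        uint32_t* d_colorBuffer = nullptr;
        CUDA_CHECK(cudaMalloc(&d_colorBuffer, textureSize * textureSize * sizeof(uint32_t)));
        CUDA_CHECK(cudaMemset(d_colorBuffer, 0, textureSize * textureSize * sizeof(uint32_t)));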

First of all, starting from the light sources to scatter light onto some surface is backwards from how you'd normally bake light into a texture. That is normally done as a gather algorithm: start from the surface which holds the texture and integrate all incoming light over the hemisphere per texel.

What you implemented is a scatter algorithm, and the contributions to the destination indices must be accumulated with atomic operations, because different rays from different lights might hit the same destination index. Also, when doing that, many light rays might simply miss the surface (unless this is inside a closed mesh).
What you described is a brute-force light tracer, which is rather inefficient for this light-baking purpose.

You didn’t say how big your 3D launch dimension (scene.amountLights(), STRATIFIED_X_SIZE, STRATIFIED_Y_SIZE) is.
The launch dimensions in OptiX are limited to 2^30.
I would find it more natural to put the (x, y) sizes into the (x, y) dimensions and the light count into the z component, if that is required at all.
You could also loop over the lights inside the ray generation program instead of using a 3D launch, which scatters onto a 2D surface anyway, or bake each light individually in a separate optixLaunch. A sketch of the first option follows below.
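
For example (illustrative names only, not code from your project):

        // Host: launch only over the stratified grid; the raygen loops over the lights.
        OPTIX_CHECK(optixLaunch(
            pipeline, stream,
            d_launchParams, sizeof(LaunchParams), &sbt,
            STRATIFIED_X_SIZE,   // dimension X: stratified cells in X
            STRATIFIED_Y_SIZE,   // dimension Y: stratified cells in Y
            1));                 // dimension Z: unused

        // Device:
        extern "C" __global__ void __raygen__bakeDirectLight()
        {
            const uint3 idx = optixGetLaunchIndex();
            for (unsigned int i = 0; i < optixLaunchParams.numLights; ++i)
                sampleLight(i, idx.x, idx.y);  // hypothetical per-light sampling routine
        }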

The optixLaunch is asynchronous, like all OptiX API functions taking a CUDA stream argument.
If you copy the data from device to host with a synchronous memcpy, that should have synchronized automatically. Still, for debugging you could add an explicit synchronization call between the optixLaunch and the memcpy and see if anything changes.
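
For example (h_pixels and d_colorBuffer are illustrative names):

        // Force completion of the asynchronous launch before reading results back.
        CUDA_CHECK(cudaDeviceSynchronize());

        // The synchronous copy itself also synchronizes with the default stream.
        CUDA_CHECK(cudaMemcpy(h_pixels.data(), d_colorBuffer,
                              textureSize * textureSize * sizeof(uint32_t),
                              cudaMemcpyDeviceToHost));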

If not even writing explicit values to the output buffer works, then something is wrong with the addressing, with the copy from device to host, or with the writing of the image.
The data inside the host buffer after the device to host copy could be verified inside the debugger by looking at a memory window pointing to the host buffer.

Everything else would need a little more information about how you allocated the buffer, how you set it inside the launch parameter block, how you read it back from device to host, etc.

But as said in the beginning, your whole approach is backwards; it should be implemented as a gather algorithm that starts from the textured surface and integrates the incoming light.


Thanks for the broad explanation and insights! First, to come back to the issue of writes to the buffer having no impact on the resulting image: I was writing hardcoded integer values (e.g. 255); however, as Ingo Wald's example shows, the value still needs to be converted into a 32-bit RGBA value before writing:

                // convert to 32-bit rgba value (we explicitly set alpha to 0xff
                // to make stb_image_write happy ...)
                const uint32_t rgba = 0xff000000
                    | (r << 0) | (g << 8) | (b << 16);

This solves the issue.
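
For completeness, the host-side image write now looks roughly like this (pixels holds the downloaded buffer; names are placeholders):

        // Write the downloaded 32-bit RGBA buffer to disk with stb_image_write.
        stbi_write_png("directLighting.png",
                       textureSize, textureSize,
                       4,                                // RGBA components per pixel
                       pixels.data(),
                       textureSize * sizeof(uint32_t));  // stride in bytes per row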

Considering the scatter vs. gather approach: regarding the issue you pointed out about light rays that might miss the surface, aren't we dealing with the same problem in the gathering approach? I.e. isn't it possible that the rays that leave the UV map, mapped to world space, miss the light source? Isn't this the reason why importance sampling is used, to minimize this occurrence? About making the buffer writes atomic: do you know of any example that implements this? I also assume this will then be the main inefficiency?

Thanks in advance,
Chuppa

Considering the scatter vs. gather approach: regarding the issue you pointed out about light rays that might miss the surface, aren't we dealing with the same problem in the gathering approach?
I.e. isn't it possible that the rays that leave the UV map, mapped to world space, miss the light source?

Depends on how you’re implementing this.

Light baking is normally only gathering the diffuse contribution on a surface because that part is view independent.
You said you wanted to write direct lighting to a texture. In that case the lights would be explicitly sampled and would only be shadowed if something is obstructing the view to the light sample point.
That means you can calculate exactly how big the solid angle of the light source's area projected onto that hemisphere is, and scale the light samples accordingly (see the sketch below).
Any other directions shouldn’t contribute any lighting because they wouldn’t be direct lighting.
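
As an illustration, the per-sample geometry term for explicit area light sampling could look like this (a sketch with illustrative names, not code from any SDK):

        // Convert an area-measure light sample to solid angle and weight it.
        // P/N: shaded surface point and normal; Pl/Nl: light sample point and normal;
        // area: the area of the sample's patch on the light.
        __device__ float areaLightGeometryTerm(const glm::vec3& P,  const glm::vec3& N,
                                               const glm::vec3& Pl, const glm::vec3& Nl,
                                               const float area)
        {
            const glm::vec3 d  = Pl - P;
            const float dist2  = glm::dot(d, d);
            const glm::vec3 wi = d * rsqrtf(dist2);                 // direction to the light
            const float cosSurf  = fmaxf(0.0f, glm::dot(N, wi));    // cosine at the receiver
            const float cosLight = fmaxf(0.0f, glm::dot(Nl, -wi));  // cosine at the light
            // Solid angle of the patch as seen from P: area * cosLight / dist^2.
            return cosSurf * area * cosLight / dist2;
        }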

If you want to bake global illumination, then you would need to sample the other directions of that distribution function (Lambert: cosine weighted hemisphere) as well. That’s basically a global illumination path tracer which starts primary rays on surface points.

About making the buffer writes atomic: do you know of any example that implements this?
I also assume this will then be the main inefficiency?

Atomics aren’t really slow as long as they aren’t contended, and there is no way around them in a scatter algorithm.
Here’s an example with floats: https://forums.developer.nvidia.com/t/best-strategy-for-splatting-image-for-bidir/111000/2
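
In essence (assuming a float accumulation buffer; names are illustrative):

        // Accumulate a splatted contribution with atomics.
        // atomicAdd on 32-bit floats is available on all GPUs OptiX 7 supports.
        __device__ void splatContribution(float* accumBuffer,
                                          const unsigned int pixelIndex,
                                          const float3 contribution)
        {
            atomicAdd(&accumBuffer[pixelIndex * 4 + 0], contribution.x);
            atomicAdd(&accumBuffer[pixelIndex * 4 + 1], contribution.y);
            atomicAdd(&accumBuffer[pixelIndex * 4 + 2], contribution.z);
        }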

More information can be found inside the CUDA Programming Guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions

You might want to read this thread about light baking and follow some of the links in there:
https://forums.developer.nvidia.com/t/baking-to-texture/57699
(Ignore the old OptiX API comments in that. They don’t apply to OptiX 7.)

The forum has a search field in the top right. I found these links by searching for “atomic” and “baking”.

