Payload exception - Illegal address

FIXED ALREADY!

The Problem:
I recently updated our rendering engine from OpenSceneGraph 3.2.3 to 3.6.3 and also wanted to update our OptiX 5.0.1 + CUDA 9.0.176.4 rendering part to OptiX 6 and CUDA 10. I had some problems displaying the output_buffer with interop after the update but managed to do so in the end - our engine had been relying on a bug that was fixed in the new version. Now I have an even stranger problem than before. Raytracing works and I can see the whole scenery with moving cars, but the raytraced results are wrong.

What is wrong:

  1. The payload. After having problems with the raytraced results, I reduced the OptiX code to a bare minimum and removed everything except one float from the payload. This leads to an OptiX exception (crash) with the following log:

caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

This payload causes the crash:

#ifndef PAYLOAD_H
#define PAYLOAD_H

struct PayloadTest
{
    float intensity;
};

#endif // PAYLOAD_H

Changing to a bigger payload fixes the issue!

#ifndef PAYLOAD_H
#define PAYLOAD_H

class Test
{
    float intensity;
    float intensity2;
    float intensity3;
    float intensity4;
};

struct PayloadTest
{
    float intensity;
    float intensity2;
    float intensity3;
    float intensity4;
    float intensity5;
    float intensity6;
    float intensity7;
    float intensity8;
    
    Test test;
};

#endif // PAYLOAD_H

I remember reading that the payload is kept in registers as long as it’s small enough; otherwise, a buffer is created for the necessary data. Why the small version crashes, though, is a mystery to me right now.

  2. The results in the payload.
    After switching to the bigger payload, I analyzed the results inside:
RT_PROGRAM void closestHit()
{
    prd.intensity = 1.0f;
}

RT_PROGRAM void anyHit()
{
    float4 texColor = tex2D(texUnit0Sampler, texcoord0.x, texcoord0.y);
    if (texColor.w <= 0.0f)
    {
        rtIgnoreIntersection();
    }
}

RT_PROGRAM void miss()
{
    prd.intensity = 0.0f;
}

RT_PROGRAM void optiXLidarCamera()
{
    float3 rayDirection      = vigGetRayDirectionByDistributionBuffer();

    optix::Ray ray;

    const int    ray_type           = 0;
    const float3 ray_origin         = eye;
    const float  maxTravelDistance  = 100.0f;

    ray = optix::make_Ray(ray_origin, rayDirection, ray_type, scene_epsilon, maxTravelDistance);

    PayloadTest prd;
    prd.intensity                = 0.0f;

    rtTrace(top_object, ray, prd);

    output_buffer[launch_index] = make_float3(prd.intensity,
                                              0.0,
                                              0.0);
    
    rtPrintf( "Ray at launch index (%d,%d): intensity %f\n", launch_index.x, launch_index.y, prd.intensity);
}

The above code produces a black/red image, with the sky black and everything else completely red. I can see buildings and vehicles, and the camera is attached to the ego vehicle in the correct place. What is wrong are the values. If I write the t_hit value instead of 1.0f in the closest hit program, I get values of 30,000-40,000 m. Setting the max distance to 100 m does work, since the hit objects are only ~100 m away from the camera, but the values in t_hit are in the thousands. That is also why normalizing by 100 before writing to output_buffer does not work correctly: I always get a red (1.0f, 0.0f, 0.0f) or black (0.0f, 0.0f, 0.0f) image and never the fading red you would expect from a depth map. I do not get any exceptions from OptiX (the exception program is defined, of course).
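To make the expected depth-map behavior concrete, here is a minimal host-side C++ sketch of the normalization described above. The helper name `normalizeDepth` is hypothetical and not part of the original code; it assumes the divisor is the same maximum distance that was passed to the ray (100.0f in this post).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the original code): map a hit distance to
// [0, 1] so it can be written to the red channel as a depth value.
// max_distance is assumed to be the tmax given to the ray (100.0f here).
inline float normalizeDepth(float t_hit, float max_distance)
{
    const float scaled = t_hit / max_distance;
    // Clamp so that out-of-range distances saturate instead of overflowing.
    return std::min(std::max(scaled, 0.0f), 1.0f);
}
```

With a sane t_hit of 25 m this gives 0.25, i.e. a dark red. With the t_hit values in the tens of thousands reported above, every hit saturates to 1.0, which matches the solid red image.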

All this happens in both OptiX 5 and OptiX 6; it doesn’t matter which OptiX/CUDA version I use. I would be really glad if someone had an idea what I’m doing wrong, since the raytracing apparently works but behaves strangely with the payload and the raytraced values. Note that I do disable RTX mode before creating the context in OptiX 6.

What I think is wrong:

  1. Payload should not crash
  2. Setting the max distance to 100m should prevent t_hit from exceeding 100, yet I get values in the thousands.

Tested on Hardware + OS:
GPU: EVGA 1080 Ti 11GB
CPU: i9-7900X
Driver: 418.43
OS: OpenSuse 42.3

GPU: Quadro M5000M
CPU: i7 7990k
Driver: 435.21
OS: Ubuntu 18.04

Ok :D the material system was wrong and was mixing camera and closest_hit programs from different raytracing programs. This explains the payload issue because it had 2 different payloads…

Glad you figured it out. Thanks for updating; it’s helpful to hear how issues get resolved, and it helps everyone build a checklist for when they hit similar problems.


David.

Thanks David, you are right! I always post here since it really helps to write the problem down on paper and look at it from a distance. It’s also the best documentation for others to fix problems that have already occurred somewhere else.

@The problem:
The illegal address error occurred because of the size of the payload. The camera program generates rays with a payload of a certain size in memory. If the payload created in the camera program is smaller than the payload used in the hit program, and the hit program writes to all of its members, CUDA will try to write to a payload member (variable) that lies outside the small payload and will crash with the given error.
If we create a big enough payload in the camera program, the application won’t crash since the memory does exist, but the results will be wrong since the payload memory is interpreted incorrectly - writing a float into a bool, etc.

This explains both problems I had - the small payload crash, and the weird values inside a bigger payload.
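Both failure modes can be illustrated without OptiX in plain host-side C++. This is only a sketch: `CameraPayload`, `HitPayload`, `payloadFits` and `reinterpretFloatBits` are hypothetical names that stand in for the two mismatched payload definitions, not the actual structs from our engine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical payload the camera (ray generation) program allocates.
struct CameraPayload
{
    float intensity;
};

// Hypothetical payload a closest hit program from another pipeline expects.
struct HitPayload
{
    float intensity;
    float distance;
    float reflectivity;
};

// True only if everything the accessing program touches lies inside the
// memory that was actually allocated for the payload.
template <typename Allocated, typename Accessed>
constexpr bool payloadFits()
{
    return sizeof(Accessed) <= sizeof(Allocated);
}

// Case 1: the hit program writes past the end of the small payload.
static_assert(!payloadFits<CameraPayload, HitPayload>(),
              "a larger hit payload does not fit into the camera payload");

// Case 2: the memory exists, but the bytes are read with the wrong type.
// This returns the raw bits of a float, as a mismatched layout would see them.
inline std::uint32_t reinterpretFloatBits(float value)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "float is assumed to be 32 bits wide");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

A check like `payloadFits` only catches the size problem. The layout problem is avoided by having every program that shares a ray type include the same payload header.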