"illegal memory access" when trying to set up multiple cameras

Hello,

I am working on setting up an OptiX program that will render an array of depth values given some camera eye locations and an elevation map.

I have followed along with Ingo Wald’s optix7 course as a starting point. I have posted previously about some issues I was seeing in the depth image, but I think those have been resolved. I wanted to go ahead and try to set up multiple cameras.

Really what I want is to generate a bunch of rays from far above the surface and “sample” the depths at different locations. From my understanding of what I have read so far, I can do a 1D, 2D, or 3D launch as long as the launch indices are handled properly. I opted to try this in 3D as that is ultimately the shape of the array I will be working with for camera positions.

I added this struct inside of my launchParams:

        struct
        {
            vec3f       vertical;
            vec3f       direction;
            vec3f*      cameraLocationsBuffer;
            uint32_t*   depthBuffer;
            vec3i       size;
        } cameras;

I modified my RayGen as:

    extern "C" __global__ void __raygen__renderTrajectoryElevations()
  {
    const uint3        idx        = optixGetLaunchIndex();
    const uint3        dim        = optixGetLaunchDimensions();
    const uint32_t camIndex = idx.z * dim.y * dim.x + idx.y * dim.x + idx.x;

    // depth payload 
    uint32_t u0;

    vec3f camPosition = optixLaunchParams.cameras.cameraLocationsBuffer[camIndex];
    vec3f rayDir = optixLaunchParams.cameras.direction;

    optixTrace(optixLaunchParams.traversable,
                camPosition,
                rayDir,
                0.f, //t_min
                1e3f, //t_max
                0.0f, // rayTime
                OptixVisibilityMask( 255 ),
                OPTIX_RAY_FLAG_DISABLE_ANYHIT, // OPTIX_RAY_FLAG_NONE,
                SURFACE_RAY_TYPE,               // SBT Offset
                RAY_TYPE_COUNT,                 // SBT Stride
                SURFACE_RAY_TYPE,               // miss SBT Index
                u0);
    
    // get depth from payload
    const float depth = __uint_as_float( u0 );

    // and write to frame buffer ...
    optixLaunchParams.cameras.depthBuffer[camIndex] = depth;
  }

And my closest hit as:

  extern "C" __global__ void __closesthit__radiance()
  {
    // get depth
    const float t = optixGetRayTmax();
    optixSetPayload_0( __float_as_uint( t ) );
  }

in my render() function I call launch like:

    if (launchParams.trajectory_cameras.size.x == 0) return;

    launchParamsBuffer.upload(&launchParams,1);

    OPTIX_CHECK(optixLaunch(pipeline,
                            stream,
                            launchParamsBuffer.d_pointer(),
                            launchParamsBuffer.sizeInBytes,
                            &sbt,
                            launchParams.cameras.size.x,
                            launchParams.cameras.size.y,
                            launchParams.cameras.size.z));
    CUDA_SYNC_CHECK();

There is also this resize function which I have modified; I am not sure if that will be important:

  void SampleRenderer::resize_elevationDepthBuffer(const vec3i &newSize) {
    trajDepthBuffer.resize(newSize.x*newSize.y*newSize.z*sizeof(uint32_t));
    launchParams.cameras.size = newSize;
    launchParams.cameras.depthBuffer = (uint32_t*)depthBuffer.d_pointer();
  }

And in my header I have added some new CUDABuffers:

            CUDABuffer cameraLocationsBuffer;
            CUDABuffer depthBuffer;

And finally in the main I am trying to call my render program and get the illegal memory access:

      Model *model = loadOBJ("../models/myscene_smooth.obj");
      SampleRenderer sample(model);

      // should I use BOOST here? vector<vector<double>>(4, vector<double>(5)));

      std::vector<std::vector<std::vector<vec3f>>> camLocations(2,
              std::vector<std::vector<vec3f>> (2,std::vector<vec3f> (2)));
      
      const int K = camLocations.size();
      const int M = camLocations[0].size();
      const int N = camLocations[0][0].size(); 

      // add some random camera locations to test with
      for (int k=0; k<K; k++) {
        for (int m=0; m<M; m++) {
          for (int n=0; n<N; n++) {
            camLocations[k][m][n] = vec3f(-23.0f*float(k+1),4.6f*float(m+1),30.7f*float(n+1));
          }
        }
      }
      
      // get number of cameras
      const int numPoints = K*M*N;       
      // all cameras look down(at surface)
      const vec3f camDirection = (0.0,0.0,-1.0f);
      const vec3f camUp        = (0.0,0.0,1.0f);
      sample.launchParams.cameras.direction = camDirection;
      sample.launchParams.cameras.vertical = camUp;
      // allocate memory
      sample.cameraLocationsBuffer.resize(numPoints*sizeof(vec3f));
      sample.cameraLocationsBuffer.alloc_and_upload(camLocations);
      sample.resize_elevationDepthBuffer(vec3i(2,2,2));
      
      sample.render_trajectory_elevations(); // fails on this call

Any thoughts on where I have gone wrong? I am almost certain it is the way I am allocating the memory, but I guess it could be the way I am using it in the RayGen program? The program fails in the call to CUDA_SYNC_CHECK() after the call to optixLaunch. Are there any glaring issues with my approach? Can you provide any code snippets or pointers on how I can improve? I appreciate any and all feedback, as I am really hoping to understand the hows and whys as well as best practices with OptiX/CUDA.

Thank you!
Benjamin

So I have made some progress here. I changed my resize function to:

  void SampleRenderer::resize(const vec3i &newSize) {
    trajDepthBuffer.resize(newSize.x*newSize.y*newSize.z*sizeof(uint32_t));
    cameraLocationsBuffer.resize(newSize.x*newSize.y*newSize.z*sizeof(vec3f));
    launchParams.cameras.size = newSize;
    launchParams.cameras.depthBuffer = (uint32_t*)trajDepthBuffer.d_pointer();
    launchParams.cameras.cameraLocationsBuffer = (vec3f*)cameraLocationsBuffer.d_pointer();
  }

And I got rid of this in the main:

sample.cameraLocationsBuffer.resize(numPoints*sizeof(vec3f));

This allowed the program to launch and render (the points won’t make sense because it is a single pixel from each camera, a depth sensor if you will). The rendered values, however, did not make sense: they were all zeros. I think I found the issue for that as well; I believe it is related to the issue I was having HERE, and I will update in that thread.

Yes, all distance (depth) values should have been float types.
Yes, you need to set the device pointers of all your input or output buffers inside the launch parameter structure to be able to access them inside the OptiX device code.

Here are some other code changes which would make things a little simpler. See comments marked with // NV.

struct
{
  vec3f       vertical;
  vec3f       direction;
  vec3f* cameraLocationsBuffer;
  uint32_t* depthBuffer; // NV must be float*
  vec3i       size; // NV Redundant when optixGetLaunchDimensions matches these values.
} cameras;



extern "C" __global__ void __raygen__renderTrajectoryElevations()
{
  const uint3 idx = optixGetLaunchIndex();
  const uint3 dim = optixGetLaunchDimensions();

  const uint32_t camIndex = idx.z * dim.y * dim.x + 
                            idx.y * dim.x + 
                            idx.x; // NV Correct, linear index of 3D launch.

  // depth payload 
  // NV If you initialize this for the miss case with -1.0f, then you wouldn't need a miss program. 
  // (The miss module and entry function name need to be set to nullptr inside the OptixProgramGroupDesc.)
  uint32_t u0 = __float_as_uint(-1.0f);

  // NV I would recommend to use CUDA vector types (here float3) inside device code, because that is what optixTrace arguments expect.
  vec3f camPosition = optixLaunchParams.cameras.cameraLocationsBuffer[camIndex];
  // NV I'm assuming this is only debug code and should later render with some projection per camera?
  vec3f rayDir      = optixLaunchParams.cameras.direction;

  optixTrace(optixLaunchParams.traversable,
             camPosition, // NV This expects a float3
             rayDir,
             0.0f, // t_min
             1e3f, // t_max // NV Make this as small as possible to capture all your results.
             0.0f, // rayTime
             OptixVisibilityMask(255),
             OPTIX_RAY_FLAG_DISABLE_ANYHIT, // OPTIX_RAY_FLAG_NONE,
             SURFACE_RAY_TYPE,               // SBT Offset
             RAY_TYPE_COUNT,                 // SBT Stride
             SURFACE_RAY_TYPE,               // miss SBT Index // NV There wouldn't need to be a miss program when all that does is returning -1.0f inside the payload register.
             u0);

  // get depth from payload
  const float depth = __uint_as_float(u0); // NV Yes, positive when hit. A better name would be "distance".

  // and write to frame buffer ...
  optixLaunchParams.cameras.depthBuffer[camIndex] = depth;
}


extern "C" __global__ void __closesthit__radiance()
{
  // get depth 
  const float t = optixGetRayTmax(); // NV Correct. It's the intersection "distance".
  optixSetPayload_0(__float_as_uint(t));
}
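
Regarding dropping the miss program (see the // NV comments above), this is just dry coding of what an empty miss program group could look like on the host; context, pgOptions, log, and logSize are assumed from your existing pipeline setup code:

OptixProgramGroupDesc missDesc = {};
missDesc.kind                   = OPTIX_PROGRAM_GROUP_KIND_MISS;
missDesc.miss.module            = nullptr; // No miss program; the payload keeps its initialized -1.0f.
missDesc.miss.entryFunctionName = nullptr;

OptixProgramGroup missGroup = nullptr;
OPTIX_CHECK(optixProgramGroupCreate(context, &missDesc, 1,
                                    &pgOptions, log, &logSize, &missGroup));
// The SBT still needs a miss record whose header is packed from this group
// with optixSbtRecordPackHeader.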


if (launchParams.trajectory_cameras.size.x == 0)
  return;

launchParamsBuffer.upload(&launchParams, 1);

OPTIX_CHECK(optixLaunch(pipeline,
                        stream,
                        launchParamsBuffer.d_pointer(),
                        launchParamsBuffer.sizeInBytes,
                        &sbt,
                        launchParams.cameras.size.x,   // NV CAMERA_XRES
                        launchParams.cameras.size.y,   // NV CAMERA_YRES
                        launchParams.cameras.size.z)); // NV NUM_CAMERAS
CUDA_SYNC_CHECK();




Model* model = loadOBJ("../models/myscene_smooth.obj");
SampleRenderer sample(model);

// should I use BOOST here? vector<vector<double>>(4, vector<double>(5)));
// NV This can be done much simpler with a linear array.

// NV I'm assuming the final goal is to render a number of camera intersection distance images with some projection in a single launch.
// NV If all images have the same dimension, using a 3D launch is actually working well,
// because the internal warps will use a 2D layout on the individual 2D slices of the 3D volume, so memory locality is good.

//std::vector<std::vector<std::vector<vec3f>>> camLocations(2, std::vector<std::vector<vec3f>>(2, std::vector<vec3f>(2)));
//
//const int K = camLocations.size();
//const int M = camLocations[0].size();
//const int N = camLocations[0][0].size();
//
//// add some random camera locations to test with
//for (int k = 0; k < K; k++) {
//  for (int m = 0; m < M; m++) {
//    for (int n = 0; n < N; n++) {
//      camLocations[k][m][n] = vec3f(-23.0f * float(k + 1), 4.6f * float(m + 1), 30.7f * float(n + 1));
//    }
//  }
//}

// NV Assuming there are some defines or function parameters defining the camera x- and y-resolution and the number of cameras.
// NV Using your K, M, N values, define as size_t to match following size calculations. Could also use ull suffix
#define CAMERA_XRES size_t(2)
#define CAMERA_YRES size_t(2)
#define NUM_CAMERAS size_t(2)

  const size_t numPoints = CAMERA_XRES * CAMERA_YRES * NUM_CAMERAS;

  std::vector<vec3f> camLocations(numPoints);
  vec3f* p = camLocations.data(); // NV Local running pointer.

  // NV Could also use explicit dynamic allocations, but that wouldn't work with those API wrappers alloc_and_upload().
  // vec3f* camLocations = new vec3f[CAMERA_XRES * CAMERA_YRES * NUM_CAMERAS];
  // vec3f* p = camLocations;

  // add some random camera locations to test with
  // NV This exactly matches the linear index calculation inside the OptiX device code.
  for (size_t z = 0; z < NUM_CAMERAS; ++z)
  {
    for (size_t y = 0; y < CAMERA_YRES; ++y)
    {
      for (size_t x = 0; x < CAMERA_XRES; ++x)
      {
        *p++ = vec3f(-23.0f * float(x + 1), // NV Not the same order because you used k for x, etc.
                       4.6f * float(y + 1), 
                      30.7f * float(z + 1));
      }
    }
  }

// get number of cameras 
// NV I'm assuming this is only debug code because this is the number of rays and later each camera should shoot CAMERA_XRES * CAMERA_YRES rays?
// Moved up to use for the host size allocation as well.
// const int numPoints = K * M * N;

// all cameras look down(at surface)
// NV This is hopefully debug code as well, because if all rays go straight down onto an elevation map aligned with the xy-plane,
// you wouldn't need a raytracer to determine these results.
// NV Also note that having a direction collinear with the camera up-axis will not work if that should define a projection later.
const vec3f camDirection = (0.0f, 0.0f, -1.0f); // NV Careful with immediate double values. Always append "f" for floats.
const vec3f camUp = (0.0f, 0.0f, 1.0f);         // NV Try if this can use the constructor directly here: const vec3f camUp(0.0f, 0.0f, 1.0f);

sample.launchParams.cameras.direction = camDirection;
sample.launchParams.cameras.vertical  = camUp;

// allocate memory
//sample.cameraLocationsBuffer.resize(numPoints * sizeof(vec3f));
sample.cameraLocationsBuffer.alloc_and_upload(camLocations);

// delete [] camLocations; // NV when using new[] above.

sample.resize_elevationDepthBuffer(vec3i(CAMERA_XRES, CAMERA_YRES, NUM_CAMERAS));

sample.render_trajectory_elevations(); // fails on this call    

I’m assuming the final goal will be to render a number of 2D distance images from different camera positions and actual camera projections. So you wouldn’t actually define all ray origins and directions but instead a camera and a 2D resolution?
Because when all ray directions are straight down as set in const vec3f camDirection = (0.0f, 0.0f, -1.0f); then you do not need a ray tracer at all to determine the distance on an elevation map with no caves. That could be determined with a texture map lookup instead.

Also please note that the optixLaunch dimension is limited to 2^30.

Please read these related threads:
https://forums.developer.nvidia.com/t/rendering-for-multiple-predefined-camera-locations/295480/2
https://forums.developer.nvidia.com/t/3d-optixlaunch-to-accommodate-multiple-viewpoints/160421/2

@droettger Thank you for your suggestions. I have implemented a few of them. I do have some questions about some of the things you mentioned, as well as some odd behavior I have found for my single ray per/camera program.

Because when all ray directions are straight down as set in const vec3f camDirection = (0.0f, 0.0f, -1.0f); then you do not need a ray tracer at all to determine the distance on an elevation map with no caves. That could be determined with a texture map lookup instead.

Can you describe what you mean? I am not familiar with “texture map lookups”, but when I googled it, it looks like a way to map colors to pixels (or, I suppose, elevations). It kind of seems like a lookup table, but what I am trying to do is interpolate between known values. The vertices of my mesh are known values, and the triangles that form the faces are essentially linearly interpolated values. So when I trace a ray against the surface I can get the interpolated elevation at that point. I was doing this previously on a mesh grid using interp2D and was just trying to replicate that behavior. Perhaps the ray tracing is overkill, but it was partially an exercise to see if it would work, and partially because I need the other parts of the pipeline (the AS, SBT) set up anyway for rendering the depth and color images from various points in the elevation map. It does seem that the ray tracing implementation is slightly faster than interp2D, but it is not a fair comparison as interp2D was running in scipy and not on the GPU.

I’m assuming this is only debug code because this is the number of rays and later each camera should shoot CAMERA_XRES * CAMERA_YRES rays?

Yes, sort of. There are two things I was attempting to do. The depth/distance to the surface was just the first part. The second thing I want to do is just as you mention: render images from poses around the elevation map. In this first attempt I was trying to a) see if I could use OptiX to get the same results I was getting with interp2D, and b) get a better understanding of allocating memory/buffers/etc. on the GPU. Eventually I want these camera poses to come from another application running on the GPU, so I would hopefully pass them directly from that program and avoid the upload/download overhead. This would include both the “interpolated distances” as well as the rendered images from various poses; I would do as much processing on the GPU as possible before downloading the final result.

// NV Also note that having a direction collinear with the camera up-axis will not work if that should define a projection later.

Can you provide a little more information about this? You mean to say that we can’t render an image looking straight down? It seems that I am experiencing just that behavior: I can render the depth image from any point, but when I place the camera looking straight down it renders no image. Why is this the case? Are there any workarounds (other than tilting just slightly)?

I am still having an issue with my distance-checking rays. The pipeline is set up nearly the same as before, with:

rayGen:

    extern "C" __global__ void __raygen__renderElevations()
  {
    const uint3        idx        = optixGetLaunchIndex();
    const uint3        dim        = optixGetLaunchDimensions();
    const uint32_t camIndex = idx.z * dim.y * dim.x + idx.y * dim.x + idx.x;

    // the value we store the distance in:
    uint32_t u0;

    float3 camPosition = optixLaunchParams.cameras.cameraLocationsBuffer[camIndex];
    float3 rayDir = normalize(optixLaunchParams.cameras.direction);

    optixTrace(optixLaunchParams.traversable,
                camPosition,
                rayDir,
                0.f, //t_min
                1e8f, //t_max
                0.0f, // rayTime
                OptixVisibilityMask( 255 ),
                OPTIX_RAY_FLAG_DISABLE_ANYHIT, // OPTIX_RAY_FLAG_NONE,
                SURFACE_RAY_TYPE,               // SBT Offset
                RAY_TYPE_COUNT,                 // SBT Stride
                SURFACE_RAY_TYPE,               // miss SBT Index
                u0);
    
    // get depth from payload
    const float distance = __uint_as_float( u0 );

    // and write to frame buffer ...
    optixLaunchParams.cameras.distanceBuffer[camIndex] = distance;
  }

closest hit:

  extern "C" __global__ void __closesthit__radiance()
  {
    // get depth
    const float t = optixGetRayTmax();
    optixSetPayload_0( __float_as_uint( t ) );
  }

miss: (I will look into removing the miss program altogether as you mentioned, but I am not entirely sure yet what needs to change, so I wanted to debug the other parts first.)

  extern "C" __global__ void __miss__radiance()
  { 
    // if we miss set to negative value
    optixSetPayload_0( __float_as_uint( -1.0f ) );
  }

Launch Params:

    struct LaunchParams
    {
        struct
        {
            vec3f       direction;
            vec3f*     cameraLocationsBuffer;
            float*      distanceBuffer;
            vec3i       size;
        } cameras;

        OptixTraversableHandle traversable;
    };

and the main:

      Model *model = loadOBJ("../models/myscene_smooth.obj");

      SampleRenderer sample(model);
      
      #define CAMERA_XRES size_t(2)
      #define CAMERA_YRES size_t(2)
      #define NUM_CAMERAS size_t(2)

      const size_t numPoints = CAMERA_XRES * CAMERA_YRES * NUM_CAMERAS;

      std::vector<vec3f> camLocations(numPoints);
      vec3f* p = camLocations.data();

      for (size_t z = 0; z < NUM_CAMERAS; ++z)
      {
        for (size_t y = 0; y < CAMERA_YRES; ++y)
        {
          for (size_t x = 0; x < CAMERA_XRES; ++x)
          {
            *p++ = vec3f(23.0f, // NV Not the same order because you used k for x, etc.
                          4.6f, 
                          30.7f);
          }
        }
      }
      // still trying to debug why I am getting misses, eventually they should all look down
      // or have well defined poses 
      const vec3f camAt = model->bounds.center();

      // const vec3f camDirection = (0.1f,0.1f,1.0f);
      vec3f camDirection = normalize(camAt-camLocations[0]);

      sample.launchParams.cameras.direction = normalize(camDirection);

      sample.resize(vec3i(CAMERA_XRES, CAMERA_YRES, NUM_CAMERAS));
      sample.cameraLocationsBuffer.alloc_and_upload(camLocations);
     
      sample.render();
      std::vector<float> depths;
      depths.resize(numPoints);
      sample.downloadPixels(depths.data());
      
      for (int i=0; i<depths.size(); i++){
        std::cout << depths[i] << std::endl;
      }


    } catch (std::runtime_error& e) {
      std::cout << GDT_TERMINAL_RED << "FATAL ERROR: " << e.what()
                << GDT_TERMINAL_DEFAULT << std::endl;
          exit(1);
    }
    return 0;

Running this returns -1.0 (the miss value) for all rays. However, if I render the full image from this same pose I get the correct depths. Why would shooting just a single ray give me the incorrect result? Am I doing something incorrect? I am really at a loss here as to what the issue might be.

Thank you!
Benjamin

This will take longer.

Let’s first analyze what you programmed now.
I’m only copying the questionable code lines plus new comments.

1.) 
      // NV What is going on here?!
      // NV Why are you setting all ray origins to the same hardcoded point?
      // NV Where inside the scene geometry is that?
      // NV Note that I do not call these "camera positions" because then you wouldn't need a 3D array of them.
      *p++ = vec3f(23.0f,
                   4.6f,
                   30.7f);

2.) 
// NV Where inside the model is that relative to the hardcoded ray origins above?
const vec3f camAt = model->bounds.center();

3.) 
// NV And what is going on here?
// NV Now you set the camera direction to the normalized vector from the hardcoded camera ray origin to the model bounds center for all rays.
// NV Since you set all ray origins to the same point anyway, you only reference the first to set a ray direction for all cameras. Then you wouldn't need the 3D ray origins and could put this into the launch parameters.
vec3f camDirection = normalize(camAt - camLocations[0]); 

4.)
// NV Now where does the ray origin lie relative to the terrain? 
// NV Where does the direction go relative to the terrain?
// NV And for good measure you normalize the already normalized vector again.
sample.launchParams.cameras.direction = normalize(camDirection); 

5.)
// NV I'm assuming this resizes the float* distanceBuffer to CAMERA_XRES * CAMERA_YRES * NUM_CAMERAS elements receiving the distance results?
sample.resize(vec3i(CAMERA_XRES, CAMERA_YRES, NUM_CAMERAS)); 

Possible reasons for why this hardcoded ray doesn’t hit the terrain mesh:
a) The center of the model bounds is above the ray origin and if that is above the terrain, all your identical(!) rays are going upwards.
b) Similarly when the ray origin and the model bounds center are at similar heights and the ray is going horizontally over the terrain.
c) Ray origin is below the terrain and the direction goes downward.

I was doing this previously in a mesh grid using interp2D.

If your elevation data is a regular grid of height values, then the height on each point on that grid can be determined with a linearly interpolated texture lookup in hardware.
This only works for the height-over-ground result you currently get when shooting rays straight down.
This does not work when the elevation data is not a regular grid but randomly sized triangles. (That would require to rasterize the terrain as triangles.)
This does not work when you want to render a camera projection from some single camera position, because then most rays wouldn’t shoot straight down to the terrain but at an angle and you need the closest hit which the ray tracer provides easily.
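
Just to illustrate the idea, here is some dry coding of such a hardware-interpolated height lookup; the variable names are placeholders and not from your project, and it assumes the heights are a regular width x height grid of floats:

// Host: copy the height grid into a cudaArray and create a texture object with linear filtering.
cudaArray_t heightArray = nullptr;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
cudaMallocArray(&heightArray, &channelDesc, width, height);
cudaMemcpy2DToArray(heightArray, 0, 0, heights, width * sizeof(float),
                    width * sizeof(float), height, cudaMemcpyHostToDevice);

cudaResourceDesc resDesc = {};
resDesc.resType         = cudaResourceTypeArray;
resDesc.res.array.array = heightArray;

cudaTextureDesc texDesc = {};
texDesc.addressMode[0]   = cudaAddressModeClamp;
texDesc.addressMode[1]   = cudaAddressModeClamp;
texDesc.filterMode       = cudaFilterModeLinear; // bilinear interpolation in hardware
texDesc.readMode         = cudaReadModeElementType;
texDesc.normalizedCoords = 1;

cudaTextureObject_t heightTex = 0;
cudaCreateTextureObject(&heightTex, &resDesc, &texDesc, nullptr);

// Device: interpolated height at normalized grid coordinates (u, v) in [0, 1].
// float h = tex2D<float>(heightTex, u, v);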

There are two things I was attempting to do. The depth/distance to surface was just the first part.
The second thing I want to do is just as you mention, render images from poses around the elevation map.

In this first attempt I was trying to
a) see if I could use Optix to get the same results I was getting with interp2D,

There is no problem in implementing that with OptiX.
I could implement what you need in about an hour in my own example framework.

b) get a better understanding of allocating memory/buffers/etc… on the GPU.

Note that the example framework you’re looking at is actually trying to abstract all of the details from the user.
That is not what I would recommend when trying to learn the fundamentals of the OptiX and CUDA APIs required to implement arbitrary OptiX applications.

Eventually I want these camera poses to come from another application running on the GPU so I would hopefully pass them directly from that program and avoid the upload/download overhead.
This would include both the “interpolated distances” as well as the rendered images from various poses, I would do as much processing on the GPU as possible before downloading the final result.

Woah, that opens a completely different can of worms!
Read this thread describing some limitations and issues:
https://forums.developer.nvidia.com/t/long-shot-access-mesh-data-in-different-program-but-already-loaded-in-the-gpu/157117/2
https://forums.developer.nvidia.com/t/optixaccelrelocationinfo-data/167114/6
I would recommend trying other communication methods between processes on the host first.

Can you provide a little more information about this? You mean to say that we can’t render an image looking straight down?

No, of course you can render images with arbitrary camera definitions.
I’m assuming you have read all the links I provided before, explaining how a pinhole camera is set up with a position P and U, V, W vectors spanning a left-handed view frustum.

When defining the pinhole camera directly with the P,U,V,W data, any positioning and orientation is possible.

But there are different ways to define such camera with a position, lookat point, up-vector, and a field of view angle.
When using that second method, the up-vector is used to make the camera V vector upright. (This prevents rolling the camera around its forward axis during orbit operations, for example.)

For that to work, the direction vector (lookat - position) and the up-vector must not be collinear, because then they do not span a plane and there exist infinitely many V vectors perpendicular to the forward direction. That problem, i.e. the reduction of the degrees of freedom from the up-vector, is often described as gimbal lock and results in erratic camera behavior.

So in your case, when your ray directions are all (0, 0, -1) then the up-vector must not be (0, 0, 1). Just pick a different vector which is not collinear to the ray direction, like (0, 1, 0) or (1, 0, 0) or any other non-collinear vector. It depends on how you want the camera plane to be oriented above your terrain.

Again, if you specify the P, U, V, W data of a standard pinhole camera directly, there should be no issue selecting the right vectors for a downward direction with W along (0, 0, -1) and the U and V vectors orthogonal to that.

Let’s assume the goal is to get a number of 2D single channel floating point data containing the distance to a camera position above some terrain.
Like when flying with a plane over some landscape and doing aerial photos.

Define a camera struct which fully describes the position and a projection, as simple as this.

struct PinholeCamera
{
  float3 P;
  float3 U;
  float3 V;
  float3 W;
};

Define a 1D vector of these camera structs and initialize them with the proper location and projection vectors.
Allocate and upload that to the GPU device and store the pointer to that device data and the number of entries inside the launch parameters.
(If all projection vectors U, V, W are the same for all cameras, there is no need to store them inside the camera struct but they could be put into the launch parameters and only the camera positions would need to be allocated and uploaded as 1D array.)
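
As dry coding only (an alternative to your current LaunchParams; the field names here are placeholders), the launch parameters could then look like this:

struct LaunchParams
{
  PinholeCamera* cameras;        // device pointer to numCameras PinholeCamera structs
  int            numCameras;     // equals the optixLaunch depth
  float*         distanceBuffer; // XRES * YRES * numCameras floats
  OptixTraversableHandle traversable;
};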

Allocate the target distance result buffer. If all 2D distance images should have the same resolution you can allocate a 3D array with XRES * YRES * number_of_cameras size, one float per element.
Addressing of the element as linear offset is the same as in your code already.

const uint3    idx = optixGetLaunchIndex();
const uint3    dim = optixGetLaunchDimensions();
const uint32_t distanceIndex = idx.z * dim.y * dim.x +
                               idx.y * dim.x +
                               idx.x;

Note that idx.z (which is the respective xy-2D image slice inside the 3D output buffer) is also the index into the camera array.
So inside the raygen program, use idx.z to get the PinholeCamera from the array and calculate the resulting projection (ray direction) from the idx.x and idx.y values, as shown in many OptiX examples, for each pixel inside the 2D xy-slice of the output distances.
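
Dry coding of such a raygen program under the launch parameter sketch above (it assumes a float3 normalize() and the float3 operators from a helper header like the SDK's sutil/vec_math.h; this is not your course framework code):

extern "C" __global__ void __raygen__distance()
{
  const uint3 idx = optixGetLaunchIndex();
  const uint3 dim = optixGetLaunchDimensions();

  // idx.z selects both the camera and the xy-slice of the output buffer.
  const PinholeCamera cam = optixLaunchParams.cameras[idx.z];

  // Pixel center mapped to [-1, 1]^2 on the camera plane.
  const float px = 2.0f * (idx.x + 0.5f) / float(dim.x) - 1.0f;
  const float py = 2.0f * (idx.y + 0.5f) / float(dim.y) - 1.0f;

  const float3 origin    = cam.P;
  const float3 direction = normalize(px * cam.U + py * cam.V + cam.W);

  uint32_t u0 = __float_as_uint(-1.0f); // miss value, so no miss program is required

  optixTrace(optixLaunchParams.traversable,
             origin, direction,
             0.0f, 1e3f, 0.0f,
             OptixVisibilityMask(255),
             OPTIX_RAY_FLAG_DISABLE_ANYHIT,
             SURFACE_RAY_TYPE, RAY_TYPE_COUNT, SURFACE_RAY_TYPE,
             u0);

  const uint32_t index = idx.z * dim.y * dim.x + idx.y * dim.x + idx.x;
  optixLaunchParams.distanceBuffer[index] = __uint_as_float(u0);
}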

Set up everything else for the pipeline and shader binding table as before.
Call optixLaunch with the dimension width = XRES, height = YRES, depth = number_of_cameras.
Download the distance data in xy-slices from the device and store them as images on disk, or do whatever else you need with them.

A word of warning: As mentioned in the posted links before, the OptiX launch dimension is limited to 2^30.
So if you want, for example, to render 1024x1024 sized images per camera, you can only render 1024 cameras at once (1024x1024x1024 == 2^30). That is 1 Giga rays!
When you want to render bigger images, you can only render fewer cameras.
If your image resolutions are too small to saturate a modern GPU (everything below 256x256 for example) then grouping multiple cameras into one launch like that is actually good.
It’s also totally reasonable to render only one camera first. When num_cameras == 1 the optixLaunch is actually a 2D launch automatically with no changes.
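
If the number of cameras times the per-image resolution exceeds that limit, the cameras could simply be split across several launches. Dry coding to illustrate the idea (d_cameras and d_distances are placeholder device pointers, pipeline, stream, sbt, launchParams, and launchParamsBuffer are assumed from your code, and the launch parameter fields are the ones from the sketch above):

const size_t maxLaunchSize   = size_t(1) << 30;                              // OptiX launch size limit
const size_t camerasPerBatch = maxLaunchSize / (CAMERA_XRES * CAMERA_YRES);  // assumes one image alone fits the limit

for (size_t first = 0; first < NUM_CAMERAS; first += camerasPerBatch)
{
  const size_t count = (NUM_CAMERAS - first < camerasPerBatch) ? NUM_CAMERAS - first
                                                               : camerasPerBatch;

  // Point the launch parameters at the camera and output slices of this batch, then upload them.
  launchParams.cameras        = d_cameras + first;
  launchParams.numCameras     = int(count);
  launchParams.distanceBuffer = d_distances + first * CAMERA_XRES * CAMERA_YRES;
  launchParamsBuffer.upload(&launchParams, 1);

  OPTIX_CHECK(optixLaunch(pipeline, stream,
                          launchParamsBuffer.d_pointer(),
                          launchParamsBuffer.sizeInBytes,
                          &sbt,
                          CAMERA_XRES, CAMERA_YRES, count));
}
CUDA_SYNC_CHECK();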

If you have additional coding questions, please attach the full source code as a file instead of posting only code excerpts from now on. It’s much simpler and faster to identify problems then.

@droettger I do fear I may be giving you an aneurysm with my naive questions, but I really appreciate all of the great feedback.

The reason for the hard coding of values was simply to see if I could get something working. I plan to set these values programmatically once I have a working skeleton.

3.) 
// NV And what is going on here?
// NV Now you set the camera direction to the normalized vector from the hardcoded camera ray origin to the model bounds center for all rays.
// NV Since you set all ray origins to the same point anyway, you only reference the first to set a ray direction for all cameras. Then you wouldn't need the 3D ray origins and could put this into the launch parameters.
vec3f camDirection = normalize(camAt - camLocations[0]); 

4.)
// NV Now where does the ray origin lie relative to the terrain? 
// NV Where does the direction go relative to the terrain?
// NV And for good measure you normalize the already normalized vector again.
sample.launchParams.cameras.direction = normalize(camDirection); 

Your comments seriously had me laughing. I think the double normalizing was just an oversight on my part; I read that normalizing was recommended, and when it didn’t work I was trying anything to get a result.

So in your case, when your ray directions are all (0, 0, -1) then the up-vector must not be (0, 0, 1). Just pick a different vector which is not collinear to the ray direction, like ((0, 1, 0), or (1, 0, 0) or any other non-collinear vector. Depends on how you want the camera plane to be oriented above your terrain.

This was really helpful; I was naively thinking of “up” in the global frame, and I realize now that is not at all what it means. I did some more reading on specifying a pinhole camera using P, U, V, W data. I think I have a pretty good grasp on this now, and I am confident I have defined the cameras correctly, both with P, U, V, W directly (looking at optixTriangle) and in the more abstracted sense as in Ingo Wald’s examples. I feel confident because I have copied the code/syntax exactly, only replacing the single camera with vectors of cameras. I have now even tried, as you suggested above, creating a new struct for the camera and then building a vector of those structs.

I believe my problem is actually coming from something else: an issue with memory allocation. I found a little snippet that allows me to print values out from the GPU:

    # if __CUDA_ARCH__>=200
    printf("%f \n", origin.x);
    #endif

I did this inside the raygen program for every value that I uploaded with “alloc_and_upload”, like in the following (my whole main):

#include "SampleRenderer.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "3rdParty/stb_image_write.h"
#include <chrono>

/*! \namespace osc - Optix Siggraph Course */
namespace osc {

  extern "C" int main(int ac, char **av)
  {
    try {

      Model *model = loadOBJ("../models/myscene_smooth.obj");

      SampleRenderer sample(model);
      
      #define CAMERA_XRES size_t(1200)
      #define CAMERA_YRES size_t(800)
      #define NUM_CAMERAS size_t(1)

      const size_t numPoints = CAMERA_XRES * CAMERA_YRES * NUM_CAMERAS;

      std::vector<vec3f> camFrom(NUM_CAMERAS);
      std::vector<vec3f> camAt(NUM_CAMERAS);
      vec3f* pe = camFrom.data();
      vec3f* pd = camAt.data();

      for (size_t z = 0; z < NUM_CAMERAS; ++z)
      {
        *pe++ = vec3f(-5.95f, // 
                      -5.95f, 
                      4.7f);

        *pd++ = model->bounds.center();             
      }

      sample.resize(vec3i(CAMERA_XRES, CAMERA_YRES, NUM_CAMERAS));
      sample.cameraFromBuffer.alloc_and_upload(camFrom);
      sample.cameraAtBuffer.alloc_and_upload(camAt);
     
      sample.render();
      std::vector<float> depths;
      depths.resize(numPoints);
      sample.downloadPixels(depths.data());  

    } catch (std::runtime_error& e) {
      std::cout << GDT_TERMINAL_RED << "FATAL ERROR: " << e.what()
                << GDT_TERMINAL_DEFAULT << std::endl;
          exit(1);
    }
    return 0;
  }

And the CUDABuffer header file:

#pragma once

#include "optix7.h"
// common std stuff
#include <vector>
#include <assert.h>

/*! \namespace osc - Optix Siggraph Course */
namespace osc {

    /*! simple wrapper for creating, and managing a device-side CUDA
        buffer */
    struct CUDABuffer {
        inline CUdeviceptr d_pointer() const
        { return (CUdeviceptr)d_ptr; }

        //! re-size buffer to given number of bytes
        void resize(size_t size)
        {
            if (d_ptr) free();
            alloc(size);
        }

        //! allocate to given number of bytes
        void alloc(size_t size)
        {
            assert(d_ptr == nullptr);
            this->sizeInBytes = size;
            CUDA_CHECK(Malloc((void**)&d_ptr, sizeInBytes));
        }

        //! free allocated memory
        void free()
        {
            CUDA_CHECK(Free(d_ptr));
            d_ptr = nullptr;
            sizeInBytes = 0;
        }

        template<typename T>
        void alloc_and_upload(const std::vector<T> &vt)
        {
            alloc(vt.size()*sizeof(T));
            upload((const T*)vt.data(), vt.size());
        }

        template<typename T>
        void upload(const T *t, size_t count)
        {
            assert(d_ptr != nullptr);
            assert(sizeInBytes == count*sizeof(T));
            CUDA_CHECK(Memcpy(d_ptr, (void *)t,
                              count*sizeof(T), cudaMemcpyHostToDevice));
        }

        template<typename T>
        void download(T *t, size_t count)
        {
            assert(d_ptr != nullptr);
            assert(sizeInBytes == count*sizeof(T));
            CUDA_CHECK(Memcpy((void *)t, d_ptr,
                              count*sizeof(T), cudaMemcpyDeviceToHost));
        }

        size_t sizeInBytes { 0 };
        void  *d_ptr { nullptr };
    };

} // ::osc

And finally, using these in computeRay:

  static __forceinline__ __device__ void computeRay( uint3 idx, uint3 dim, vec3f& origin, vec3f& direction)
  {
    origin = optixLaunchParams.cameras.positionBuffer[idx.z];
   //=============================================================
   // THIS ALWAYS PRINTS 0.0000 NO MATTER WHICH VALUE I PRINT
   // ============================================================
     # if __CUDA_ARCH__>=200
     printf("%f \n", origin.x);
     #endif

    vec3f camAt = optixLaunchParams.cameras.atBuffer[idx.z];
    vec3f camDirection = normalize(camAt-origin);
    const float cosFovy = 0.66f;
    const float aspect = optixLaunchParams.cameras.frame.size.x/ float(optixLaunchParams.cameras.frame.size.y);
    const vec3f camUp(0.f,0.f,1.f);
    vec3f horizontal = cosFovy * aspect * normalize(cross(camDirection, camUp));
    vec3f vertical = cosFovy * normalize(cross(horizontal,camDirection));

    //  normalized screen plane position, in [0,1]^2
    const vec2f screen(vec2f(idx.x+.5f,idx.y+.5f)
                              / vec2f(optixLaunchParams.cameras.frame.size));
                                       
    vec3f rayDir = normalize(camDirection
                              + (screen.x - 0.5f) * horizontal
                              + (screen.y - 0.5f) * vertical);
  }

This computeRay is adapted from Ingo Wald’s work, and I believe I have adapted it correctly. The more worrying thing is that all the data I have allocated and uploaded is 0.00000 inside of the kernel. I believe this is why all my rays were missing: they were starting at 0 and going to 0. I’m not sure if this means they are all essentially null or just going in completely random directions, but I would guess the first, because not one of them ever even randomly hit.

I know these aren’t official OptiX examples, but do you have any idea why this allocating and uploading would not work? I am really beginning to wonder if I need to go back to the beginning and build a new pipeline from the SDK examples (or jump into yours, but they seem a bit more advanced to me). Any suggestions on how to correctly allocate and upload the data?

Please let me know if you would still like more code posted. I did not want to paste the entire project as it is quite a lot.

Thank you!
Benjamin

The more worrying thing is that all the data I have allocated and uploaded is 0.00000 inside of the kernel.

You’re sourcing data from device pointers inside your launch parameters.

origin = optixLaunchParams.cameras.positionBuffer[idx.z];
vec3f camAt = optixLaunchParams.cameras.atBuffer[idx.z];

But you don’t show any code which actually sets these two device pointers inside the launch parameters.
This is only allocating device buffers and copying data from host to device but the OptiX device program wouldn’t know where that data is.

  sample.cameraFromBuffer.alloc_and_upload(camFrom);
  sample.cameraAtBuffer.alloc_and_upload(camAt);

Afterwards you must do something like this to actually set the device pointers inside the launch parameters.
That is the second time you missed that. (That’s just dry coding, no idea if that compiles.)

    launchParams.cameras.positionBuffer =  reinterpret_cast<vec3f*>(sample.cameraFromBuffer.d_ptr);
    launchParams.cameras.atBuffer       =  reinterpret_cast<vec3f*>(sample.cameraAtBuffer.d_ptr);

I can also only assume you changed everything inside your SampleRenderer implementation to work with the 3D dimension, resizing the actual output float* distanceBuffer.

sample.resize(vec3i(CAMERA_XRES, CAMERA_YRES, NUM_CAMERAS));

Maybe that also contains the necessary launch parameter assignments, but I cannot know that when you do not provide the code for that.
If you didn’t set these, there should have been illegal access errors or misaligned access errors though.

Again, the camera definition with position, lookat, up and fov is brittle.

    vec3f camAt = optixLaunchParams.cameras.atBuffer[idx.z];
    // FIXME If camAt == origin that subtraction is a null vector and normalize() on a null vector will produce NaN values.
    vec3f camDirection = normalize(camAt - origin);
    const float cosFovy = 0.66f;
    const float aspect = optixLaunchParams.cameras.frame.size.x / float(optixLaunchParams.cameras.frame.size.y);
     // FIXME The cross(camDirection, camUp) will result in a null vector when camDirection and camUp are collinear.
    // Then the normalize() on a null vector will produce NaN values.
    // NaN in any ray components will result in invalid ray exceptions inside OptiX.
    // That means you will not get any result from that ray and your output is wrong.
    const vec3f camUp(0.f, 0.f, 1.f);
    vec3f horizontal = cosFovy * aspect * normalize(cross(camDirection, camUp));
    vec3f vertical = cosFovy * normalize(cross(horizontal, camDirection));

Depending on how many cameras you have it might make sense to calculate the camera projection frustum U, V, W vectors on the host with all necessary input value checks and then use that on the device.
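
For example, as dry coding only (not the linked Camera.cpp; it assumes a host-side PinholeCamera with vec3f members, which are three packed floats like float3, plus the vec3f cross/normalize/length helpers from the course framework, and <cmath>/<cassert>):

// Build the pinhole frustum vectors on the host, with basic validity checks.
PinholeCamera makePinholeCamera(const vec3f& position,
                                const vec3f& lookat,
                                const vec3f& up,     // must not be collinear with (lookat - position)
                                float        fovY,   // vertical field of view in radians
                                float        aspect) // XRES / YRES
{
  vec3f forward = lookat - position;
  assert(length(forward) > 0.0f);            // degenerate: lookat == position
  forward = normalize(forward);
  assert(length(cross(forward, up)) > 0.0f); // degenerate: up collinear with the view direction

  const float tanHalfFovY = tanf(0.5f * fovY);

  PinholeCamera cam;
  cam.P = position;
  cam.U = normalize(cross(forward, up)) * tanHalfFovY * aspect; // camera right
  cam.V = normalize(cross(cam.U, forward)) * tanHalfFovY;       // camera up
  cam.W = forward;                                              // view direction
  return cam;
}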

Example code for a perspective and orthographic camera projection from the same P, U, V, W inputs: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/GLTF_renderer/cuda/raygen.cu#L257
Calculation of the P, U, V, W: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/GLTF_renderer/Camera.cpp#L133
Note that comment on the up-vector inside that!
(If github is not jumping to the correct code line (CTRL+clicking the links seems broken again) then scroll down to the marked code line.)

Please let me know if you would still like more code posted. I did not want to paste the entire project as it is quite a lot.

There is an Upload button in the editor toolbar which allows you to attach text, image, or zip files; those are easier to handle than walls of code blocks.


@droettger My apologies for the delayed response.

I was indeed setting the pointers in my launch params to the CUDA buffers. However, I was doing it in the wrong order: I was first assigning the pointers and only then doing the allocation and memory copy, so I guess the pointer value I copied into the launch parameters was stale because alloc_and_upload allocates a fresh device pointer afterwards. Either way, I switched the order and everything worked as expected. Thank you for bearing with me and providing such detailed replies.
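
For reference, here is the order that now works for me (buffer and field names as in my code above):

// 1. Allocate the device buffers and upload the host data first (this sets d_ptr).
sample.cameraFromBuffer.alloc_and_upload(camFrom);
sample.cameraAtBuffer.alloc_and_upload(camAt);

// 2. Only afterwards store the resulting device pointers inside the launch parameters,
//    so they point at the memory that was actually allocated.
sample.launchParams.cameras.positionBuffer = reinterpret_cast<vec3f*>(sample.cameraFromBuffer.d_ptr);
sample.launchParams.cameras.atBuffer       = reinterpret_cast<vec3f*>(sample.cameraAtBuffer.d_ptr);

// 3. render() uploads the launch parameter struct per launch, so the device code now sees valid pointers.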

Cheers,
Benjamin