Rendering in a definite range

Hi All!

Instead of rendering the full scene at once, I want to render a few portions of the scene separately (e.g., four portions). So I should have a predefined starting point for each of the 4 parts plus a fixed width and height. I tried a few approaches, but they render the whole scene and just show each of the 4 parts separately in each of the viewports. I do not want to repeat the rendering 4 times; I just want to render each part and show its output.

I would love to hear your suggestions.

Instead of rendering the full scene at once, I want to render a few portions of the scene separately (e.g., four portions).

To get the terminology right: if that is done on multiple devices, all of them would still need the whole scene geometry data to be able to implement, for example, reflections, global illumination and shadows correctly.
Not being able to access the whole scene geometry on each individual device would limit the rendering methodology to local shading algorithms only.

So what you describe is tiled rendering of a full viewport with some number of fixed tiles (here four). For that you would simply change the ray generation program implementing the camera projection to use pixel offsets for the four different sub-viewports and launch with the smaller number of launch indices, here a quarter of the full viewport size. Then you would get four quarter-sized images in this case.

That means the rendering resolution representing the full image is different from the launch dimensions. The view-frustum calculation resulting in the primary ray origin and direction is exactly the same for all of the devices; only the mapping of the launch index to the effective pixel coordinate on the full viewport is different and needs the respective tile offset in pixels per device (or some other method, like a viewport tile index from which the pixel offset can be calculated at runtime) inside your OptiX launch parameters per device.
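
For illustration, here is a minimal sketch of what such a per-device launch parameter block could look like. The struct and field names (LaunchParameters, resolution, tileOrigin, outputBuffer) are placeholders for this example, not taken from a specific OptiX sample:

#include <cuda.h>          // CUdeviceptr
#include <vector_types.h>  // uint2

// Hypothetical launch parameter block; the field names are assumptions for this sketch.
struct LaunchParameters
{
  uint2       resolution;   // Full viewport size, e.g. 1920 x 1080, identical on all devices.
  uint2       tileOrigin;   // Pixel offset of this device's sub-viewport inside the full viewport.
  CUdeviceptr outputBuffer; // Sub-viewport sized output buffer (float4 per pixel) on this device.
  // ... camera definition, scene traversable handle, etc.
};

Each device would get the same resolution and camera definition but a different tileOrigin, and the launch dimensions would be the sub-viewport size, here a quarter of the full viewport.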

Since you originally said you wanted to do that on different devices and even different machines, combining these into a single full viewport image would then require transferring the partial images from the client devices to the main device and stitching them together according to their tile locations, if you need the full viewport image.
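
Once a partial image has arrived on the main device, copying it into place is essentially a strided 2D copy with the tile offset applied to the destination pointer. A minimal sketch, assuming float4 pixels and device-to-device copies (fullImage, tileImage and tileOrigin are placeholder names):

#include <cuda_runtime.h>
#include <vector_types.h>

// Copy a tileWidth x tileHeight sub-image into the full image at pixel offset tileOrigin.
// Both buffers are assumed to hold one float4 per pixel and to reside on the same device.
void stitchTile(float4* fullImage, unsigned int fullWidth,
                const float4* tileImage, unsigned int tileWidth, unsigned int tileHeight,
                uint2 tileOrigin)
{
  // Destination start pixel inside the full image.
  float4* dst = fullImage + tileOrigin.y * fullWidth + tileOrigin.x;

  cudaMemcpy2D(dst,                        // Destination.
               fullWidth * sizeof(float4), // Destination pitch in bytes (one full image row).
               tileImage,                  // Source (compact tile buffer).
               tileWidth * sizeof(float4), // Source pitch in bytes (one tile row).
               tileWidth * sizeof(float4), // Width of each copied row in bytes.
               tileHeight,                 // Number of rows.
               cudaMemcpyDeviceToDevice);
}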

My multi-GPU aware OptiX 7 examples split a single image into many small tiles with power-of-two extents to distribute the overall per-image workload more evenly in a checkerboard-like pattern among multiple devices, and then composite the resulting sub-images into the final full viewport image used for display.
Maybe have a look at this example code doing that (a rough sketch of such a launch-index-to-pixel mapping follows the links):
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/shaders/raygeneration.cu#L152
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/shaders/compositor.cu
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/DeviceMultiGPULocalCopy.cpp#L106
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/DeviceMultiGPULocalCopy.cpp#L275
https://github.com/NVIDIA/OptiX_Apps/blob/master/data/system_rtigo3_dual_gpu_local.txt#L49
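
For orientation, here is a rough sketch of what such a mapping from a device-local launch index to the full-image pixel coordinate can look like. This is not the code from the linked raygeneration.cu; the sysData fields used here (deviceIndex, deviceCount, tileShiftX, tileShiftY) are assumptions, and it assumes the full width is a multiple of (deviceCount << tileShiftX):

// Hypothetical checkerboard-like tile distribution. Tiles have power-of-two extents
// (1 << tileShiftX, 1 << tileShiftY) and tile (tx, ty) is owned by device (tx + ty) % deviceCount,
// so the owned tile columns shift by one from tile row to tile row.
// Each device launches a compact buffer of size (resolution.x / deviceCount, resolution.y).
__forceinline__ __device__ uint2 mapLaunchIndexToFullImage(const uint2 launchIndex)
{
  const unsigned int tileW = 1u << sysData.tileShiftX;

  const unsigned int ltx = launchIndex.x >> sysData.tileShiftX; // Local tile column on this device.
  const unsigned int ty  = launchIndex.y >> sysData.tileShiftY; // Tile row (the y axis is not split).

  // Global tile column owned by this device inside this tile row.
  const unsigned int tx = ltx * sysData.deviceCount
                        + (sysData.deviceIndex + sysData.deviceCount - (ty % sysData.deviceCount)) % sysData.deviceCount;

  const unsigned int px = (tx << sysData.tileShiftX) + (launchIndex.x & (tileW - 1)); // Full-image pixel x.
  const unsigned int py = launchIndex.y;                                              // Full-image pixel y.

  return make_uint2(px, py);
}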

Your case would be a lot simpler, but it would result in less evenly distributed workloads: e.g. when one of the sub-viewports contains objects or materials which are much more expensive to render than the others, that device would limit the resulting performance. That regularly happens in outdoor scenes with sky and ground.

I might have an easier solution, but for that I just need to render in a finite range, e.g., starting point (40, 60) and end point (400, 600). The start and end points are arbitrary; they could be any pixels inside the given buffer size. By resizing launchParams.frame.size.x/y in optixLaunch I could limit the upper portion, but I could not define a particular starting point; each time it starts from the (0, 0) location. I also tried to run optixLaunch inside a loop.

I may be asking a very dumb question. I am thinking of something like this figure:

Are you saying your sub-viewport is always rendering the lower-left part of the full image?
Then you're not correctly calculating the pixel coordinate, which defines the primary ray, according to your viewport offset.

Assuming the upper-right point [400, 600] is exclusive, you would have a launch dimension of [400 - 40, 600 - 60] == [360, 540], which gives you 2D launch indices inside the range [0, 359] x [0, 539].

Now you just need to calculate the effective pixel coordinates, mapping from launch index [0, 0] to your sub-viewport's lower-left start corner pixel coordinate [40, 60], which is simply an addition.

As said before, that would need to happen before calculating the primary rays inside your ray generation program.

For example, look at this pseudo pinhole camera implementation:

extern "C" __global__ void __raygen__pinhole()
{
  PerRayData prd;

  const uint2 theLaunchDim   = make_uint2(optixGetLaunchDimensions()); // This is the size of your sub-viewport [360, 540]
  const uint2 theLaunchIndex = make_uint2(optixGetLaunchIndex());      // These are [0, 359] x [0, 539]

  // Resolution is the actual full rendering resolution (e.g. 1920x1080), not the size of the sub-viewport launch dimension!
  const float2 screen = make_float2(sysData.resolution);
  
  // The launch parameter uint2 viewportOrigin is your [40, 60] offset for the pixel coordinates
  const float2 pixel  = make_float2(theLaunchIndex + sysData.viewportOrigin);
  
  // Just as an example when not using progressive sampling but shooting the primary ray through the center of each pixel.
  const float2 sample = make_float2(0.5f);
  
  const float2 fragment = pixel + sample;                    // The sub-pixel location.
  const float2 ndc      = (fragment / screen) * 2.0f - 1.0f; // Normalized device coordinates in range [-1, 1].

  // Pinhole camera position and left-handed unnormalized coordinate system UVW spanning the view-frustum.
  const CameraDefinition camera = sysData.cameraDefinition; 

  // Primary ray origin and direction.
  float3 origin    = camera.P;
  float3 direction = normalize(camera.U * ndc.x +
                               camera.V * ndc.y +
                               camera.W);


  // ... Build primary ray and integrate radiance result here.

  float3 radiance = integrator(prd);

  // Write the result into the sub-viewport sized output buffer.
  float4* buffer = reinterpret_cast<float4*>(sysData.outputViewport); // Buffer of the launch dimension size!
  
  const unsigned int index = theLaunchIndex.y * theLaunchDim.x + theLaunchIndex.x; // The linear index into that sub-viewport sized buffer.

  buffer[index] = make_float4(radiance, 1.0f);
}

Note that some of the “make” functions are not defined by CUDA headers. I’m using my own vector types helper header which contains them:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/vector_math.h
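
To complement the device-side example above, here is a rough host-side sketch of how the sub-viewport launch could be set up. The sysData field names match the ray generation program above; the SysData type, the function parameters and the omitted error handling are assumptions, not code from a specific example:

#include <cuda_runtime.h>
#include <optix.h>
#include <optix_stubs.h>
#include <vector_functions.h>

// Hypothetical host-side launch of the [40, 60] to [400, 600] sub-viewport.
// SysData mirrors the launch parameters used in the ray generation program
// (resolution, viewportOrigin, outputViewport, camera definition, ...).
void launchSubViewport(OptixPipeline pipeline,
                       const OptixShaderBindingTable& sbt,
                       CUstream stream,
                       SysData& sysData,      // Host-side copy of the launch parameters.
                       CUdeviceptr d_sysData) // Device-side copy passed to optixLaunch.
{
  const uint2 launchDim = make_uint2(360, 540); // [400 - 40, 600 - 60], the sub-viewport size.

  sysData.resolution     = make_uint2(1920, 1080); // Full rendering resolution, not the launch size.
  sysData.viewportOrigin = make_uint2(40, 60);     // Lower-left start pixel of the sub-viewport.

  // Allocate the sub-viewport sized output buffer (float4 per pixel). Error checking omitted.
  float4* d_output = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&d_output), sizeof(float4) * launchDim.x * launchDim.y);
  sysData.outputViewport = reinterpret_cast<CUdeviceptr>(d_output);

  // Update the device copy of the launch parameters, then launch with the sub-viewport
  // dimensions, not with the full resolution.
  cudaMemcpyAsync(reinterpret_cast<void*>(d_sysData), &sysData, sizeof(SysData),
                  cudaMemcpyHostToDevice, stream);

  optixLaunch(pipeline, stream, d_sysData, sizeof(SysData), &sbt,
              launchDim.x, launchDim.y, 1);
}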
