The Anti-Aliasing Algorithm in OptiX

I find it really hard to understand the anti-aliasing algorithm in the SDK examples. When I render an image on the CPU, several rays are used to determine the color of each pixel; that is the anti-aliasing algorithm I know well. In OptiX, however, it doesn't seem to work that way. Could someone give me some guidance? Thank you very much.

Hi maxli, the zoneplate sample should have what you need regarding implementing anti-aliasing in your code (it doesn’t use a normal ray tracing pipeline, but a simpler one that demonstrates this particular technique).

As a quick walkthrough for the first (and simplest) anti-aliasing technique please take a look at the following explanation.

The zoneplate sample begins by setting up the context and, by default, uses a simple ray generation program, shown below

RT_PROGRAM void zp_color_only()
{
  float result = computeResult( make_float2(launch_index.x + 0.5f, launch_index.y + 0.5f) );
  output_color_only[launch_index] = make_color( make_float3(result) );
}

This simply fills the output buffer (which ultimately gets rendered to the screen) with 0 or 1 values (black or white), depending on the coordinates assigned to the thread: if both are even or both are odd, the pixel is white; otherwise it’s black. Notice that a rotation (30 degrees by default) is first applied to the pixel coordinates

static __device__ float checkerboard(float2 loc)
{
  int2 rotloc;
  float angle = checkerboard_rotate * M_PIf / 180.f;
  float ca = cosf(angle), sa = sinf(angle);
  rotloc.x = abs((int)floor(((ca*loc.x + sa * loc.y) / checkerboard_width)));
  rotloc.y = abs((int)floor(((ca*loc.y - sa * loc.x) / checkerboard_width)));
  
  if (rotloc.x % 2 == rotloc.y % 2) return 1.0f;
  return 0.0f;  
}

If you press ‘m’ during execution, the anti-aliasing mode switches and recompute_zoneplate() launches two separate entry points

void recompute_zoneplate() {
  switch( aa_type ) {
    ...
    case AA_SAMPLE_GATHER:
      m_context->launch(ENTRY_GENERATE_SAMPLES, sqrt_samples_per_pixel * window_width, sqrt_samples_per_pixel * window_height );
      m_context->launch(ENTRY_GATHER_SAMPLES, window_width, window_height );

    ...

Two entry points are assigned here to render with the anti-aliasing technique: ENTRY_GENERATE_SAMPLES and ENTRY_GATHER_SAMPLES; each uses a different ray generation program

m_context->setRayGenerationProgram( ENTRY_GENERATE_SAMPLES, m_context->createProgramFromPTXFile( ptx_path, "zp_generate_samples" ) );
m_context->setRayGenerationProgram( ENTRY_GATHER_SAMPLES, m_context->createProgramFromPTXFile( ptx_path, "zp_gather_samples" ) );

As the names suggest, the first is used to generate the anti-aliasing samples, while the second gathers them and decides which averaged color should be assigned to each pixel.

Before invoking the ray generation programs, the output buffer is resized (more samples = more space needed to store them)

void resizeOptiXBuffers()
{
  m_context["output_color_only"]->getBuffer()->setSize( window_width, window_height );
  switch( aa_type ) {
    case AA_NONE:
      break;
    case AA_SAMPLE_GATHER:
      m_context["output_samples"]->getBuffer()->setSize( sqrt_samples_per_pixel * window_width, sqrt_samples_per_pixel * window_height );
      break;
      ...

If you previously had a 504x504 window, then with sqrt_samples_per_pixel = 3 you now have 9 times as many values, because each pixel gets “enclosed” in a 3x3 box of samples (with the pixel center in the middle of the box). All these values are the “samples” you will use for anti-aliasing. The launch grid is expanded accordingly (many more threads are launched to fill the whole output buffer).

The zp_generate_samples() ray generation program is invoked first (in this first, basic anti-aliasing technique)

RT_PROGRAM void zp_generate_samples()
{
  float2 loc = get_new_sample(launch_index);
  output_samples[launch_index].x = loc.x;
  output_samples[launch_index].y = loc.y;
  output_samples[launch_index].value = computeResult(loc);
}

static __device__ float2 get_new_sample( uint2 corner )
{
  float2 loc = make_float2( (corner.x + 0.5f) / sqrt_samples_per_pixel,
                            (corner.y + 0.5f) / sqrt_samples_per_pixel ); 

  return loc;
}

This code fills the output_samples buffer (via get_new_sample()) with a grid of sample positions spaced 1/sqrt_samples_per_pixel apart (1/3 each in the default case), so that each 3x3 square holds the 9 sample values for one pixel (or each 5x5 square holds 25, and so on).

These coordinates are stored, and the value of each sample (i.e. black or white in the checkerboard image) is computed by computeResult(), which in turn calls the checkerboard() routine seen above. This is the ‘sample-picking’ stage, where you oversample the image you need to render (much as you would oversample a signal).

Now for the last part of the first anti-aliasing technique: the entry point is ENTRY_GATHER_SAMPLES and the associated ray generation program is zp_gather_samples()

RT_PROGRAM void zp_gather_samples()
{
  // figure out the x,y extent of all samples that might affect this output pixel;
  uint2 ll, ur;
  
  ll.x = max((int)floorf(sqrt_samples_per_pixel * (launch_index.x + .5f - filter_width)),0);
  ur.x = min((int)ceilf(sqrt_samples_per_pixel * (launch_index.x + .5f + filter_width)), sqrt_samples_per_pixel * window_size.x-1);
  ll.y = max((int)floorf(sqrt_samples_per_pixel * (launch_index.y + .5f - filter_width)),0);
  
  float num = 0.f;
  float denom = 0.f;
  
  while (ur.x-- != ll.x) {
    ur.y = min((int)ceilf(sqrt_samples_per_pixel * (launch_index.y + .5f + filter_width)), sqrt_samples_per_pixel * window_size.y-1);

    while (ur.y-- != ll.y) {
      uint2 sample_index = make_uint2(ur.x,ur.y);
      float filt = evaluate_filter( make_float2(output_samples[sample_index].x,output_samples[sample_index].y), launch_index );
      num += filt * output_samples[sample_index].value;
      denom += filt;
    }
  }

  output_color_only[launch_index] = make_color(make_float3(num/denom));
}

This code looks tricky since it was adapted to work with different anti-aliasing configurations, but in the first technique (sample and gather with a box filter) it performs rather simple operations. Here another parameter comes into play: filter_width, which defaults to 1.0f. It behaves like a symmetric kernel radius in image processing and filtering: every sample within one pixel’s distance of the pixel center is considered, which (with the default 1/3 sample spacing) means two samples from the neighboring pixel on the left fall inside the filter, two on the right, and likewise two above and two below, in addition to the pixel’s own samples. In other words, each pixel’s color is determined from a box of samples centered on it, not just from its own samples.

Based on these parameters, the code above calculates the lowest x,y coordinates and the highest x,y coordinates of the sample box around a given pixel (its size determined by filter_width). This program is launched on a 504x504 grid of threads, so each thread reads a small neighborhood of values (e.g. 5x5 or 9x9, depending on the settings) from the output_samples buffer.

Another thing to keep in mind is that pixel centers are shifted by a (0.5, 0.5) offset in continuous sample space, so even pixels near the border (e.g. (0,0)) have samples between them and the image edge. So for pixel (0,0), with all parameters at their defaults, the evaluate_filter() routine is called with the pixel treated as sitting at (0.5, 0.5), and the sample indices of the first box around it are (4,4) (4,3) (4,2) (4,1) (4,0), (3,4) (3,3) (3,2) (3,1) (3,0), etc…

The evaluate_filter() routine checks that a grid position is valid for a given pixel and a given filter_width radius (i.e. it checks whether the sample actually falls inside the box being considered) and returns a weight of 1.0f if it does, 0.0f otherwise.

// takes the sample and the pixel loc.
static __device__ float evaluate_filter( float2 sample, uint2 pi ) {
  // need the .5 because we want to consider, e.g., 
  //the (0,0) pixel to be (.5, .5) in continuous sample space.
  
  float dx = fabs(sample.x - (pi.x + .5f)); 
  float dy = fabs(sample.y - (pi.y + .5f));
  
  if (dx > filter_width || dy > filter_width ) return 0;
    
  switch( filter_type ) {
    case FILTER_BOX:
      return 1.f;
    ...

Ultimately this ‘weight’ determines whether a sample’s value (black or white) is counted in the weighted average num/denom

    while (ur.y-- != ll.y) {
      uint2 sample_index = make_uint2(ur.x,ur.y);
      float filt = evaluate_filter( make_float2(output_samples[sample_index].x,output_samples[sample_index].y), launch_index );
      num += filt * output_samples[sample_index].value;
      denom += filt;
    }
    ...
  output_color_only[launch_index] = make_color(make_float3(num/denom));

You can play with the ‘w’ and ‘W’ keys in the sample to enlarge or shrink the box and see how the checkerboard pixels get rendered when taking just a few samples versus a huge number of them (don’t worry about box radii; all border checking is performed at runtime so your values are always clamped to the correct sizes).

Once you’re a bit familiar with this one, you can move on to the more advanced techniques/filter types.

Thank you, marknv, you are really helpful!
Your explanation is so detailed that I have learned a lot from it. Now I want to ask about an example from sample6 of the SDK: the accum_camera.cu file.

RT_PROGRAM void pinhole_camera()
{
#ifdef TIME_VIEW
  clock_t t0 = clock(); 
#endif
  size_t2 screen = output_buffer.size();

  // Subpixel jitter: send the ray through a different position inside the pixel each time,
  // to provide antialiasing.
  unsigned int seed = rot_seed( rnd_seeds[ launch_index ], frame );
  float2 subpixel_jitter = make_float2(rnd( seed ) - 0.5f, rnd( seed ) - 0.5f) * jitter_factor;

  float2 d = (make_float2(launch_index) + subpixel_jitter) / make_float2(screen) * 2.f - 1.f;
  float3 ray_origin = eye;
  float3 ray_direction = normalize(d.x*U + d.y*V + W);
  
  optix::Ray ray(ray_origin, ray_direction, radiance_ray_type, scene_epsilon );

It is said that anti-aliasing can be achieved this way, but I can’t understand it. Would you please explain it to me? Thank you.

unsigned int seed = rot_seed( rnd_seeds[ launch_index ], frame );
float2 subpixel_jitter = make_float2(rnd( seed ) - 0.5f, rnd( seed ) - 0.5f) * jitter_factor;

float2 d = (make_float2(launch_index) + subpixel_jitter) / make_float2(screen) * 2.f - 1.f;

Subpixel jitter adds a bit of randomness to the position you sample within a pixel. Instead of firing a ray through the center of the pixel every frame, you add a small random offset so that the ray passes through a different random point inside the pixel each frame; the accumulation camera then averages these jittered results over many frames, which smooths the aliased edges.

Some details are covered in these slides:
http://web.cs.wpi.edu/~emmanuel/courses/cs563/S10/talks/wk3_p1_wadii_sampling_techniques.pdf