optix __direct_callable_ program examples?

I want to use direct_callable program in my application, but failed to find any examples. It seems that the direct callable program should be named with “direct_callable” prefix, and it should be a “global” function, with no return value. Is it right?

Forgot to say that my application is based on optix7.

Same request.

I still confused with the use of direct/indirect callable. Is there any example about that? (OptiX 7.0)

I have an example showing that and I’m just waiting for a decision where to publish it.

Two of them are ports of this OptiX Introduction example from OptiX 5 to OptiX 7 and work just the same.
The renderer architecture is exactly the same.
https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction/optixIntro_07

Here are some excerpts of that.

Let’s say you have sample() and eval() functions implemented for each basic BSDF.
This would be the simplest one, a specular reflection:

// ########## BRDF Specular (tinted mirror)

extern "C" __device__ void __direct_callable__sample_brdf_specular(MaterialDefinition const& material, State const& state, PerRayData* prd)
{
  prd->wi = reflect(-prd->wo, state.normal);

  if (dot(prd->wi, state.normalGeo) <= 0.0f) // Do not sample opaque materials below the geometric surface.
  {
    prd->flags |= FLAG_TERMINATE;
    return;
  }

  prd->f_over_pdf = state.albedo;
  prd->pdf        = 1.0f; // Not 0.0f to make sure the path is not terminated. Otherwise unused for specular events.
}

// This function will be used for all specular materials.
// This is actually never reached in this simple material system, because specular materials are never evaluated. Use as placeholder.
extern "C" __device__ float4 __direct_callable__eval_brdf_specular(MaterialDefinition const& material, State const& state, PerRayData const* prd, float3 const& wiL)
{
  return make_float4(0.0f);
}

Now assume that code is inside the PTX file name ./appname_core/bxdf_specular.ptx relative to the executable.
Then the host code to load it into an OptixModule, create program descriptions, program groups and the necessary shader binding table records is this:

...
  OptixModule moduleSpecular;
  ptx = readPTX("./appname_core/bxdf_specular.ptx");
  OPTIX_CHECK( m_api.optixModuleCreateFromPTX(m_optixContext, &mco, &pco, ptx.c_str(), ptx.size(), nullptr, nullptr, &moduleSpecular) );
  ...

  // Initialize all OptixProgramGroupDesc for this pipeline.
  std::vector<OptixProgramGroupDesc> programGroupDescriptions(NUM_PROGRAM_GROUP_IDS);
  memset(programGroupDescriptions.data(), 0, sizeof(OptixProgramGroupDesc) * programGroupDescriptions.size());

  // Here only showing the new direct callable part.
  OptixProgramGroupDesc* pgd;
  ...

  pgd = &programGroupDescriptions[PGID_BRDF_SPECULAR_SAMPLE]; // I use a fixed ID for each OptixProgramGroupDesc in the pipeline.
  pgd->kind  = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
  pgd->flags = OPTIX_PROGRAM_GROUP_FLAGS_NONE;
  pgd->callables.moduleDC            = moduleSpecular;
  pgd->callables.entryFunctionNameDC = "__direct_callable__sample_brdf_specular";

  pgd = &programGroupDescriptions[PGID_BRDF_SPECULAR_EVAL];
  pgd->kind  = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
  pgd->flags = OPTIX_PROGRAM_GROUP_FLAGS_NONE;
  pgd->callables.moduleDC            = moduleSpecular;
  pgd->callables.entryFunctionNameDC = "__direct_callable__eval_brdf_specular"; // black
  ...

  // Then create all program groups at once:

  OptixProgramGroupOptions pgo; // Just a placeholder so far.
  memset(&pgo, 0, sizeof(OptixProgramGroupOptions) );

  std::vector<OptixProgramGroup> programGroups(programGroupDescriptions.size());
  
  OPTIX_CHECK( m_api.optixProgramGroupCreate(m_optixContext, programGroupDescriptions.data(), (unsigned int) programGroupDescriptions.size(), &pgo, nullptr, nullptr, programGroups.data()) );

  // Then create your pipeline.
  ...
  // Then calculate and set its stack space.
  ...

  // Set up the fixed portion of the Shader Binding Table (SBT)

  // Put all SbtRecordHeader types in one CUdeviceptr.
  // SbtRecordHeader in that code is this struct:
  // All programs outside the hit groups do not have any per program data in this renderer.
  //struct SbtRecordHeader
  //{
  //  __align__(OPTIX_SBT_RECORD_ALIGNMENT) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
  //};

  const int numHeaders = LAST_DIRECT_CALLABLE_ID - PGID_RAYGENERATION + 1;

  std::vector<SbtRecordHeader> sbtRecordHeaders(numHeaders);

  for (int i = 0; i < numHeaders; ++i)
  {
    OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_RAYGENERATION + i], &sbtRecordHeaders[i]) );
  }

  CU_CHECK( cuMemAlloc(&m_d_sbtRecordHeaders, sizeof(SbtRecordHeader) * numHeaders) );
  CU_CHECK( cuMemcpyHtoDAsync(m_d_sbtRecordHeaders, sbtRecordHeaders.data(), sizeof(SbtRecordHeader) * numHeaders, m_cudaStream) );

  // Hit groups for radiance and shadow rays. These will be initialized later per instance.
  // This just provides the headers with the program group indices.

  // Note that the SBT record data field is uninitialized after these!
  // These are stored to be able to initialize the SBT hitGroup with the respective opaque and cutout shaders.
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_RADIANCE],        &m_sbtRecordHitRadiance) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_SHADOW],          &m_sbtRecordHitShadow) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_RADIANCE_CUTOUT], &m_sbtRecordHitRadianceCutout) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_SHADOW_CUTOUT],   &m_sbtRecordHitShadowCutout) );

  // Setup the OptixShaderBindingTable.

  m_sbt.raygenRecord            = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_RAYGENERATION;

  m_sbt.exceptionRecord         = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_EXCEPTION;

  m_sbt.missRecordBase          = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_MISS_RADIANCE;
  m_sbt.missRecordStrideInBytes = (unsigned int) sizeof(SbtRecordHeader);
  m_sbt.missRecordCount         = NUM_RAYTYPES;

  // The hitgroupRecord is going to be setup after the render graph has been built.
  //m_sbt.hitgroupRecordBase          = reinterpret_cast<CUdeviceptr>(m_d_sbtRecordGeometryInstanceData);
  //m_sbt.hitgroupRecordStrideInBytes = (unsigned int) sizeof(SbtRecordGeometryInstanceData);
  //m_sbt.hitgroupRecordCount         = NUM_RAYTYPES * numInstances;

  // The direct callables device pointers, stride and count belong into the OptixShaderBindingTable callablesRecord* entries:
  // Note that this table of direct callable function pointers is similar to buffers of bindless callable program IDs in OptiX < 7, 
  // just that each of the direct callables in OptiX 7 can have its own function signature.
  m_sbt.callablesRecordBase          = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * FIRST_DIRECT_CALLABLE_ID;
  m_sbt.callablesRecordStrideInBytes = (unsigned int) sizeof(SbtRecordHeader);
  m_sbt.callablesRecordCount         = LAST_DIRECT_CALLABLE_ID - FIRST_DIRECT_CALLABLE_ID + 1;

  // Destroy OptixModules at last.
  ...

Now inside the device code you can call any direct callable inside the SBT callablesRecord with the function optixDirectCall().

// Calculate the direct callable function index inside the shader binding table as offset from callablesRecordBase.
  // For example, this code assumes a number of lens shader samplers, then a number of light samplers, and then come the sample and eval function per BSDF., so the index to the BSDF sampling function is this: 
  const int indexBSDF = NUM_LENS_SHADERS + NUM_LIGHT_TYPES + material.indexBSDF * 2;

  // Call the BSDF sampling function with the required arguments.
  optixDirectCall<void, MaterialDefinition const&, State const&, PerRayData*>(indexBSDF, material, state, thePrd);

  ...
  
  // Similar for the BSDF eval function:

  // Returns BSDF f in .xyz and the BSDF pdf in .w
  // BSDF eval function is one index after the sample function.
  const float4 bsdf_pdf = optixDirectCall<float4, MaterialDefinition const&, State const&, PerRayData const*, float3 const&>(indexBSDF + 1, material, state, thePrd, lightSample.direction);
  ...

Hope that helps.

2 Likes

Thanks, it works.It’s a good example to show how to use direct callable programs as a replacement of virtual functions in path tracing application. Can I know when will the examples be published?

Hi, thanks for this example. I tried to include it in my codebase but it fails when calling optixProgramGroupCreate(…). The reported error is:
COMPILE ERROR: "__direct_callable__texture" not found in programDescriptions[5].callables.moduleDC

I tried to investigate the problem, and the only thing that I could think of is that the ptx that I generated using nvcc does not contain the definition of my function, but only the ones specified as __global__.

Here is the command I used to generate the ptx (I omitted some include folders for clarity):

nvcc shading.cu -o shading.ptx --use_fast_math -gencode arch=compute_61,code=compute_61 -ptx -I /opt/optix-7/include/ 

The direct callable is defined as follows (inspired by your example):

extern "C" __device__ Color3f __direct_callable__texture(float f) {
    return Color3f(1.f, 0.f, f);
}

And the program group description is initialized like this:

program_group_descs[TEXTURE_DC].kind                          = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
program_group_descs[TEXTURE_DC].callables.moduleDC            = m_shading_module;
program_group_descs[TEXTURE_DC].callables.entryFunctionNameDC = "__direct_callable__texture";

I also tried to define the function as __global__, which worked but prevented me from returning any value from it or even return values through pointer arguments (which I would have been okay with). I really don’t know what I’m missing.
Any help would be appreciated!

try to add
--relocatable-device-code=true
in the “nvcc” command line compile options

It worked! Thanks a lot! I tried reading the compile options for nvcc but didn’t really understand most of them…

Note that OptiX 7 applications mentioned above are published in the meantime.
Find the link here: https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410/4
Those are using direct callables extensively.

The --relocatable-device-code=true (-rdc) option is required since CUDA 8.0 because OptiX callable programs are functions which don’t appear as calls inside the PTX module and the CUDA compiler will optimize them away as dead code otherwise.