optix __direct_callable_ program examples?

Leptus · October 12, 2019, 11:26am

I want to use direct_callable program in my application, but failed to find any examples. It seems that the direct callable program should be named with “direct_callable” prefix, and it should be a “global” function, with no return value. Is it right?

Leptus · October 12, 2019, 7:12pm

Forgot to say that my application is based on optix7.

yetsun · October 14, 2019, 12:55am

Same request.

I still confused with the use of direct/indirect callable. Is there any example about that? (OptiX 7.0)

droettger · October 14, 2019, 8:57am

I have an example showing that and I’m just waiting for a decision where to publish it.

Two of them are ports of this OptiX Introduction example from OptiX 5 to OptiX 7 and work just the same.
The renderer architecture is exactly the same.
https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction/optixIntro_07

Here are some excerpts of that.

Let’s say you have sample() and eval() functions implemented for each basic BSDF.
This would be the simplest one, a specular reflection:

// ########## BRDF Specular (tinted mirror)

extern "C" __device__ void __direct_callable__sample_brdf_specular(MaterialDefinition const& material, State const& state, PerRayData* prd)
{
  prd->wi = reflect(-prd->wo, state.normal);

  if (dot(prd->wi, state.normalGeo) <= 0.0f) // Do not sample opaque materials below the geometric surface.
  {
    prd->flags |= FLAG_TERMINATE;
    return;
  }

  prd->f_over_pdf = state.albedo;
  prd->pdf        = 1.0f; // Not 0.0f to make sure the path is not terminated. Otherwise unused for specular events.
}

// This function will be used for all specular materials.
// This is actually never reached in this simple material system, because specular materials are never evaluated. Use as placeholder.
extern "C" __device__ float4 __direct_callable__eval_brdf_specular(MaterialDefinition const& material, State const& state, PerRayData const* prd, float3 const& wiL)
{
  return make_float4(0.0f);
}

Now assume that code is inside the PTX file name ./appname_core/bxdf_specular.ptx relative to the executable.
Then the host code to load it into an OptixModule, create program descriptions, program groups and the necessary shader binding table records is this:

...
  OptixModule moduleSpecular;
  ptx = readPTX("./appname_core/bxdf_specular.ptx");
  OPTIX_CHECK( m_api.optixModuleCreateFromPTX(m_optixContext, &mco, &pco, ptx.c_str(), ptx.size(), nullptr, nullptr, &moduleSpecular) );
  ...

  // Initialize all OptixProgramGroupDesc for this pipeline.
  std::vector<OptixProgramGroupDesc> programGroupDescriptions(NUM_PROGRAM_GROUP_IDS);
  memset(programGroupDescriptions.data(), 0, sizeof(OptixProgramGroupDesc) * programGroupDescriptions.size());

  // Here only showing the new direct callable part.
  OptixProgramGroupDesc* pgd;
  ...

  pgd = &programGroupDescriptions[PGID_BRDF_SPECULAR_SAMPLE]; // I use a fixed ID for each OptixProgramGroupDesc in the pipeline.
  pgd->kind  = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
  pgd->flags = OPTIX_PROGRAM_GROUP_FLAGS_NONE;
  pgd->callables.moduleDC            = moduleSpecular;
  pgd->callables.entryFunctionNameDC = "__direct_callable__sample_brdf_specular";

  pgd = &programGroupDescriptions[PGID_BRDF_SPECULAR_EVAL];
  pgd->kind  = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
  pgd->flags = OPTIX_PROGRAM_GROUP_FLAGS_NONE;
  pgd->callables.moduleDC            = moduleSpecular;
  pgd->callables.entryFunctionNameDC = "__direct_callable__eval_brdf_specular"; // black
  ...

  // Then create all program groups at once:

  OptixProgramGroupOptions pgo; // Just a placeholder so far.
  memset(&pgo, 0, sizeof(OptixProgramGroupOptions) );

  std::vector<OptixProgramGroup> programGroups(programGroupDescriptions.size());
  
  OPTIX_CHECK( m_api.optixProgramGroupCreate(m_optixContext, programGroupDescriptions.data(), (unsigned int) programGroupDescriptions.size(), &pgo, nullptr, nullptr, programGroups.data()) );

  // Then create your pipeline.
  ...
  // Then calculate and set its stack space.
  ...

  // Set up the fixed portion of the Shader Binding Table (SBT)

  // Put all SbtRecordHeader types in one CUdeviceptr.
  // SbtRecordHeader in that code is this struct:
  // All programs outside the hit groups do not have any per program data in this renderer.
  //struct SbtRecordHeader
  //{
  //  __align__(OPTIX_SBT_RECORD_ALIGNMENT) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
  //};

  const int numHeaders = LAST_DIRECT_CALLABLE_ID - PGID_RAYGENERATION + 1;

  std::vector<SbtRecordHeader> sbtRecordHeaders(numHeaders);

  for (int i = 0; i < numHeaders; ++i)
  {
    OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_RAYGENERATION + i], &sbtRecordHeaders[i]) );
  }

  CU_CHECK( cuMemAlloc(&m_d_sbtRecordHeaders, sizeof(SbtRecordHeader) * numHeaders) );
  CU_CHECK( cuMemcpyHtoDAsync(m_d_sbtRecordHeaders, sbtRecordHeaders.data(), sizeof(SbtRecordHeader) * numHeaders, m_cudaStream) );

  // Hit groups for radiance and shadow rays. These will be initialized later per instance.
  // This just provides the headers with the program group indices.

  // Note that the SBT record data field is uninitialized after these!
  // These are stored to be able to initialize the SBT hitGroup with the respective opaque and cutout shaders.
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_RADIANCE],        &m_sbtRecordHitRadiance) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_SHADOW],          &m_sbtRecordHitShadow) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_RADIANCE_CUTOUT], &m_sbtRecordHitRadianceCutout) );
  OPTIX_CHECK( m_api.optixSbtRecordPackHeader(programGroups[PGID_HIT_SHADOW_CUTOUT],   &m_sbtRecordHitShadowCutout) );

  // Setup the OptixShaderBindingTable.

  m_sbt.raygenRecord            = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_RAYGENERATION;

  m_sbt.exceptionRecord         = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_EXCEPTION;

  m_sbt.missRecordBase          = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * PGID_MISS_RADIANCE;
  m_sbt.missRecordStrideInBytes = (unsigned int) sizeof(SbtRecordHeader);
  m_sbt.missRecordCount         = NUM_RAYTYPES;

  // The hitgroupRecord is going to be setup after the render graph has been built.
  //m_sbt.hitgroupRecordBase          = reinterpret_cast<CUdeviceptr>(m_d_sbtRecordGeometryInstanceData);
  //m_sbt.hitgroupRecordStrideInBytes = (unsigned int) sizeof(SbtRecordGeometryInstanceData);
  //m_sbt.hitgroupRecordCount         = NUM_RAYTYPES * numInstances;

  // The direct callables device pointers, stride and count belong into the OptixShaderBindingTable callablesRecord* entries:
  // Note that this table of direct callable function pointers is similar to buffers of bindless callable program IDs in OptiX < 7, 
  // just that each of the direct callables in OptiX 7 can have its own function signature.
  m_sbt.callablesRecordBase          = m_d_sbtRecordHeaders + sizeof(SbtRecordHeader) * FIRST_DIRECT_CALLABLE_ID;
  m_sbt.callablesRecordStrideInBytes = (unsigned int) sizeof(SbtRecordHeader);
  m_sbt.callablesRecordCount         = LAST_DIRECT_CALLABLE_ID - FIRST_DIRECT_CALLABLE_ID + 1;

  // Destroy OptixModules at last.
  ...

Now inside the device code you can call any direct callable inside the SBT callablesRecord with the function optixDirectCall().

// Calculate the direct callable function index inside the shader binding table as offset from callablesRecordBase.
  // For example, this code assumes a number of lens shader samplers, then a number of light samplers, and then come the sample and eval function per BSDF., so the index to the BSDF sampling function is this: 
  const int indexBSDF = NUM_LENS_SHADERS + NUM_LIGHT_TYPES + material.indexBSDF * 2;

  // Call the BSDF sampling function with the required arguments.
  optixDirectCall<void, MaterialDefinition const&, State const&, PerRayData*>(indexBSDF, material, state, thePrd);

  ...
  
  // Similar for the BSDF eval function:

  // Returns BSDF f in .xyz and the BSDF pdf in .w
  // BSDF eval function is one index after the sample function.
  const float4 bsdf_pdf = optixDirectCall<float4, MaterialDefinition const&, State const&, PerRayData const*, float3 const&>(indexBSDF + 1, material, state, thePrd, lightSample.direction);
  ...

Hope that helps.

Leptus · October 14, 2019, 9:32am

Thanks, it works.It’s a good example to show how to use direct callable programs as a replacement of virtual functions in path tracing application. Can I know when will the examples be published?

benoit.ruiz · May 9, 2020, 7:04pm

Hi, thanks for this example. I tried to include it in my codebase but it fails when calling optixProgramGroupCreate(…). The reported error is:
COMPILE ERROR: "__direct_callable__texture" not found in programDescriptions[5].callables.moduleDC

I tried to investigate the problem, and the only thing that I could think of is that the ptx that I generated using nvcc does not contain the definition of my function, but only the ones specified as __global__.

Here is the command I used to generate the ptx (I omitted some include folders for clarity):

nvcc shading.cu -o shading.ptx --use_fast_math -gencode arch=compute_61,code=compute_61 -ptx -I /opt/optix-7/include/

The direct callable is defined as follows (inspired by your example):

extern "C" __device__ Color3f __direct_callable__texture(float f) {
    return Color3f(1.f, 0.f, f);
}

And the program group description is initialized like this:

program_group_descs[TEXTURE_DC].kind                          = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
program_group_descs[TEXTURE_DC].callables.moduleDC            = m_shading_module;
program_group_descs[TEXTURE_DC].callables.entryFunctionNameDC = "__direct_callable__texture";

I also tried to define the function as __global__, which worked but prevented me from returning any value from it or even return values through pointer arguments (which I would have been okay with). I really don’t know what I’m missing.
Any help would be appreciated!

m001 · May 10, 2020, 7:08pm

try to add
--relocatable-device-code=true
in the “nvcc” command line compile options

benoit.ruiz · May 10, 2020, 9:59pm

It worked! Thanks a lot! I tried reading the compile options for nvcc but didn’t really understand most of them…

droettger · May 11, 2020, 8:24am

Note that OptiX 7 applications mentioned above are published in the meantime.
Find the link here: https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410/4
Those are using direct callables extensively.

The --relocatable-device-code=true (-rdc) option is required since CUDA 8.0 because OptiX callable programs are functions which don’t appear as calls inside the PTX module and the CUDA compiler will optimize them away as dead code otherwise.

Topic		Replies	Views
Problems with callable functions OptiX	9	1677	October 12, 2021
InvalidAddressSpace when using pointer from Continuation Callable parameters OptiX	15	3170	September 27, 2021
Callable program buffer: Assertion failed OptiX	16	3288	June 14, 2022
Can optix 3.9.1 version work with cuda 9 or 10 OptiX	13	953	June 14, 2022
Porting APP from Optix 3.8 (32 bit) to Optix 6.5 (64 bit) : Need some help, please OptiX	54	2071	June 15, 2022
[Optix 7] How to debug sbt OptiX	13	1974	June 14, 2022
Exploring the sample optixTriangle program OptiX	4	1461	April 12, 2023
Questions about Programs of OptiX 7 OptiX	8	1329	June 14, 2022
Simple PTX shader - OptiX 7 OptiX	27	4161	October 12, 2021
Sharing code between OptiX 8 and the Application OptiX cuda , optix	7	78	October 11, 2024

optix __direct_callable_ program examples?

Related topics