OptiX 7 equivalent of `rtProgramCreateFromPTXFiles`

I have an application where the raygen, closesthit, etc. functions are written in a .cu file and compiled to PTX.
However, they rely on some additional functions whose PTX source only becomes available shortly before creating the OptiX modules themselves.

In OptiX 6, I was able to simply forward-declare these extra functions, and pass all the PTX strings to rtProgramCreateFromPTXFiles and everything would get resolved nicely.

How can I achieve the same with OptiX 7?
In the end, all definitions are available at link time, so inlining and so on should be possible when building the pipeline (i.e. I shouldn’t need to go through callables).

Are you saying that the same forward declarations inside the PTX code do not work?
Could you give code examples of a function definition, its forward declaration, and a usage that fails, along with the exact error messages?
Maybe also the code which handles the OptixModule and OptixPipeline creation.

Please add the usual system configuration information:
OS version, installed GPU(s), display driver version (this is crucial!), CUDA toolkit used to compile the input PTX, host compiler version.

Hi @droettger,

Thanks for your answer, here are more details about my problem.

  • OS version: Ubuntu 20.04
  • GPUs: 2x 2080 Ti (but only one visible with CUDA_VISIBLE_DEVICES for now)
  • Display driver: 450.36.06
  • CUDA version: 11.0
  • Output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_May__6_19:09:25_PDT_2020
Cuda compilation tools, release 11.0, V11.0.167
Build cuda_11.0_bu.TC445_37.28358933_0
  • Host compiler: not entirely sure which one gets picked up, but I’m guessing GCC 9.3.0

I have two sources of code:

(a): a .cu source which gets compiled to PTX with nvcc in advance, embedded as a C-style string into my host program, and passed to optixModuleCreateFromPTX as a string:

#include <optix.h>

// Declaration of the JIT-compiled function (definition not available yet)
extern "C" __device__ void my_jit_function(float in, float *out);

extern "C" __global__ void __raygen__rg() {
    float in = 4.f;
    float out;
    my_jit_function(in, &out);
    printf("Result: %f\n", out);
}

extern "C" __global__ void __closesthit__depth() {}

(b): an additional function coming from a JIT-compiler. It’s directly emitted as PTX:

.visible .func my_jit_function(
    .param.b32 in_10,        // a
    .param.b64 out_12        // b
) {
    .reg.b8 %b<13>;
    .reg.b16 %w<13>;
    .reg.b32 %r<13>;
    .reg.b64 %rd<13>;
    .reg.f32 %f<13>;
    .reg.f64 %d<13>;
    .reg.pred %p<13>;

    // Load register %f10: a
    ld.param.f32 %f10, [in_10];

    mov.f32 %f11, 0f40000000;

    // Compute register %f12: b
    add.rn.ftz.f32 %f12, %f10, %f11;

    // Store register %f12: b
    ld.param.u64 %rd8, [out_12];
    st.global.f32 [%rd8], %f12;

    ret;
}


The module creation code is mostly taken from the OptiX 7 course examples:

m_module_compile_options.maxRegisterCount  = 50;
m_module_compile_options.optLevel          = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
m_module_compile_options.debugLevel        = OPTIX_COMPILE_DEBUG_LEVEL_NONE;

m_pipeline_compile_options = {};
m_pipeline_compile_options.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS;
m_pipeline_compile_options.usesMotionBlur     = false;
m_pipeline_compile_options.numPayloadValues   = 2;
m_pipeline_compile_options.numAttributeValues = 2;
// m_pipeline_compile_options.exceptionFlags     = OPTIX_EXCEPTION_FLAG_NONE;
m_pipeline_compile_options.exceptionFlags     = OPTIX_EXCEPTION_FLAG_DEBUG;
m_pipeline_compile_options.pipelineLaunchParamsVariableName = "launch_params";

m_pipeline_link_options.overrideUsesMotionBlur = false;
m_pipeline_link_options.maxTraceDepth = 2;

// --- Module 1
m_program_ptx = (const char *) my_precompiled_ptx;

char log[2048];
size_t sizeof_log = sizeof(log);
rt_check(optixModuleCreateFromPTX(
    m_optix_context, &m_module_compile_options, &m_pipeline_compile_options,
    m_program_ptx.c_str(), m_program_ptx.size(), log, &sizeof_log,
    &m_module)); // m_module: the output OptixModule handle (name assumed)
if (sizeof_log > 1) Log(Info, log);

In OptiX 6, with this exact setup, I simply took my two PTX strings and passed them to rtProgramCreateFromPTXFiles. It worked without any issues.

In OptiX 7, I’ve now tried two things:

(1): appending the JIT-compiled PTX to the pre-compiled PTX:

m_program_ptx = (const char *) my_precompiled_ptx;
m_program_ptx += the_jit_ptx_string;

This fails with:

[ 2][COMPILE FEEDBACK]: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 9; warning : Unsupported .version 7.0; current version is '6.4'
ptx2llvm-module-001, line 118; error   : Inconsistent redefinition of variable 'my_jit_function'
ptx2llvm-module-001, line 141; error   : Function definition conflicts with '.extern' declaration for function 'my_jit_function'
Cannot parse input PTX string

Which makes sense, I think.

(2): adding my second PTX as another module (optixModuleCreateFromPTX). That function really wants the PTX to contain a special OptiX function (“COMPILE ERROR: No functions with semantic types found”). But even if I put a dummy one in there so that it gets accepted and I can create a module, I don’t see how to link that module together with the main module that I care about.

Option (1) would be my favorite, since in practice I don’t need real linking, just making a definition available. It’s entirely possible I’m missing something obvious.

Thanks a lot in advance for your help.

Note: I’ve also tried using a declaration without extern "C" (which I was mostly using to avoid the name mangling):

__device__ void my_jit_function(float in, float *out);

Which gets compiled to (by nvcc):

.extern .func _Z15my_jit_functionfPf
(
	.param .b32 _Z15my_jit_functionfPf_param_0,
	.param .b64 _Z15my_jit_functionfPf_param_1
)
;

Updated the JIT-compiled function to:

.visible .func _Z15my_jit_functionfPf(
    .param.b32 in_10,        // a
    .param.b64 out_12        // b
) {

The error is more or less the same:

[ 2][COMPILE FEEDBACK]: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 9; warning : Unsupported .version 7.0; current version is '6.4'
ptx2llvm-module-001, line 131; error   : Inconsistent redefinition of variable '_Z15my_jit_functionfPf'
ptx2llvm-module-001, line 154; error   : Function definition conflicts with '.extern' declaration for function '_Z15my_jit_functionfPf'
Cannot parse input PTX string

rt_check(): OptiX API error = 7200 (Invalid PTX input) in ../src/optix/rb_renderer.cpp:182.

Someone told me that solution (2) (creating a second module, adding a dummy raygen or callable, creating a dummy program group, and finally passing the whole thing to optixPipelineCreate) should work.

I’ll give that a try and update this thread.

Note that this warning: ptx2llvm-module-001, line 9; warning : Unsupported .version 7.0; current version is '6.4'
means you should generate the PTX with a CUDA 10 toolkit rather than CUDA 11.

Well, it’s just a warning, but it’s recommended to read the OptiX Release Notes about supported software versions before setting up a system for OptiX development.
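A mismatch like this can be caught on the host before the PTX reaches OptiX. The following is a minimal sketch, not an OptiX API: the helper names `ptx_version` and `ptx_newer_than` are hypothetical, and the maximum version to compare against (6.4 in the warning above) has to be supplied by the caller.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: extracts (major, minor) from the PTX `.version`
// directive. Returns {0, 0} if no directive is found.
std::pair<int, int> ptx_version(const std::string &ptx) {
    static const std::regex version_re(R"re(\.version\s+(\d+)\.(\d+))re");
    std::smatch m;
    if (!std::regex_search(ptx, m, version_re))
        return {0, 0};
    return {std::stoi(m[1].str()), std::stoi(m[2].str())};
}

// Hypothetical helper: true if the PTX declares a version newer than
// max_major.max_minor (pairs compare lexicographically).
bool ptx_newer_than(const std::string &ptx, int max_major, int max_minor) {
    return ptx_version(ptx) > std::make_pair(max_major, max_minor);
}
```

Calling `ptx_newer_than(ptx, 6, 4)` on PTX produced by CUDA 11.0 would flag the input that triggered the warning in this thread.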


Also this is bad for performance: m_module_compile_options.maxRegisterCount = 50;

That should always be set to this define to let OptiX decide on the best value for the underlying hardware:
/// Maximum number of registers allowed. Defaults to no explicit limit.
#define OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT 0

See OptiX SDK 7.0.0\include\optix_7_types.h


With these tips, I got a working solution to replace rtProgramCreateFromPTXFiles in OptiX 7:

  1. Create one module per PTX source (optixModuleCreateFromPTX). Each module needs to contain at least one “semantic” function that OptiX recognizes, but it doesn’t have to do anything. I just append the following to my PTX source:
.visible .entry __direct_callable__dummy__some_unique_suffix() {ret;}
  2. Create one program group for each additional module. In my case, because I declared a dummy callable, I created direct callable program groups (OPTIX_PROGRAM_GROUP_KIND_CALLABLES) that reference each module and its dummy callable function.
    This is just so that the module is referenced somewhere; we don’t actually use these callables.

  3. Finally, when creating the pipeline (optixPipelineCreate), pass the full list of program groups. All modules will get linked together.
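Appending the dummy function from step 1 can be done on the host before the module is created. This is a minimal sketch, assuming each additional PTX source is held in a `std::string`; the helper name `append_dummy_callable` and the index-based suffix are hypothetical, and only the appended `.entry` line comes from the steps above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends a no-op direct callable to a PTX source so
// that the module contains at least one "semantic" function.
// `index` makes the entry name unique across modules.
// Returns the entry name, which the matching program group must reference.
std::string append_dummy_callable(std::string &ptx, std::size_t index) {
    std::string name = "__direct_callable__dummy__" + std::to_string(index);
    ptx += "\n.visible .entry " + name + "() {ret;}\n";
    return name;
}
```

The returned name is the one to use as the direct callable entry function when creating the program group for that module in step 2.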

One important thing to be aware of is that unlike with OptiX 6, linking now happens all the way at the end, with all modules at once. So you have to be careful not to define two functions with the same name.
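Since name clashes only surface when the pipeline is linked, it can help to check the PTX sources up front. The sketch below is an assumption-laden illustration, not a PTX parser: the function name `find_duplicate_symbols` is hypothetical, and it relies on a simple regex that treats every `.visible .func` or `.visible .entry` occurrence as a definition.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of `.visible` functions or entries
// that appear in more than one of the given PTX sources (sorted by name).
std::vector<std::string> find_duplicate_symbols(const std::vector<std::string> &ptx_sources) {
    // Allows an optional "(...)" return-parameter list before the name.
    static const std::regex def_re(
        R"re(\.visible\s+\.(?:func|entry)\s+(?:\([^)]*\)\s*)?([A-Za-z_$%][A-Za-z0-9_$]*))re");

    std::map<std::string, int> source_count;
    for (const std::string &ptx : ptx_sources) {
        // Count each name at most once per source, so a prototype followed
        // by its definition in the same source is not reported.
        std::set<std::string> names;
        for (std::sregex_iterator it(ptx.begin(), ptx.end(), def_re), end; it != end; ++it)
            names.insert((*it)[1].str());
        for (const std::string &name : names)
            ++source_count[name];
    }

    std::vector<std::string> duplicates;
    for (const auto &entry : source_count)
        if (entry.second > 1)
            duplicates.push_back(entry.first);
    return duplicates;
}
```

Running this over all PTX strings before calling optixModuleCreateFromPTX gives an early, readable list of conflicting names instead of a link-time failure.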