Hi @droettger,
Thanks for your answer. Here are more details about my problem.
- OS version: Ubuntu 20.04
- GPUs: 2x 2080 Ti (but only one visible with CUDA_VISIBLE_DEVICES for now)
- Display driver: 450.36.06
- CUDA version: 11.0
- Output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_May__6_19:09:25_PDT_2020
Cuda compilation tools, release 11.0, V11.0.167
Build cuda_11.0_bu.TC445_37.28358933_0
- Host compiler: not entirely sure which one gets picked up, but I’m guessing GCC 9.3.0
I have two sources of code:
(a): a .cu source which gets compiled to PTX with nvcc in advance, embedded as a C-style string into my host program, and passed to optixModuleCreateFromPTX as a string:
#include <optix.h>
// Declaration of the JIT-compiled function (definition not available yet)
extern "C" __device__ void my_jit_function(float in, float *out);
extern "C" __global__ void __raygen__rg() {
float in = 4.f;
float out;
my_jit_function(in, &out);
printf("Result: %f\n", out);
}
extern "C" __global__ void __closesthit__depth() {}
(b): an additional function coming from a JIT-compiler. It’s directly emitted as PTX:
.visible .func my_jit_function(
.param.b32 in_10, // a
.param.b64 out_12 // b
) {
.reg.b8 %b<13>;
.reg.b16 %w<13>;
.reg.b32 %r<13>;
.reg.b64 %rd<13>;
.reg.f32 %f<13>;
.reg.f64 %d<13>;
.reg.pred %p<13>;
// Load register %f10: a
ld.param.f32 %f10, [in_10];
mov.f32 %f11, 0f40000000;
// Compute register %f12: b
add.rn.ftz.f32 %f12, %f10, %f11;
// Store register %f12: b
ld.param.u64 %rd8, [out_12];
st.global.f32 [%rd8], %f12;
L0:
ret;
}
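For reference, the literal 0f40000000 in the PTX above is the IEEE-754 bit pattern of 2.0f, so the JIT function is meant to compute out = in + 2, and the raygen program with in = 4 should end up with 6.0. The small host-side check below illustrates this; the helper names ptx_float_literal and expected_result are made up for this illustration and are not part of my actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 32 bits of a PTX `0f........` literal as an IEEE-754 float.
// (Hypothetical helper, only used to illustrate the constant in the PTX.)
float ptx_float_literal(std::uint32_t bits) {
    float value;
    assert(sizeof(value) == sizeof(bits));
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Host-side model of what the JIT function is meant to compute:
// out = in + 0f40000000
float expected_result(float in) {
    return in + ptx_float_literal(0x40000000u);
}
```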
The module creation code is mostly taken from the OptiX 7 course examples:
m_module_compile_options.maxRegisterCount = 50;
m_module_compile_options.optLevel = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
m_module_compile_options.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_NONE;
m_pipeline_compile_options = {};
m_pipeline_compile_options.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS;
m_pipeline_compile_options.usesMotionBlur = false;
m_pipeline_compile_options.numPayloadValues = 2;
m_pipeline_compile_options.numAttributeValues = 2;
// m_pipeline_compile_options.exceptionFlags = OPTIX_EXCEPTION_FLAG_NONE;
m_pipeline_compile_options.exceptionFlags = OPTIX_EXCEPTION_FLAG_DEBUG;
m_pipeline_compile_options.pipelineLaunchParamsVariableName = "launch_params";
m_pipeline_link_options.overrideUsesMotionBlur = false;
m_pipeline_link_options.maxTraceDepth = 2;
// --- Module 1
m_program_ptx = (const char *) my_precompiled_ptx;
char log[2048];
size_t sizeof_log = sizeof(log);
rt_check(optixModuleCreateFromPTX(
m_optix_context, &m_module_compile_options, &m_pipeline_compile_options,
m_program_ptx.c_str(), m_program_ptx.size(), log, &sizeof_log,
&m_modules[0]));
if (sizeof_log > 1) Log(Info, log);
In OptiX 6, with this exact setup, I simply took my two PTX strings and passed them to rtProgramCreateFromPTXFiles. It worked without any issues.
In OptiX 7, I’ve now tried two things:
(1): appending the JIT-compiled PTX to the pre-compiled PTX:
m_program_ptx = (const char *) my_precompiled_ptx;
m_program_ptx += the_jit_ptx_string;
Error:
[ 2][COMPILE FEEDBACK]: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 9; warning : Unsupported .version 7.0; current version is '6.4'
ptx2llvm-module-001, line 118; error : Inconsistent redefinition of variable 'my_jit_function'
ptx2llvm-module-001, line 141; error : Function definition conflicts with '.extern' declaration for function 'my_jit_function'
Cannot parse input PTX string
Which makes sense, I think.
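A possible variation of (1), given here only as an untested sketch: instead of appending the JIT PTX at the end, replace the '.extern' declaration in the pre-compiled PTX with the JIT definition, so that the merged string contains a single definition and no conflicting declaration. The function name replace_extern_with_definition is hypothetical. The sketch assumes that nvcc emits the declaration in the form ".extern .func my_jit_function ( ... );" (which the error message above suggests) and that the JIT string carries no .version/.target header of its own. Whether OptiX's PTX parser accepts the merged result is not something I have verified.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if `name` occurs in `text` as a whole identifier, i.e. not as part of
// a longer one such as `my_jit_function_param_0`.
static bool contains_identifier(const std::string &text, const std::string &name) {
    auto is_ident = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$' || c == '%';
    };
    for (size_t pos = text.find(name); pos != std::string::npos;
         pos = text.find(name, pos + 1)) {
        size_t end = pos + name.size();
        bool left_ok  = pos == 0 || !is_ident(text[pos - 1]);
        bool right_ok = end == text.size() || !is_ident(text[end]);
        if (left_ok && right_ok) return true;
    }
    return false;
}

// Hypothetical helper: finds the `.extern .func` declaration of `name` in
// `ptx` (up to and including its terminating ';') and replaces it with
// `definition`. Returns false and leaves `ptx` unchanged if none is found.
// Simplification: a ';' inside a PTX comment would confuse the search.
bool replace_extern_with_definition(std::string &ptx, const std::string &name,
                                    const std::string &definition) {
    assert(!name.empty());
    const std::string marker = ".extern .func";
    for (size_t start = ptx.find(marker); start != std::string::npos;
         start = ptx.find(marker, start + 1)) {
        size_t semi = ptx.find(';', start);
        if (semi == std::string::npos) return false;
        std::string decl = ptx.substr(start, semi - start + 1);
        if (contains_identifier(decl, name)) {
            ptx.replace(start, decl.size(), definition);
            return true;
        }
    }
    return false;
}
```

With the variables from my snippet, the call would be replace_extern_with_definition(m_program_ptx, "my_jit_function", the_jit_ptx_string), followed by the usual optixModuleCreateFromPTX call on the merged string.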
(2): adding my second PTX as another module (optixModuleCreateFromPTX). That function really wants the PTX to contain a special OptiX function (“COMPILE ERROR: No functions with semantic types found”). But even if I put a dummy one in there so that it gets accepted and I can create a module, I don’t see how to link that module together with the main module that I care about.
Option (1) would be my favorite, since in practice I don’t need real linking, just making a definition available. It’s entirely possible I’m missing something obvious.
Thanks a lot in advance for your help.