[resolved] rtTrace from bindless callable programs

I’m trying to implement my material shaders in bindless callable programs per Detlef’s suggestion. According to the OptiX 6.0.0 release notes, it now supports “rtTrace from bindless callable programs”, which you couldn’t do in the past. However, I haven’t found any documentation or examples of how to do this.

Currently, as long as I don’t call rtTrace in my bindless program, everything works fine. I can make ray and ray payload data structures, pass attributes by value, etc. However, as soon as I put rtTrace into the program, I get strange behavior:

  • On GTX 1080 Ti, it seems that the entire bindless program is not called (and I can't even force an exception by putting rtThrow into the callable program). The rest of the code runs normally, but the value returned by the callable program is garbage.
  • On RTX 8000, I get the following error:

    Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuEventSynchronize( m_event ) returned (700): Illegal address)

Here’s the closest hit code:

    RT_PROGRAM void closest_hit_generic() // This is my new closest hit program
    {
    	if (mat_id >= material_data.size()) return;
    	MaterialData mat_data = material_data[mat_id];
    
    	prd.result = ((rtCallableProgramId<float3()>)mat_data.radiance_program_id)();
    }
    
    RT_CALLABLE_PROGRAM float3 closest_hit_radiance() // This used to be my closest hit program, now it is callable and stripped down to a minimal example
    {
    	PerRayData_radiance new_prd;
    	new_prd.result = make_float3(0);
    	Ray refl_ray = make_Ray(make_float3(1, 0, 10), make_float3(0, 0, 1), RADIANCE_RAY, 1e-4, RT_DEFAULT_MAX);
    	rtTrace(top_object, refl_ray, new_prd); // If you comment this line, it "works"
    	return new_prd.result;
    }
    

    The environment is RTX 8000, Windows driver 418.81, nvcc version 10.0.130

    I’ll send a stack trace to the help email.

    Yes, this is not going to work like that without additional changes, and there is no example inside the OptiX SDK yet.

    OptiX cannot always detect automatically whether bindless callable programs along a hierarchy of calls contain an rtTrace call. Programs that do need additional internal instrumentation to be able to call rtTrace.

    For that, OptiX added call site instrumentation, which lets you tell OptiX which bindless callable program IDs are potentially calling which others.

    This works automatically when holding a bindless callable program ID directly in an rtDeclareVariable.
    It also works automatically when using buffers of bindless callable program IDs.
    All other cases need additional call site instrumentation on the device side and some host-side configuration.

    First, you need to use rtMarkedCallableProgramId instead of rtCallableProgramId when calling a bindless callable program ID with an rtTrace inside.

    Please look into the optix_device.h header for more information on rtMarkedCallableProgramId.

    rtMarkedCallableProgramId lets you name a call site with a constant string. On the host side, the newly added function rtProgramCallsiteSetPotentialCallees takes that string and specifies which bindless callable program IDs are potentially called from that rtMarkedCallableProgramId location inside the device code.
    This allows OptiX to instrument the hierarchy of calls with the information it needs to be able to call rtTrace.

    So your code should look something like this:

    RT_PROGRAM void closest_hit_generic()
    {
    	if (mat_id >= material_data.size()) return;
    	MaterialData mat_data = material_data[mat_id];
    
    	prd.result = ((rtMarkedCallableProgramId<float3()>)mat_data.radiance_program_id, "my_call_site")();
    }
    
    RT_CALLABLE_PROGRAM float3 closest_hit_radiance()
    {
    	PerRayData_radiance new_prd;
    	new_prd.result = make_float3(0);
    	Ray refl_ray = make_Ray(make_float3(1, 0, 10), make_float3(0, 0, 1), RADIANCE_RAY, 1e-4, RT_DEFAULT_MAX);
    	rtTrace(top_object, refl_ray, new_prd);
    	return new_prd.result;
    }
    
    // On the host:
    Program ch_generic  = context->createProgramFromPTXString(ptx, "closest_hit_generic");
    Program cp_radiance = context->createProgramFromPTXString(ptx, "closest_hit_radiance");
    
    // Gather all bindless callable program IDs which can be called from "my_call_site":
    std::vector<int> callees;
    callees.push_back(cp_radiance->getId());
    
    // Let OptiX know that these bindless callable program IDs can potentially be called from "my_call_site" inside the closest_hit_generic program object:
    ch_generic->setCallsitePotentialCallees("my_call_site", callees);
    

    That said, I would not use that mechanism when I can avoid it.
    If the bindless callable programs only calculate information and return it, the closest hit program can do the necessary rtTrace itself with this information after the call returns. That would speed up the bindless callable programs. My OptiX introduction examples do it this way.
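
The recommendation above could be sketched roughly as follows. This is a plain C++ analogy on the CPU, not OptiX device code: function pointers stand in for bindless callable program IDs, and `trace` is a hypothetical stand-in for rtTrace. All names here (`ShadeResult`, `MaterialFn`, `mirror_material`, `emitter_material`, `closest_hit`) are made up for illustration and do not come from the OptiX SDK.

```cpp
#include <cstddef>

struct Color { float r, g, b; };
struct Ray   { float ox, oy, oz, dx, dy, dz; };

// What a material callable hands back: data only, no tracing inside.
struct ShadeResult {
    Color emitted;        // radiance contributed at this hit
    Color weight;         // multiplier for the continuation ray's radiance
    bool  continue_path;  // false: the path ends here, no further trace
    Ray   next_ray;       // only meaningful when continue_path is true
};

// Stand-in for a bindless callable program signature (assumption).
using MaterialFn = ShadeResult (*)(const Ray& incoming);

// Hypothetical stand-in for rtTrace: returns a constant "sky" color
// and counts how often it was called.
int g_trace_calls = 0;
Color trace(const Ray&)
{
    ++g_trace_calls;
    return Color{1.0f, 1.0f, 1.0f};
}

// Toy mirror: flips the z direction and asks the caller to continue.
ShadeResult mirror_material(const Ray& incoming)
{
    Ray next = incoming;
    next.dz = -incoming.dz;
    return ShadeResult{Color{0.0f, 0.0f, 0.0f}, Color{0.5f, 0.5f, 0.5f}, true, next};
}

// Toy emitter: returns its own radiance and ends the path.
ShadeResult emitter_material(const Ray&)
{
    return ShadeResult{Color{2.0f, 2.0f, 2.0f}, Color{0.0f, 0.0f, 0.0f}, false, Ray{}};
}

// Stand-in for a buffer of bindless callable program IDs.
static const MaterialFn kMaterials[] = { mirror_material, emitter_material };

// The "closest hit" caller: it alone performs the trace, using what
// the material returned.
Color closest_hit(std::size_t mat_id, const Ray& incoming)
{
    const std::size_t count = sizeof(kMaterials) / sizeof(kMaterials[0]);
    if (mat_id >= count) return Color{0.0f, 0.0f, 0.0f};

    const ShadeResult s = kMaterials[mat_id](incoming);
    Color result = s.emitted;
    if (s.continue_path) {
        const Color in = trace(s.next_ray);
        result.r += s.weight.r * in.r;
        result.g += s.weight.g * in.g;
        result.b += s.weight.b * in.b;
    }
    return result;
}
```

Because the material functions never call `trace` themselves, nothing in this structure would need call site marking. In a real renderer the trace would typically sit in a loop in the ray generation or closest hit program rather than in a single follow-up call.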

    Thank you for the quick reply, Detlef.

    I now get a compiler error:

    error : no suitable constructor exists to convert from "int" to "optix::markedCallableProgramId<float3 ()>"
    

    This occurs on the line:

    prd.result = ((rtMarkedCallableProgramId<float3()>)mat_data.radiance_program_id, "my_call_site")();
    

    Sorry, wrong brackets.

    rtMarkedCallableProgramId<float3()>(mat_data.radiance_program_id, "my_call_site")();
    

    I’ve marked this resolved, but I’ll add a note as I found this rather tricky. It’s important that the method signature be an exact match, including all const qualifiers. Otherwise, the cudaDriver().CuEventSynchronize( m_event ) error may occur at any place where the code branches (usually at if statements or rtTrace calls).

    In my working solution, I have the following:

    RT_PROGRAM void closest_hit_generic()
    {
    	if (mat_id >= material_data.size()) return;
    	MaterialData mat_data = material_data[mat_id];
    
    	prd = rtMarkedCallableProgramId<PerRayData_radiance(MaterialData const&, PerRayData_radiance)>(mat_data.radiance_program_id, "my_call_site")(mat_data, prd);
    }
    
    RT_CALLABLE_PROGRAM PerRayData_radiance closest_hit_radiance(MaterialData const& mat_data, PerRayData_radiance prd)
    {
    	// The material intersection routines, including rtTrace calls go here ...
    	return prd;
    }
    
    
    // On the host:
    Program ch_generic  = context->createProgramFromPTXString(ptx, "closest_hit_generic");
    Program cp_radiance = context->createProgramFromPTXString(ptx, "closest_hit_radiance");
    
    // Gather all bindless callable program IDs which can be called from "my_call_site":
    std::vector<int> callees;
    callees.push_back(cp_radiance->getId());
    
    // Let OptiX know that these bindless callable program IDs can potentially be called from "my_call_site" inside the closest_hit_generic program object:
    ch_generic->setCallsitePotentialCallees("my_call_site", callees);
    

    An exact signature match is always required, for all bindless callable program signatures.
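
One way to guard against a mismatched signature is to write the signature down once and check the callable against it at compile time. The sketch below is plain host-side C++ with stand-in structs. `RadianceSignature` is a hypothetical alias, and whether the same check compiles unchanged in OptiX device code under nvcc is an assumption that has not been verified here.

```cpp
#include <type_traits>

// Stand-ins for the thread's types (the real definitions are not shown there).
struct MaterialData        { int radiance_program_id; };
struct PerRayData_radiance { float result[3]; };

// Single place where the callable's signature is written down.
// The same alias would be used as the template argument at the call site.
using RadianceSignature = PerRayData_radiance(MaterialData const&, PerRayData_radiance);

// Stand-in for the callable program body.
PerRayData_radiance closest_hit_radiance(MaterialData const& mat_data, PerRayData_radiance prd)
{
    (void)mat_data;  // unused in this stripped-down sketch
    return prd;
}

// Fails to compile if the definition and the alias drift apart.
static_assert(std::is_same<decltype(closest_hit_radiance), RadianceSignature>::value,
              "callable definition does not match the call-site signature");

// Dropping const from the reference parameter yields a different function type.
static_assert(!std::is_same<RadianceSignature,
                            PerRayData_radiance(MaterialData&, PerRayData_radiance)>::value,
              "const on a reference parameter is part of the signature");
```

In standard C++, a top-level const on a by-value parameter is not part of the function type, so a check like this only catches qualifier differences on references and pointers, plus differences in parameter or return types.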

    I still recommend avoiding this functionality if you can. It’s meant for special material system implementations and should only be used if no other implementation is possible.
    This functionality doesn’t come for free: it has a cost both in the compilation step and at runtime.