Problems with callable functions

Using OptiX 7.1 with CUDA 10.2, driver 461.09 with GTX 1050Ti
When I run the optixCallablePrograms from the samples that came with SDK in debug mode, it crashes

Caught exception: CUDA error on synchronize with error ‘an illegal memory access was encountered’ (D:\ProgramData\NVIDIA Corporation\OptiX SDK 7.1.0\SDK\optixCallablePrograms\optixCallablePrograms.cpp:703)

But works fine in release mode.

When I run my sample, it get

[ 2][COMPILE FEEDBACK]: COMPILE ERROR: “__direct_callable__scatterMetallic” not found in programDescriptions[0].callables.moduleDC

Optix call (optixProgramGroupCreate( optixContext, &pgDesc, 1, &pgOptions, log, &sizeofLog, &callablePGs[MetallicMaterial])) failed with code 7001 (line 525)

The module is the same as the one used in raygen, miss, hitgroup.

void SampleRenderer::createCallablePrograms() {
		callablePGs.resize(MaterialTypeCount);
		{
			OptixProgramGroupOptions pgOptions{};
			OptixProgramGroupDesc pgDesc{};
			pgDesc.kind = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
			pgDesc.callables.moduleDC = module;
			pgDesc.callables.entryFunctionNameDC = "__direct_callable__scatterMetallic";

			char log[2048];
			size_t sizeofLog = sizeof(log);
			OPTIX_CHECK(optixProgramGroupCreate(
				optixContext,
				&pgDesc,
				1,
				&pgOptions,
				log,
				&sizeofLog,
				&callablePGs[MetallicMaterial]));
			if (sizeofLog > 1) PRINT(log);
		}
		{
			OptixProgramGroupOptions pgOptions{};
			OptixProgramGroupDesc pgDesc{};
			pgDesc.kind = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
			pgDesc.callables.moduleDC = module;
			pgDesc.callables.entryFunctionNameDC = "__direct_callable__scatterLambertian";

			char log[2048];
			size_t sizeofLog = sizeof(log);
			OPTIX_CHECK(optixProgramGroupCreate(
				optixContext,
				&pgDesc,
				1,
				&pgOptions,
				log,
				&sizeofLog,
				&callablePGs[LambertianMaterial]));
			if (sizeofLog > 1) PRINT(log);
		}
	}

These are the callable and ordinary kernels/shaders from the .cu file:

static __forceinline__ __device__ bool scatterMetallic(const Ray& inputRay, vec3f& attenuation, Ray& scatteredRay, gdt::LCG<16>& prng, RadiancePRD* prd) {
	vec3f reflected = reflect(inputRay.direction, prd->hitNormal);
	scatteredRay.origin = prd->hitPoint;
	scatteredRay.direction = reflected + prd->material.data.metallic.fuzz * randomInUnitSphere(prng);
	attenuation = prd->material.data.metallic.albedo;
	//Is the angle between scattered ray and the surface normal acute?
	return(0.0f < dot(scatteredRay.direction, prd->hitNormal));
}

static __forceinline__ __device__ bool scatterLambertian(const Ray& inputRay, vec3f& attenuation, Ray& scatteredRay, gdt::LCG<16>& prng, RadiancePRD* prd) {
	return true;
}

extern "C" __device__ bool __direct_callable__scatterMetallic(const Ray& inputRay, vec3f& attenuation, Ray& scatteredRay, gdt::LCG<16>& prng, RadiancePRD* prd) {
	vec3f reflected = reflect(inputRay.direction, prd->hitNormal);
	scatteredRay.origin = prd->hitPoint;
	scatteredRay.direction = reflected + prd->material.data.metallic.fuzz * randomInUnitSphere(prng);
	attenuation = prd->material.data.metallic.albedo;
	//Is the angle between scattered ray and the surface normal acute?
	return(0.0f < dot(scatteredRay.direction, prd->hitNormal));
}

extern "C" __device__ bool __direct_callable__scatterLambertian(const Ray & inputRay, vec3f & attenuation, Ray & scatteredRay, gdt::LCG<16> & prng, RadiancePRD * prd) {
	return true;
}

For completeness, this is how I invoke (tried both ways-but it doesn’t come this far; fails at compilation for some reason):
__direct_callable__scatterMetallic(Ray{ rayOrigin , rayDirection }, attenuation, scattered, prng, &perRayData)

optixDirectCall<bool, const Ray&, vec3f&, Ray&, gdt::LCG<16>&, RadiancePRD*>(0u, Ray{ rayOrigin , rayDirection }, attenuation, scattered, prng, &perRayData)

Just for completeness of the system environment, what is your host compiler version?

1.)
With 461.09 installed, you could also use OptiX SDK 7.2.0.
The optixCallablePrograms example contains a few code changes between these OptiX versions
One looks to be like a missing variable initialization of dc_index which could very well be the reason for this release vs. debug behavior you see.
Please let us know if that solves this issue.

2.)
For the second issue, you’re saying you have your own OptiX application implementing that callable program named
__direct_callable__scatterMetallic and optixProgramGroupCreate() already fails because it cannot find that direct callable name inside the given module?
Did the __direct_callable__scatterLambertian also fail?

Did you look into the *.ptx source code to check if it’s actually in there?
If it’s not, you’re most likely missing an NVCC compiler option required since CUDA 8.0 which prevents dead code elimination of functions which aren’t called inside the module.
Check this post: https://forums.developer.nvidia.com/t/comparing-optix-performance-to-cuda/74721/20

I’m using direct callables in most of my OptiX applications.
Note that you can create multiple OptixProgramGroups at once.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L792

1 Like
  1. Stepped through the code in the debugger to verify that dc_index is initialized to 0 and stays at 0 until crash.
  2. __direct_callable__scatterMetallic and __direct_callable__scatterLambertian were not in the *.ptx file.
    After using --keep-device-functions or --relocatable-device-code=true they show up in the *.ptx and I can invoke them explicitly.
    I am still having a problem of invoking them through optixDirectCall<>() which results in

an illegal memory access was encountered

Since I was able to use these functions explicitly, I am thinking it may have to do with configuration.

optixPipelineSetStackSize(
			pipeline,	// [in] The pipeline to configure the stack size for
			2 * 1024,	// [in] The direct stack size requirement for direct callables invoked from IS or AH
			2 * 1024,	// [in] The direct stack size requirement for direct callables invoked from RG, MS, or CH
			2 * 1024,	// [in] The continuation stack requirement
			2));		// [in] The maximum depth of a traversable graph passed to trace

What else should I check?

There were more changes in the OptiX SDK 7.2.0 version of the optixCallablePrograms example than that dc_index.
Please try if updating to the OptiX SDK 7.2.0 solves that issue.

You shouldn’t hardcode an OptiX stack size like that. That’s either wrong or can waste a lot of VRAM when too big.
This is esp. important when running on such low-end boards.
I always recommend calculating the OptiX pipeline’s stack size explicitly and to the minimum necessary size.
Also because the built-in stack size calculation doesn’t handle all cases, esp. not when using OptiX callable programs.
See second paragraph here:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#pipeline-stack-size

The OptiX SDK contains some helper functions for the most common cases.
Please have a look into the optix_stack_size.h header.

I wrote my stack size calculation code before these helpers existed and this is how it looks for a single level AS hierarchy (IAS->GAS) for a path tracer with one recursive ray (shadow), direct callable programs and no continuation callable programs:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/Device.cpp#L804

I would generally recommend updating to OptiX 7.2.0 if you’re on a new enough display driver anyway.

EDIT: Some more comments:

I am still having a problem of invoking them through optixDirectCall<>() which results in
“an illegal memory access was encountered)”

The signature of your function declaration and the optixDirectCall match
If the direct callable program record at the index 0 you’re calling contains one of these direct callables in the shader binding table direct callables’ record, I don’t see why this shouldn’t work with the given information.
If you put it there, check if the shader binding table record lies at the required address alignment.

This example code contains an SBT setup with direct callable programs. Maybe compare that to your code.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L880
Walk up from there to see how the OptiX pipeline was built.

Note that using pointers or references in the function callable arguments will incur memory accesses.
If it’s possible to do some arguments as call-by-value to improve performance.

I normally avoid bool types in OptiX structures and non-inline functions like callable programs.
I also only use native CUDA types instead of other vector classes. From the look of the gdt vector definitions, I do not see how the they would translate to the most efficient CUDA code for vectorized loads and saves because the necessary CUDA alignments are missing. Means vec2f and vec4f will be handled as individual floats which is slower.

1 Like

Thank you for your helpful suggestions.

  1. After upgrading to 7.2.0, the optixCallablePrograms from the samples runs in Debug and Release modes.
  2. In my example, there was some error in the way I built my SBT, now it does not crash with

an illegal memory access was encountered

but I am not getting the correct result as compared to when I invoke the shaders explicitly.


Setting the stack size after pipeline creation :

        OptixStackSizes stackSizes{};
		for(auto& programGroup: programGroups)
			OPTIX_CHECK(optixUtilAccumulateStackSizes(programGroup, &stackSizes));

		const uint32_t max_trace_depth = 1;
		const uint32_t max_cc_depth = 1;
		const uint32_t max_dc_depth = 1;
		const uint32_t max_traversal_depth = 2;
		uint32_t direct_callable_stack_size_from_traversal;
		uint32_t direct_callable_stack_size_from_state;
		uint32_t continuation_stack_size;
		OPTIX_CHECK(optixUtilComputeStackSizes(&stackSizes, max_trace_depth, max_cc_depth, max_dc_depth, &direct_callable_stack_size_from_traversal,
			&direct_callable_stack_size_from_state, &continuation_stack_size));

		
		OPTIX_CHECK(optixPipelineSetStackSize(pipeline, 
			direct_callable_stack_size_from_traversal,
			direct_callable_stack_size_from_state,
			continuation_stack_size, 
			max_traversal_depth));

SBT Record for Callables:

        //Callables record
		std::vector<CallableRecord> callableRecords;
		{
			for (uint32_t i = 0u; i < callablePGs.size(); i++) {
				CallableRecord rec;
				OPTIX_CHECK(optixSbtRecordPackHeader(callablePGs[i], &rec));
				callableRecords.push_back(rec);
			}
			callableRecordsBuffer.alloc_and_upload(callableRecords);
			sbt.callablesRecordBase = missRecordsBuffer.d_pointer();
			sbt.callablesRecordStrideInBytes = static_cast<uint32_t>(sizeof(CallableRecord));
			sbt.callablesRecordCount = static_cast<uint32_t>(callableRecords.size());
		}

All hits are broken. That should be easy to find.

Do you have an exception program running?
If not, what happens when implementing one and enabling all exceptions inside the OptixPipelineCompileOptions?
Example code:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/exception.cu
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L637

Is the max_trace_depth really only one? (Means this is a brute force path tracer without shadow rays.)
If not, all hits should throw a stack overflow exception.

Did you initialize all members inside your RadiancePRD? (Most likely when the explicit calls worked.)

From where are you calling the direct callable program? I wonder about the &perRayData argument in the call.
If your per-ray payload is a structure inside the ray generation program and you pass its 64-bit pointer split to two unsigned int arguments in the optixTrace() code like I do it in my examples, then calling direct callable programs from the closest hit program should use the merged 64-bit pointer again which would not need the & operator to get the address.
Example here: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/closesthit.cu#L248
Note that thePrd is already the 64-bit pointer to the per-ray payload defined inside the ray generation program.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/closesthit.cu#L165

for (uint32_t i = 0u; i < callablePGs.size(); i++) {
This should generate a compiler warning at the proper warning level. The standard library container size() function returns a size_t type. Either use size_t or an explicit cast.

1 Like

The Hit shader reports the material type (among other things) back to the Raygen shader, via PerRayData. The Raygen shader, in turn, invokes material specific DirectCallables: If I use if-else if … and invoke the Direct Callables manually they work, when I use the optixDirectCall() with the very same index that I used in the if-else if… it does not work.
Instrumented the app with exception reporting and got this, when optixDirectCall() is used:

DBG: 0 : sbtIndex
DBG: 0 : sbtIndex
DBG: 0 : sbtIndex
DBG: 0 : sbtIndex
DBG: 0 : sbtIndex
DBG: 0 : sbtIndex
DBG: 0.000000E+00xception 75sbtIndex

Exception -13 at (249, 11)
Exception -13 at (234, 35)
Exception -13 at (235, 35)
Exception -13 at (232, 36)
Exception -13 at (233, 36)
Exception -13 at (234, 36)
Exception -13 at (263, 36)
Exception -13 at (260, 37)
Exception -13 at (261, 37)
Exception -13 at (262, 37)
Exception -13 at (263, 37)
Exception -13 at (260, 38)
Exception -13 at (318, 16)
Exception -13 at (319, 16)
Exception -13 at (316, 17)
Exception -13 at (317, 17)
Exception -13 at (302, 49)
Exception -13 at (303, 49)
Exception -13 at (300, 50)
Exception -13 at (301, 50)

-13 is OPTIX_EXCEPTION_CODE_CALLABLE_NO_DC_SBT_RECORD Tried to call a direct
callable using an SBT offset of a record that was built from a program group that did not
include a direct callable.
Why is it reporting this on certain launch indeces only?

void SampleRenderer::createCallablePrograms() {
		callablePGs.resize(MaterialTypeCount);
		{
			OptixProgramGroupOptions pgOptions{};
			OptixProgramGroupDesc pgDesc{};
			pgDesc.kind = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
			pgDesc.callables.moduleDC = module;
			pgDesc.callables.entryFunctionNameDC = "__direct_callable__scatterMetallic";

			char log[2048];
			size_t sizeofLog = sizeof(log);
			OPTIX_CHECK(optixProgramGroupCreate(
				optixContext,
				&pgDesc,
				1,
				&pgOptions,
				log,
				&sizeofLog,
				&callablePGs[MetallicMaterial]));
			if (sizeofLog > 1) PRINT(log);
		}
		{
			OptixProgramGroupOptions pgOptions{};
			OptixProgramGroupDesc pgDesc{};
			pgDesc.kind = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
			pgDesc.callables.moduleDC = module;
			pgDesc.callables.entryFunctionNameDC = "__direct_callable__scatterLambertian";

			char log[2048];
			size_t sizeofLog = sizeof(log);
			OPTIX_CHECK(optixProgramGroupCreate(
				optixContext,
				&pgDesc,
				1,
				&pgOptions,
				log,
				&sizeofLog,
				&callablePGs[LambertianMaterial]));
			if (sizeofLog > 1) PRINT(log);
		}
		{
			OptixProgramGroupOptions pgOptions{};
			OptixProgramGroupDesc pgDesc{};
			pgDesc.kind = OPTIX_PROGRAM_GROUP_KIND_CALLABLES;
			pgDesc.callables.moduleDC = module;
			pgDesc.callables.entryFunctionNameDC = "__direct_callable__scatterDielectric";

			char log[2048];
			size_t sizeofLog = sizeof(log);
			OPTIX_CHECK(optixProgramGroupCreate(
				optixContext,
				&pgDesc,
				1,
				&pgOptions,
				log,
				&sizeofLog,
				&callablePGs[DielectricMaterial]));
			if (sizeofLog > 1) PRINT(log);
		}
	}

pipeline creation:

void SampleRenderer::createPipeline() {
		std::vector<OptixProgramGroup> programGroups;
		for (auto pg : raygenPGs)
			programGroups.push_back(pg);
		for (auto pg : missPGs)
			programGroups.push_back(pg);
		for (auto pg : hitgroupPGs)
			programGroups.push_back(pg);
		for (auto pg : callablePGs)
			programGroups.push_back(pg);
		for (auto pg : exceptionPGs)
			programGroups.push_back(pg);

		char log[2048];
		size_t sizeof_log = sizeof(log);
		OPTIX_CHECK(optixPipelineCreate(
			optixContext,
			&pipelineCompileOptions,
			&pipelineLinkOptions,
			programGroups.data(),
			(int)programGroups.size(),
			log,
			&sizeof_log,
			&pipeline));
		if (sizeof_log > 1) PRINT(log);

and in buildSBT():

//Callables record
		std::vector<CallableRecord> callableRecords;
		{
			for (uint32_t i = 0u; i < callablePGs.size(); i++) {
				CallableRecord rec;
				OPTIX_CHECK(optixSbtRecordPackHeader(callablePGs[i], &rec));
				callableRecords.push_back(rec);
			}
			callableRecordsBuffer.alloc_and_upload(callableRecords);
			sbt.callablesRecordBase = missRecordsBuffer.d_pointer();
			sbt.callablesRecordStrideInBytes = static_cast<uint32_t>(sizeof(CallableRecord));
			sbt.callablesRecordCount = static_cast<uint32_t>(callableRecords.size());
		}

-13 is OPTIX_EXCEPTION_CODE_CALLABLE_NO_DC_SBT_RECORD Tried to call a direct
callable using an SBT offset of a record that was built from a program group that did not
include a direct callable.
Why is it reporting this on certain launch indeces only?

Impossible to say without seeing the whole project’s host and device code.
The exception says you called optixDirectCall with the wrong SBT index. End of story.

In your debug output, is the 75 sbtIndex the index of the exception program?
If not, that is your incorrect SBT index.

From your SBT setup, if your index variables MetallicMaterial, LambertianMaterial, DielectricMaterial are covering the three indices 0, 1, 2 then calling optixDirectCall() with 0, 1, and 2 should work.

The Hit shader reports the material type (among other things) back to the Raygen shader, via PerRayData. The Raygen shader, in turn, invokes material specific DirectCallables:

Note that using direct callables in this case is potentially slower than the switch-case.

1 Like

Thank you very much for your patience and suggestions; throughout this thread I learned quite a bit about OptiX.
It was exactly what exception handler/shader was pointing out: A cut-n-paste based error was producing those -13s.