[Optix 7] How to debug sbt

Hello!
I am still working on my OptiX 7 application on VS19 using the 441.87 driver.

Starting from the meshviewer example I modified the sbt, and think I might have an error there or during my optixLaunch.
The .cu and .cpp compile without errors.
How would I get into debugging there?

I only noticed that I got an error, since a CUDA_CHECK(cudaMemcpy)) called directly after the optixTrace gave me an (memory?) exception (not in the terminal, just a window in VS)
It seems, that the OPTIX_CHECK(optixLaunch()) does not catch the error that must occur during the optixLaunch().

Is this expected behavior due to the asynchronous nature of OptiX?
I am now using a CUDA_SYNC_CHECK() after the optixTrace and now I get the error message there.
What does CUDA_SYNC_CHECK() do in this context and am using it correctly?

Thank you for your help!

Hi, you can see the source for CUDA_SYNC_CHECK() in sutil/Exception.h

It calls cudaDeviceSynchronize() and then if there’s an error, prints the result of cudaGetLastError().

What this means is that something bad happened during launch. You could try to catch it using Nsight VSE. Failing that, it sounds like you suspect your SBT, so try to find out whether it’s an SBT problem or a problem with one of the programs accessing memory out of bounds or using a bad device pointer.

One way to debug your SBT is by disabling all but your raygen program, and then re-enabling them one at a time until you find which one is the problem. You can also print the SBT on the host and manually inspect and verify all the data in there.

Review the SBT indexing formula to make sure it makes sense and your SBT is laid out correctly; https://raytracing-docs.nvidia.com/optix7/guide/index.html#shader_binding_table#shader-binding-tables-for-geometric-acceleration-structures


David.

Hey David!

Thank you for your answer. I have never used Nsight VSE with OptiX, there isn’t a tutorial to get started with OptiX debugging? Do you have a recommendation on how a total newbie can get into Nsight? Since I only get the error message " 0x000002a5624ca2a0 "CUDA error on synchronize with error ‘an illegal memory access was encountered’ " I guess Nsight would be the way to figure out were it exactly happens.

I guess you can’t see from the error code if this is because I messed up in pipeline/sbt creation or if it’s related to my device programs?
I went several times carefully through the sbt und pipeline creation and couldn’t find any obvious errors.

I still don’t fully understand the setup of the sbt, but I think I modified the example correctly.

void Scene::createPTXModule()
{

	OptixModuleCompileOptions module_compile_options = {};
	module_compile_options.optLevel = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
	module_compile_options.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO;

	m_pipeline_compile_options = {};
	m_pipeline_compile_options.usesMotionBlur = false;
	m_pipeline_compile_options.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
	m_pipeline_compile_options.numPayloadValues = whitted::NUM_PAYLOAD_VALUES;
	m_pipeline_compile_options.numAttributeValues = 2; // TODO
	m_pipeline_compile_options.exceptionFlags = OPTIX_EXCEPTION_FLAG_NONE; // should be OPTIX_EXCEPTION_FLAG_STACK_OVERFLOW;
	m_pipeline_compile_options.pipelineLaunchParamsVariableName = "params";

	const std::string ptx = getPtxString(nullptr, "whitted.cu");

	m_ptx_module = {};
	char log[2048];
	size_t sizeof_log = sizeof(log);
	OPTIX_CHECK_LOG(optixModuleCreateFromPTX(
		m_context,
		&module_compile_options,
		&m_pipeline_compile_options,
		ptx.c_str(),
		ptx.size(),
		log,
		&sizeof_log,
		&m_ptx_module
	));
}

void Scene::createProgramGroups()
{
	OptixProgramGroupOptions program_group_options = {};

	char log[2048];
	size_t sizeof_log = sizeof(log);

	//
	// Ray generation
	//
	{

		OptixProgramGroupDesc raygen_prog_group_desc = {};
		raygen_prog_group_desc.kind = OPTIX_PROGRAM_GROUP_KIND_RAYGEN;
		raygen_prog_group_desc.raygen.module = m_ptx_module;
		raygen_prog_group_desc.raygen.entryFunctionName = "__raygen__therm";

		OPTIX_CHECK_LOG(optixProgramGroupCreate(
			m_context,
			&raygen_prog_group_desc,
			1,                             // num program groups
			&program_group_options,
			log,
			&sizeof_log,
			&m_raygen_prog_group
		)
		);
	}

	//
	// Miss
	//
	{
		OptixProgramGroupDesc miss_prog_group_desc = {};
		miss_prog_group_desc.kind = OPTIX_PROGRAM_GROUP_KIND_MISS;
		miss_prog_group_desc.miss.module = m_ptx_module;
		miss_prog_group_desc.miss.entryFunctionName = "__miss__therm";
		sizeof_log = sizeof(log);
		OPTIX_CHECK_LOG(optixProgramGroupCreate(
			m_context,
			&miss_prog_group_desc,
			1,                             // num program groups
			&program_group_options,
			log,
			&sizeof_log,
			&m_therm_miss_group
		)
		);
	}

	//
	// Hit group
	//
	{
		OptixProgramGroupDesc hit_prog_group_desc = {};
		hit_prog_group_desc.kind = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
		hit_prog_group_desc.hitgroup.moduleCH = m_ptx_module;
		hit_prog_group_desc.hitgroup.entryFunctionNameCH = "__closesthit__therm";
		sizeof_log = sizeof(log);
		OPTIX_CHECK_LOG(optixProgramGroupCreate(
			m_context,
			&hit_prog_group_desc,
			1,                             // num program groups
			&program_group_options,
			log,
			&sizeof_log,
			&m_therm_hit_group
		)
		);
	}
}

void Scene::createPipeline()
{
    OptixProgramGroup program_groups[] =
    {
        m_raygen_prog_group,
        m_therm_miss_group,
        m_therm_hit_group
    };

    OptixPipelineLinkOptions pipeline_link_options = {};
    pipeline_link_options.maxTraceDepth          = 3;
    pipeline_link_options.debugLevel             = OPTIX_COMPILE_DEBUG_LEVEL_FULL;
    pipeline_link_options.overrideUsesMotionBlur = false;

    char log[2048];
    size_t sizeof_log = sizeof( log );
    OPTIX_CHECK_LOG( optixPipelineCreate(
                m_context,
                &m_pipeline_compile_options,
                &pipeline_link_options,
                program_groups,
                sizeof( program_groups ) / sizeof( program_groups[0] ),
                log,
                &sizeof_log,
                &m_pipeline
                ) );
}


void Scene::createSBT()
{
    {
        const size_t raygen_record_size = sizeof( EmptyRecord );
        CUDA_CHECK( cudaMalloc( reinterpret_cast<void**>( &m_sbt.raygenRecord ), raygen_record_size ) );

        EmptyRecord rg_sbt;
        OPTIX_CHECK( optixSbtRecordPackHeader( m_raygen_prog_group, &rg_sbt ) );
        CUDA_CHECK( cudaMemcpy(
                    reinterpret_cast<void*>( m_sbt.raygenRecord ),
                    &rg_sbt,
                    raygen_record_size,
                    cudaMemcpyHostToDevice
                    ) );
    }

    {
        const size_t miss_record_size = sizeof( EmptyRecord );
        CUDA_CHECK( cudaMalloc(
                    reinterpret_cast<void**>( &m_sbt.missRecordBase ),
                    miss_record_size*whitted::RAY_TYPE_COUNT
                    ) );

        EmptyRecord ms_sbt[ whitted::RAY_TYPE_COUNT ];
        OPTIX_CHECK( optixSbtRecordPackHeader( m_therm_miss_group,  &ms_sbt[0] ) );

        CUDA_CHECK( cudaMemcpy(
                    reinterpret_cast<void*>( m_sbt.missRecordBase ),
                    ms_sbt,
                    miss_record_size*whitted::RAY_TYPE_COUNT,
                    cudaMemcpyHostToDevice
                    ) );
        m_sbt.missRecordStrideInBytes = static_cast<uint32_t>( miss_record_size );
        m_sbt.missRecordCount     = whitted::RAY_TYPE_COUNT;
    }

    {
        std::vector<HitGroupRecord> hitgroup_records;
        for( const auto mesh : m_meshes )
        {
            for( size_t i = 0; i < mesh->material_idx.size(); ++i )
            {
                HitGroupRecord rec = {};
                OPTIX_CHECK( optixSbtRecordPackHeader( m_therm_hit_group, &rec ) );
                rec.data.geometry_data.type                    = GeometryData::TRIANGLE_MESH;
                rec.data.geometry_data.triangle_mesh.positions = mesh->positions[i];
                rec.data.geometry_data.triangle_mesh.normals   = mesh->normals[i];
                rec.data.geometry_data.triangle_mesh.texcoords = mesh->texcoords[i];
                rec.data.geometry_data.triangle_mesh.indices   = mesh->indices[i];

                const int32_t mat_idx  = mesh->material_idx[i];
                if( mat_idx >= 0 )
                    rec.data.material_data.pbr = m_materials[ mat_idx ];
                else
                    rec.data.material_data.pbr = MaterialData::Pbr();
                hitgroup_records.push_back( rec );

            }
        }

        const size_t hitgroup_record_size = sizeof( HitGroupRecord );
        CUDA_CHECK( cudaMalloc(
                    reinterpret_cast<void**>( &m_sbt.hitgroupRecordBase ),
                    hitgroup_record_size*hitgroup_records.size()
                    ) );
        CUDA_CHECK( cudaMemcpy(
                    reinterpret_cast<void*>( m_sbt.hitgroupRecordBase ),
                    hitgroup_records.data(),
                    hitgroup_record_size*hitgroup_records.size(),
                    cudaMemcpyHostToDevice
                    ) );

        m_sbt.hitgroupRecordStrideInBytes = static_cast<unsigned int>( hitgroup_record_size );
        m_sbt.hitgroupRecordCount         = static_cast<unsigned int>( hitgroup_records.size() );
    }
}

Thanks again for your help and patience.
I hope you can point me in the right direction.

I would not use different debugLevel at OptixModuleCompileOptions and OptixPipelineLinkOptions.
(When changing OptixModuleCompileOptions::debugLevel make sure to match optLevel as well.
You cannot have full optimization and full debug at the same time during compilation.)

Are you calculating the stack size correctly?

What’s your OS version and GPU system setup?

Okay, thanks!
I will change that tomorrow.

If this could cause problems, you might want to change it in the Scene.cpp shipped with the Optix 7 SDK.

Thank you for your answer Detlef!

I run VS19 on Win10 with a RTX2080TI.

Where is the stack size calculated?
If I understood the documentation correctly, the calculation is handled internally in OptiX 7.
Since I want to model reflections, I have second optixTrace() outside of the raygen.
Does the in the documentation mentioned trace depth mean, the amount of “reflections” or is it referring to the acceleration structure?

Since I am not getting a stack overflow error message, isn’t it unlikely that it’s a problem related to stack size? (but I’m just an aerospace engineer with no background in coding)

Tomorrow I will comment out the reflection part in my ch() and see if it changes something.

The code excerpts you posted looked fine from a first glance, so I was thinking about other reasons where things could go wrong, including compiler errors due to the mismatching compile and link options or maybe incorrect stack size.

OptiX version 6 and earlier never calculated the required stack size internally.
In OptiX 7 there is an internal default calculation, but that is not always applicable.

Determining the stack size requires some calculations which depend on all programs in a pipeline, the maximum trace depth and the maximum scene hierarchy depth.
https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#pipeline-stack-size
Additionally to the listings in that chapter, the OptiX SDK 7.0.0 contains some utility functions inside optix_stack_size.h which combine that.

The optixMeshViewer example uses the Scene::createPipeline() function in sutil to setup the pipeline and that has a maxTraceDepth of 2 and it is not calling optixPipelineSetStackSize().

I would always recommend to calculate the stack size yourself, esp. if you changed the maxTraceDepth.

maxTraceDepth means the maximum number of recursive optixTrace() calls which can occur.
For example, if you shoot only primary rays inside the raygeneration program, then maxTraceDepth == 1.
If the closest hit program shoots a non-recursive shadow ray, then maxTraceDepth == 2. That would be the standard case for an iterative path tracer with direct lighting.
If you shoot a recursive ray in the closest hit program then maxTraceDepth is limited by your recursion end condition only. This would be the case in a recursive Whitted ray tracer.
Sounds like you did exactly that last case when adding another optixTrace(). Is the end condition matching your maxTraceDepth?

The other term is the maxTraversableGraphDepth argument inside the optixPipelineSetStackSize() function.
It’s the maximum number of instance and geometry acceleration structures which can be traversed during an optixTrace() call from root traversable handle to the bottom-most geometry AS.
This depends on the OptixTraversableGraphFlags setting inside the OptixPipelineCompileOptions and your scene setup.

Means if it’s set to OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS, then maxTraversableGraphDepth == 1,
if it’s OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING, then maxTraversableGraphDepth == 2,
if it’s OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_ANY, then it depends on your scene hierarchy.

Looks like the OptiX SDK examples neither enable exceptions nor do they install a custom exception program.
There should be a built-in exception program but I never used it:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#exceptions

You shouldn’t get any exceptions as long as you have exceptionFlags set to OPTIX_EXCEPTION_FLAG_NONE, you will get rendering errors or CUDA crashes instead.
Note that enabling exceptions will reduce the runtime performance. Only do that for debugging.

If you want to have another view on how to build programs with OptiX 7, have a look at these examples which handle many of the things I described above: https://github.com/NVIDIA/OptiX_Apps

Thank you for your thorough answer!

In the module_compile_options I set the OPTIX_COMPILE_OPTIMIZATION_LEVEL_0 and OPTIX_COMPILE_DEBUG_LEVEL_FULL flags. I still get the same error message in VS and no information in the terminal or elsewhere.
I also have all the other functions set to the highest debug information.
I tried using NSight to get to the point were things go south, but I couldn’t get it to show me anything meaningful. In the Might be that I am using it incorrectly. I just set a breakpoint in the .cu file and it should stop the thread there? Or am i mistaken? Is there documentation on how to use Nsight with OptiX, or another place I could start from?

When I disable my optixTrace(), or give it a OptixTraversablehandle handle = 0; it at least doesn’t give me an exception. If I feed it one of the GAS handles it crashes with the same error. Since I also disabled my reflections I have a trace depth of one and shouldn’t be messing with the stack-size.

I can’t figure out were my error lies. I have been trying to get to the bottom of this for two days now and am still completely clueless. Would you have any suggestions on how to approach this issue?

We don’t yet have any special OptiX documentation for Nsight tools, but the regular tools documentation should generally work, e.g., https://developer.nvidia.com/nsight-visual-studio-edition. For setting kernel breakpoints, the main thing to know about OptiX is that your kernel launch is named “megakernel_simple”. In Nsight tools, you can use a regex match with “megakernel” if you need to.

While it might be a last resort, printf() can be quite useful for catching errors and crashes, especially if you can limit the amount of spew. Sprinkle unique print markers before and after the region where you suspect a crash, and watch for which print is last, and which ones don’t show. Then you can narrow down and bisect the crash by adding another unique marker between the last one shown and the first one not shown. It may be useful to limit your printf() to a specific region or pixel by using optixGetLaunchIndex().


David.

Hey!

I found the source for the error.
During the loading of the scene from the .gltf I added an primitive index offset.
I modified the Scene::buildMeshAccels() function inside the Scene.cpp

int triangle_counter = 0;

for(size_t i=0; i<m_meshes.size(); ++i)
{
   auto& mesh = m_meshes[i];
   const size_t num_subMeshes =  mesh->indices.size();
   std::vector<OptixBuildInput> buildInputs(num_subMeshes);

   assert(mesh->positions.size() == num_subMeshes &&
   mesh->normals.size()   == num_subMeshes &&
   mesh->texcoords.size() == num_subMeshes);

   for(size_t i = 0; i < num_subMeshes; ++i)
   {
     OptixBuildInput& triangle_input                          = buildInputs[i];
     memset(&triangle_input, 0, sizeof(OptixBuildInput));
     triangle_input.type                                      = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
     triangle_input.triangleArray.vertexFormat                = OPTIX_VERTEX_FORMAT_FLOAT3;
     triangle_input.triangleArray.vertexStrideInBytes         =
       mesh->positions[i].byte_stride ?
       mesh->positions[i].byte_stride :
       sizeof(float3),
       triangle_input.triangleArray.numVertices               = mesh->positions[i].count;
     triangle_input.triangleArray.vertexBuffers               = &(mesh->positions[i].data);
     triangle_input.triangleArray.indexFormat                 =
       mesh->indices[i].elmt_byte_size == 2 ?
       OPTIX_INDICES_FORMAT_UNSIGNED_SHORT3 :
       OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
     triangle_input.triangleArray.indexStrideInBytes          =
       mesh->indices[i].byte_stride ?
       mesh->indices[i].byte_stride :
       mesh->indices[i].elmt_byte_size*3;
     triangle_input.triangleArray.numIndexTriplets            = mesh->indices[i].count / 3;
     triangle_input.triangleArray.indexBuffer                 = mesh->indices[i].data;
     triangle_input.triangleArray.flags                       = &triangle_input_flags;
     triangle_input.triangleArray.numSbtRecords               = 1;
 ->//triangle_input.triangleArray.primitiveIndexOffset        = triangle_counter; //works after commenting out

     triangle_counter += mesh->indices[i].count / 3;
    }

    OptixAccelBufferSizes gas_buffer_sizes;
    OPTIX_CHECK( optixAccelComputeMemoryUsage( m_context, &accel_options, buildInputs.data(),
                                                   static_cast<unsigned int>( num_subMeshes ), &gas_buffer_sizes ) );

    totalTempOutputSize += gas_buffer_sizes.outputSizeInBytes;
    GASInfo g = {std::move( buildInputs ), gas_buffer_sizes, mesh};
    gases.emplace( gas_buffer_sizes.outputSizeInBytes, g );
}

Am I missing here something, or why does this let the programm crash?
Right now my device functions are still empty, but with the modified primitiveIndexOffset commented out it compiles and runs without an error.

Hey!

I still haven’t found from were I get the illegal memory access.

Just for clarification, I don’t require an output buffer?
I am talking about the buffer set up in the optixMeshviewer example:

int main( int argc, char* argv[] )
{
    sutil::CUDAOutputBufferType output_buffer_type = sutil::CUDAOutputBufferType::GL_INTEROP;

...

I modified the launch routine in my program to the following:

whitted::LaunchParams* d_params = nullptr;
whitted::LaunchParams   params = {};

int32_t           samples_per_launch = 1;

void initLaunchParams(const sutil::Scene& scene) {

    CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_params), sizeof(whitted::LaunchParams)));

    params.handle = scene.traversableHandle();
}

void cleanup()
{
    CUDA_CHECK(cudaFree(reinterpret_cast<void*>(d_params)));
}

int main(int argc, char* argv[])
{

    std::string infile = sutil::sampleDataFilePath("2-ICOS.gltf");

    try
    {
        sutil::Scene scene;
        sutil::loadScene(infile.c_str(), scene);
        scene.finalize();

        OPTIX_CHECK(optixInit()); // Need to initialize function table
        initLaunchParams(scene);

CUDA_CHECK(cudaMemcpyAsync(reinterpret_cast<void*>(d_params),
            &params,
            sizeof(whitted::LaunchParams),
            cudaMemcpyHostToDevice,
            0 // stream
        ));

        OPTIX_CHECK(optixLaunch(
            scene.pipeline(),
            0,
            reinterpret_cast<CUdeviceptr>(d_params),
            sizeof(whitted::LaunchParams),
            scene.sbt(),
            1,
            1,
            1));
        CUDA_SYNC_CHECK();

        cleanup();

    }
    catch (std::exception & e)
    {
        std::cerr << "Caught exception: " << e.what() << "\n";
        return 1;
    }

    return 0;
}

I always get an error at the CUDA_SYNC_CHECK() after the Launch, without any errors at compiling.

Since I don’t render anything, I figured I wouldn’t need it.

Thanks for your help!

If you do not touch any member in the launch parameter block inside your device programs, it can contain anything.

Make sure that glTF model is loading with the optixMeshViewer at all, because the glTF loader is known to not support all vertex attribute layouts correctly.
https://devtalk.nvidia.com/default/topic/1066338/optix/-misaligned-address-quot-exception-when-rendering-some-gltf2-models/post/5400380/#5400380

If that is working and your program isn’t, please provide a complete, minimal reproducer in failing state for analysis. Isolated code excerpts won’t help as reproducers.

Maybe check if its not related to the bug in the OptiX 7 SDK I’ve found

https://devtalk.nvidia.com/default/topic/1072808/optix/-bugreport-amp-fix-optix-7-corrupts-cudeviceptr-in-the-sbt-due-to-truncation-hardcore-/?offset=2#5434966