Transformation Matrices from OpenGL reusable in OptiX pass

Hello,

I’ve been using OptiX to generate ray-traced shadows for an OpenGL scene, which sort of works.

What doesn’t work so far is using the rotation slider, which, of course, results in this behaviour:

So I was wondering if it is somehow possible to rotate the specific geometry branch in the acceleration structure or do I have to rebuild the whole structure with the transformed geometry every time that rotation value changes?

I also checked whether the optixMotionGeometry example might offer what I’m looking for, but I couldn’t verify it, as the binary fails with a misaligned address error.

Thank you

Hi @Gummel!

I don’t understand what’s going on in the 2nd image. What is happening, and why? Are some of your shadows turning into geometry that doesn’t get rotated? Or are the shadows in a 2nd screen buffer that is composited? Or something else? What parts of the BVH are out of sync, and how are you expecting a BVH update to resolve the situation?

I’m not sure, but it sounds like the question is whether there is a faster way to rebuild an acceleration structure when you only want to move geometry around. If so, the answer is yes. You can insert your geometry into the BVH as an instance with a matrix transform for translating & rotating your geometry, and then you can perform a dynamic update on your acceleration structure when you change only the instance transform and not any of the rest of the geometry. This is much faster than rebuilding your acceleration structure. When using instances, you will have two acceleration structures: a top level Instance Acceleration Structure (IAS) that contains instances and their transforms, and a bottom level Geometry Acceleration Structure (GAS) which contains the geometry. When updating instance transforms, you will only need to update the IAS, which in your example here will be very small. Does that help? Hopefully I haven’t totally misunderstood your question.
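Since the rotation in this thread comes from an OpenGL-style matrix, here is a minimal sketch of how such a matrix could be copied into an instance transform. It assumes the source is a column-major 4x4 (the OpenGL and GLM layout) and that the destination is the row-major 3x4 float array of OptixInstance::transform. The type aliases and both function names are hypothetical, and make_rotation_y only stands in for whatever the GUI slider produces.

```cpp
#include <array>
#include <cmath>

// Column-major 4x4 as used by OpenGL and GLM: element (row, col) is m[col * 4 + row].
using Mat4ColumnMajor = std::array<float, 16>;
// Row-major 3x4 as expected by OptixInstance::transform: element (row, col) is t[row * 4 + col].
using Transform3x4 = std::array<float, 12>;

// Hypothetical stand-in for the GUI slider: rotation about the Y axis, angle in radians.
Mat4ColumnMajor make_rotation_y(float angle)
{
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    return { c,    0.0f, -s,   0.0f,   // column 0
             0.0f, 1.0f, 0.0f, 0.0f,   // column 1
             s,    0.0f, c,    0.0f,   // column 2
             0.0f, 0.0f, 0.0f, 1.0f }; // column 3 (translation)
}

// Copy the top three rows of the 4x4 into a row-major 3x4.
// The dropped fourth row is assumed to be (0, 0, 0, 1), i.e. the matrix is affine.
Transform3x4 to_optix_transform(const Mat4ColumnMajor& m)
{
    Transform3x4 t{};
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 4; ++col)
            t[row * 4 + col] = m[col * 4 + row];
    return t;
}
```

The twelve floats returned by to_optix_transform(make_rotation_y(angle)) could then be copied into the transform member of the instance before the IAS is rebuilt or updated. A projection matrix would not survive this conversion, so only the model matrix should be passed in.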

For the bug report on optixMotionGeometry, which OS, driver, GPU, and OptiX versions are you using?


David.

Hello David,

I’ve got a slide bar in the GUI for the object rotation which simply creates a rotation matrix that transforms the OpenGL geometry (red dragon) for each frame. There’s also a model selector in the GUI, which replaces the geometry with the new model (and also updates the acceleration structure of the OptiXRenderer). So OptiX is basically generating a screen space texture containing the shadow factors (by the way using the inverted projection and view matrix with large far clipping plane values for now, if you remember, as I had to make progress with the renderer for an upcoming paper) and in OpenGL, the shader is using this texture to shade the correct images.

However, I have no clue where to fill in the transformation in my modified tutorial code (unfortunately, the tutorial doesn’t cover transformations at all), so the images generated by OpenGL and OptiX diverge when the model rotation is changed. The only option I can see is to rotate the geometry data on the CPU by the amount given by the GUI and regenerate the acceleration structure from it, the same way I already rebuild it (without the transformation) every time I change the model geometry (cube, dragon, tree). So far, to init the renderer, I generate the initial acceleration structure:

OptixTraversableHandle cgbv::optix::OptixRenderer::build_accelleration_structure()
{
	std::cout << "Building Acceleration Structure...";

	for (auto& buffer : vertex_buffer)
		buffer.free();

	for (auto& buffer : normal_buffer)
		buffer.free();

	for (auto& buffer : index_buffer)
		buffer.free();

	vertex_buffer.resize(meshes.size());
	normal_buffer.resize(meshes.size());
	index_buffer.resize(meshes.size());

	OptixTraversableHandle accel_structure_handle = 0ull;

	// Triangle Inputs
	// -----------------------------------------------------------
	std::vector<OptixBuildInput> triangle_input(meshes.size());
	std::vector<CUdeviceptr> device_vertices(meshes.size());
	std::vector<CUdeviceptr> device_indices(meshes.size());
	std::vector<uint32_t> triangle_input_flags(meshes.size());

	for (int mesh_id = 0; mesh_id < static_cast<int>(meshes.size()); ++mesh_id)
	{
		auto& model = meshes[mesh_id];
		vertex_buffer[mesh_id].alloc_and_upload(model.vertex);
		index_buffer[mesh_id].alloc_and_upload(model.index);
		
		if(!model.normal.empty())
			normal_buffer[mesh_id].alloc_and_upload(model.normal);

		triangle_input[mesh_id] = {};
		triangle_input[mesh_id].type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;

		device_vertices[mesh_id] = vertex_buffer[mesh_id].get_device_pointer();
		device_indices[mesh_id] = index_buffer[mesh_id].get_device_pointer();

		triangle_input[mesh_id].triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
		triangle_input[mesh_id].triangleArray.vertexStrideInBytes = sizeof(glm::vec3);
		triangle_input[mesh_id].triangleArray.numVertices = static_cast<int>(model.vertex.size());
		triangle_input[mesh_id].triangleArray.vertexBuffers = &device_vertices[mesh_id];
					  
		triangle_input[mesh_id].triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
		triangle_input[mesh_id].triangleArray.indexStrideInBytes = sizeof(glm::ivec3);
		triangle_input[mesh_id].triangleArray.numIndexTriplets = static_cast<int>(model.index.size());
		triangle_input[mesh_id].triangleArray.indexBuffer = device_indices[mesh_id];

		triangle_input_flags[mesh_id] = 0;

		triangle_input[mesh_id].triangleArray.flags = &triangle_input_flags[mesh_id];
		triangle_input[mesh_id].triangleArray.numSbtRecords = 1;
		triangle_input[mesh_id].triangleArray.sbtIndexOffsetBuffer = 0;
		triangle_input[mesh_id].triangleArray.sbtIndexOffsetSizeInBytes = 0;
		triangle_input[mesh_id].triangleArray.sbtIndexOffsetStrideInBytes = 0;
	}
	// -----------------------------------------------------------

	// BLAS setup
	// -----------------------------------------------------------
	OptixAccelBuildOptions accel_options = {};
	accel_options.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_ALLOW_COMPACTION;

	accel_options.motionOptions.numKeys = 1;
	accel_options.operation = OPTIX_BUILD_OPERATION_BUILD;

	OptixAccelBufferSizes blas_buffer_sizes;
	optix::error::check(optixAccelComputeMemoryUsage(optix.context, &accel_options, triangle_input.data(), static_cast<int>(meshes.size()), &blas_buffer_sizes));
	// -----------------------------------------------------------

	// Prepare Compaction
	// -----------------------------------------------------------
	CUDABuffer compacted_size_buffer;
	compacted_size_buffer.alloc(sizeof(uint64_t));

	OptixAccelEmitDesc emit_descriptor;
	emit_descriptor.type = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
	emit_descriptor.result = compacted_size_buffer.get_device_pointer();
	// -----------------------------------------------------------

	// execute build (main stage)
	// -----------------------------------------------------------
	CUDABuffer temp_buffer;
	temp_buffer.alloc(blas_buffer_sizes.tempSizeInBytes);

	CUDABuffer output_buffer;
	output_buffer.alloc(blas_buffer_sizes.outputSizeInBytes);

	optix::error::check(optixAccelBuild(optix.context, 0, &accel_options, triangle_input.data(), static_cast<int>(meshes.size()), temp_buffer.get_device_pointer(), temp_buffer.get_size_in_bytes(), output_buffer.get_device_pointer(), output_buffer.get_size_in_bytes(), &accel_structure_handle, &emit_descriptor, 1));
	cuda::error::cuda_sync_check();
	// -----------------------------------------------------------

	// perform compaction
	// -----------------------------------------------------------
	uint64_t compacted_size;
	compacted_size_buffer.download(&compacted_size, 1);

	//acceleration_structure_buffer.alloc(compacted_size);
	acceleration_structure_buffer.resize(compacted_size);
	optix::error::check(optixAccelCompact(optix.context, 0, accel_structure_handle, acceleration_structure_buffer.get_device_pointer(), acceleration_structure_buffer.get_size_in_bytes(), &accel_structure_handle));
	cuda::error::cuda_sync_check();
	// -----------------------------------------------------------

	// Clean Up
	// -----------------------------------------------------------
	output_buffer.free();
	temp_buffer.free();
	compacted_size_buffer.free();
	// -----------------------------------------------------------

	meshes_touched = false;

	std::cout << "done" << std::endl;

	return accel_structure_handle;
}

and then I generate the Shader Binding Table

void cgbv::optix::OptixRenderer::build_shader_binary_table()
{
	std::cout << "Building Shader Binary Table...";

	// Raygen Records
	// ----------------------------------------------------------------------------------
	std::vector<optix::RaygenRecord> raygen_records;
	for (int i = 0; i < optix.raygen_pg.size(); ++i)
	{
		optix::RaygenRecord record;
		optix::error::check(optixSbtRecordPackHeader(optix.raygen_pg[i], &record));
		record.data = nullptr;
		raygen_records.push_back(record);
	}
	optix.raygen_records_buffer.alloc_and_upload(raygen_records);
	optix.shader_binding_table.raygenRecord = optix.raygen_records_buffer.get_device_pointer();
	// ----------------------------------------------------------------------------------

	// Miss Records
	// ----------------------------------------------------------------------------------
	std::vector<optix::MissRecord> miss_records;
	for (int i = 0; i < optix.miss_pg.size(); ++i)
	{
		optix::MissRecord record;
		optix::error::check(optixSbtRecordPackHeader(optix.miss_pg[i], &record));
		record.data = nullptr;
		miss_records.push_back(record);
	}
	optix.miss_records_buffer.alloc_and_upload(miss_records);
	optix.shader_binding_table.missRecordBase = optix.miss_records_buffer.get_device_pointer();
	optix.shader_binding_table.missRecordStrideInBytes = sizeof(optix::MissRecord);
	optix.shader_binding_table.missRecordCount = static_cast<int>(miss_records.size());
	// ----------------------------------------------------------------------------------
	
	// Hitgroup Records 
	// ----------------------------------------------------------------------------------
	update_hitgroup_pg_for_sbt();
	// ----------------------------------------------------------------------------------

	std::cout << "done" << std::endl;
}

and to update the hitgroup programme groups I call

void cgbv::optix::OptixRenderer::update_hitgroup_pg_for_sbt()
{
	if (optix.hitgroup_records_buffer.get_device_pointer())
		optix.hitgroup_records_buffer.free();

	int num_objects = static_cast<int>(meshes.size());
	std::vector<optix::HitgroupRecord> hitgroup_records;
	for (int mesh_id = 0; mesh_id < num_objects; ++mesh_id)
	{
		for (int ray_id = 0; ray_id < static_cast<int>(optix::RayType::Count); ++ray_id)
		{
			optix::HitgroupRecord record;

			// all meshes use the same code, so all same hit group
			optix::error::check(optixSbtRecordPackHeader(optix.hitgroup_pg[ray_id], &record));

			record.data.vertex = reinterpret_cast<glm::vec3*>(vertex_buffer[mesh_id].get_device_pointer());
			record.data.normal = reinterpret_cast<glm::vec3*>(normal_buffer[mesh_id].get_device_pointer());
			record.data.index = reinterpret_cast<glm::ivec3*>(index_buffer[mesh_id].get_device_pointer());
			record.data.colour = meshes[mesh_id].colour;

			hitgroup_records.push_back(record);
		}
	}
	optix.hitgroup_records_buffer.alloc_and_upload(hitgroup_records);
	optix.shader_binding_table.hitgroupRecordBase = optix.hitgroup_records_buffer.get_device_pointer();
	optix.shader_binding_table.hitgroupRecordStrideInBytes = sizeof(optix::HitgroupRecord);
	optix.shader_binding_table.hitgroupRecordCount = static_cast<int>(hitgroup_records.size());
}

So, when I change the model geometry (to cube, tree or back to dragon), I basically call

void cgbv::optix::OptixRenderer::update_geometry_structure()
{
	launch_params.traversable = build_accelleration_structure();
	update_hitgroup_pg_for_sbt();
}

and create everything new from scratch (there was a warning that the acceleration structure may degenerate quickly when frequently changed). So, from the first glimpse at the links you provided, I probably have to change quite a bit there, eh?

Regarding optixMotionGeometry, I’m using Win11, Driver is 528.02, GPU is a 3090 and the OptiX Version is 7.6.0

Thank you very much for your help!

Markus

there was a warning that the acceleration structure may degenerate quickly when frequently changed

This warning is referring to changing the vertex locations of geometry when updating a GAS, or when moving many instances in a top-level IAS. This isn’t a concern if you only have 1 or 2 instances in your IAS, and you only update their instance matrix transforms.

Do keep in mind that in your example with only 1 or 2 instances, your IAS build is going to be trivial and extremely fast. It probably won’t matter whether you update it, or rebuild it from scratch. Setting your scene up so that you have an IAS and you only need to rebuild your IAS (and not your GAS) when the dragon rotates is going to make the updates very fast. Even doing a full rebuild on the IAS will be much faster than even using the UPDATE operation on the dragon GAS. Once you have an IAS, you will be able to completely rebuild the IAS every frame, if you want, and have no problem keeping it fast enough for real-time frame rates.

So, from the first glimpse at the links you provided, I probably have to change quite a bit there, eh?

I don’t think the change is very big, actually. All you need is to build an IAS (Instance Accel), which will point to your GAS (geometry accel, aka BLAS like in your comments). Then you pass the IAS handle to raygen instead of the GAS handle. OptiX takes care of traversing through the IAS and GAS together when you call optixTrace().

There isn’t much if anything that needs to change with your Program Groups or Shader Binding Table, though if you end up with more than 1 GAS then you might need to tweak your SBT offsets.
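To make the SBT offset remark concrete, here is a small sketch of the hit group record index calculation as described in the OptiX Programming Guide: the instance's sbtOffset, plus the SBT index inside the GAS times the stride passed to optixTrace, plus the offset passed to optixTrace. The function names are hypothetical. The second helper assumes the record layout of update_hitgroup_pg_for_sbt() above (one record per mesh and ray type, ray type varying fastest) and that the records of a second GAS are appended after those of the first.

```cpp
#include <cstdint>

// Index of the hit group record that is selected for a hit.
// instance_sbt_offset: OptixInstance::sbtOffset of the instance that was hit
// gas_sbt_index:       SBT index of the build input inside the GAS (0, 1, 2, ... when each has one record)
// trace_sbt_stride:    SBT stride argument of optixTrace (number of ray types in this thread)
// trace_sbt_offset:    SBT offset argument of optixTrace (the ray type in this thread)
constexpr std::uint32_t hitgroup_record_index(std::uint32_t instance_sbt_offset,
                                              std::uint32_t gas_sbt_index,
                                              std::uint32_t trace_sbt_stride,
                                              std::uint32_t trace_sbt_offset)
{
    return instance_sbt_offset + gas_sbt_index * trace_sbt_stride + trace_sbt_offset;
}

// Hypothetical helper: sbtOffset for an instance whose GAS records are stored
// directly after the records of all previously listed GASes.
constexpr std::uint32_t instance_sbt_offset(std::uint32_t sbt_records_in_preceding_gases,
                                            std::uint32_t ray_type_count)
{
    return sbt_records_in_preceding_gases * ray_type_count;
}
```

With a single GAS the instance offset is simply 0, which is why nothing has to change in that case. As an illustration with two ray types (an assumption, the thread does not state RayType::Count): if the first GAS holds one mesh, the instance of the second GAS would get sbtOffset 2, and a ray of type 1 hitting its first mesh would select record 3.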

Have a look at the optixHair SDK sample, it makes use of an IAS. At the top of optixHair.cpp you’ll find first a function to build the GAS (makeHairGAS()) followed by another function to build the IAS (makeInstanceAccelerationStructure()). Take a look at how they’re called, and how the returned results get used, and I think it will make sense.


David.

Another example which is only rebuilding the root IAS (and an SRT motion matrix) can be found inside my intro_motion_blur example.

This code is doing the IAS update when any animation parameter is changed inside the GUI:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_motion_blur/src/Application.cpp#L2475

This is the initial IAS build which allows updates:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_motion_blur/src/Application.cpp#L1812

Note that I keep track of the IAS data including the temporary device allocation to avoid allocating these every time:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_motion_blur/inc/Application.h#L463

Thank you both for answering :-)

I’ll have a look and hopefully can get it to work quickly.

Hello again,

I kept studying the acceleration structures a bit and ended up with a couple of questions:

  • so, if I have a geometry-AS handle, I can attach this to an instance-AS to transform the entire geometry-AS, right? So, if my geometry-AS contains a ground surface model (like the plane in the images) and a center model (like the dragon), probably both models will get rotated if I specify a rotation matrix, right? If I only want the dragon to be rotated, I probably need to have two geometry-AS, one structure gA containing the dragon and one structure gB with the ground surface and an instance-AS iA to which gA is attached, right? But this requires somehow a mechanism to combine gB and iA (with gA attached), right? Is there a way to do this or did I once again end up with a wrong mental model?

  • Considering the 3x4 matrix that an instance-AS expects, why is OptiX using 3x4 matrices instead of the common 4x4 matrices used in computer graphics and how are they related (if at all)? How does OptiX perform the vertex transformation with that 3x4 matrix?

Thank you
Markus

Hi Markus,

Your first three sentences are fully correct. It sounds like the missing link might be that an IAS is normally built with multiple instances: there is a straightforward mechanism for putting links to both gA and gB into your single IAS named iA. Each instance gets its own transform matrix, so you control their placement and orientation separately.
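As a rough sketch of that mechanism, the instance list for iA could look like the following. InstanceSketch is a hypothetical stand-in that mirrors only a few fields of OptixInstance so that the example compiles without the OptiX SDK. The function name, the handle values and the choice of a Y-axis rotation are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Stand-in for OptixInstance (assumption: only the fields used here are mirrored;
// the real struct also carries visibilityMask, flags and padding).
struct InstanceSketch
{
    float         transform[12];      // row-major 3x4
    std::uint32_t instance_id;
    std::uint32_t sbt_offset;
    std::uint64_t traversable_handle; // handle of the GAS this instance refers to
};

// Two instances in one IAS: the dragon GAS (gA) is rotated about the Y axis,
// the ground GAS (gB) keeps the identity transform.
std::vector<InstanceSketch> make_scene_instances(std::uint64_t gas_dragon,
                                                 std::uint64_t gas_ground,
                                                 float         dragon_angle,
                                                 std::uint32_t ground_sbt_offset)
{
    const float c = std::cos(dragon_angle);
    const float s = std::sin(dragon_angle);

    const InstanceSketch dragon = { { c,    0.0f, s,    0.0f,
                                      0.0f, 1.0f, 0.0f, 0.0f,
                                      -s,   0.0f, c,    0.0f },
                                    0u, 0u, gas_dragon };

    const InstanceSketch ground = { { 1.0f, 0.0f, 0.0f, 0.0f,
                                      0.0f, 1.0f, 0.0f, 0.0f,
                                      0.0f, 0.0f, 1.0f, 0.0f },
                                    1u, ground_sbt_offset, gas_ground };

    return { dragon, ground };
}
```

When the slider changes, only the dragon entry needs a new transform before the instance array is uploaded again and the IAS is rebuilt or updated. Neither GAS is touched.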

Both 3x4 and 4x4 matrices are very common in computer graphics & games. The 3x4 is used, for example, when you want an affine transform, which includes translation, rotation, scale, and shear, but excludes the bottom row of a 4x4, which is only needed for projective transforms such as perspective. Saving the 16 bytes of memory per transform is very useful for cases where people need to have millions of transforms, and saving one row of dot product is nice for reducing computation while rays are bouncing around your scene. The 3x4 is applied the same way a 4x4 would be; the 3x4 just has an implicit bottom row of (0, 0, 0, 1) taken from the 4x4 identity matrix.
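To spell out the implicit values, here is a minimal sketch of applying a row-major 3x4 to a point and to a direction. The type aliases and function names are hypothetical. Note that this only shows the arithmetic. As far as I know, OptiX does not transform vertices during traversal but rather brings the ray into the object space of the instance, which uses the same kind of multiplication with the inverse matrix.

```cpp
#include <array>

using Transform3x4 = std::array<float, 12>; // row-major: rows 0..2 of the equivalent 4x4
using Vec3         = std::array<float, 3>;

// Point: the homogeneous coordinate w is 1, so the translation column
// (t[3], t[7], t[11]) is added. The implicit fourth row (0, 0, 0, 1) keeps w at 1.
Vec3 transform_point(const Transform3x4& t, const Vec3& p)
{
    return { t[0] * p[0] + t[1] * p[1] + t[2]  * p[2] + t[3],
             t[4] * p[0] + t[5] * p[1] + t[6]  * p[2] + t[7],
             t[8] * p[0] + t[9] * p[1] + t[10] * p[2] + t[11] };
}

// Direction: w is 0, so the translation column drops out.
Vec3 transform_vector(const Transform3x4& t, const Vec3& v)
{
    return { t[0] * v[0] + t[1] * v[1] + t[2]  * v[2],
             t[4] * v[0] + t[5] * v[1] + t[6]  * v[2],
             t[8] * v[0] + t[9] * v[1] + t[10] * v[2] };
}
```

For a pure translation by (1, 2, 3), the point (1, 1, 1) moves to (2, 3, 4), while the direction (1, 1, 1) stays unchanged.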


David.

Example code again.

This is manually building a number of GAS and putting each under an OptixInstance with a different transform
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1598
which are then built into an IAS.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1705

The more advanced examples in that repository traverse an arbitrarily deep host side scene graph and flatten it to a render graph with a two-level acceleration structure (IAS->GAS) which is fully hardware accelerated on RTX GPUs.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo9/src/Raytracer.cpp#L523
Because these examples support multi-GPU, the IAS gets built per device.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo9/src/Device.cpp#L1519

EDIT: OK, never mind, I just found out what was wrong. I had to increase the maxTraversableGraphDepth parameter to two when calling optixPipelineSetStackSize and add the OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING flag to the traversableGraphFlags of the pipeline compile options.

Hello,

so, I’ve created (or at least, tried to) an instance acceleration structure with this function:

OptixTraversableHandle cgbv::optix::OptixRenderer::build_instance_acceleration_structure()
{
	OptixTraversableHandle accel_structure_handle = 0ull;

	std::vector<OptixInstance> instances(1);

	auto x = sizeof(glm::mat3x4);

	auto y = glm::value_ptr(glm::mat3x4());

	for (int i = 0; i < instances.size(); ++i)
	{
		auto identity = glm::mat3x4(1.f);
		std::copy(glm::value_ptr(identity), glm::value_ptr(identity) + sizeof(glm::mat3x4) / sizeof(float), instances[i].transform);
		instances[i].instanceId = i;
		instances[i].visibilityMask = 255;
		instances[i].sbtOffset = 0;
		instances[i].flags = OPTIX_INSTANCE_FLAG_NONE;
		instances[i].traversableHandle = geometry_acceleration_structure;
	}

	CUDABuffer instance_buffer;
	instance_buffer.resize_and_upload(instances);


	// instance acceleration structure setup
	// -----------------------------------------------------------
	std::vector<OptixBuildInput> instanceInput(1);

	instanceInput[0].type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
	instanceInput[0].instanceArray.instances = instance_buffer.get_device_pointer();
	instanceInput[0].instanceArray.numInstances = static_cast<int>(instances.size());

	OptixAccelBuildOptions accel_build_options = {};

	accel_build_options.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_ALLOW_COMPACTION;
	accel_build_options.operation = OPTIX_BUILD_OPERATION_BUILD;

	OptixAccelBufferSizes ias_buffer_sizes = {};

	optix::error::check(optixAccelComputeMemoryUsage(optix.context, &accel_build_options, instanceInput.data(), 1, &ias_buffer_sizes));
	// -----------------------------------------------------------


	// Prepare Compaction
	// -----------------------------------------------------------
	CUDABuffer compacted_size_buffer;
	compacted_size_buffer.alloc(sizeof(uint64_t));

	OptixAccelEmitDesc emit_descriptor;
	emit_descriptor.type = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
	emit_descriptor.result = compacted_size_buffer.get_device_pointer();
	// -----------------------------------------------------------


	// execute build (main stage)
	// -----------------------------------------------------------
	CUDABuffer temp_buffer;
	temp_buffer.alloc(ias_buffer_sizes.tempSizeInBytes);

	CUDABuffer output_buffer;
	output_buffer.resize(ias_buffer_sizes.outputSizeInBytes);

	optix::error::check(optixAccelBuild(optix.context, 0, &accel_build_options, instanceInput.data(), static_cast<int>(instanceInput.size()), temp_buffer.get_device_pointer(), temp_buffer.get_size_in_bytes(), output_buffer.get_device_pointer(), output_buffer.get_size_in_bytes(), &accel_structure_handle, &emit_descriptor, 1));
	cuda::error::cuda_sync_check();
	// -----------------------------------------------------------

	// perform compaction
	// -----------------------------------------------------------
	uint64_t compacted_size;
	compacted_size_buffer.download(&compacted_size, 1);

	instance_acceleration_structure_buffer.resize(compacted_size);
	optix::error::check(optixAccelCompact(optix.context, 0, accel_structure_handle, instance_acceleration_structure_buffer.get_device_pointer(), instance_acceleration_structure_buffer.get_size_in_bytes(), &accel_structure_handle));
	cuda::error::cuda_sync_check();
	// -----------------------------------------------------------

	// Clean Up
	// -----------------------------------------------------------
	temp_buffer.free();
	// -----------------------------------------------------------


	return accel_structure_handle;
}

geometry_acceleration_structure is what was returned by the build_accelleration_structure function further up in this thread. I’ve also set the traversable handle in the optixLaunchParams to the new instance acceleration structure, but either the optix-check or the optix-sync-check after calling optixLaunch(...) reports an invalid traversable:

[02][ERROR       ]: Validation mode caught builtin exception OPTIX_EXCEPTION_CODE_TRAVERSAL_INVALID_TRAVERSABLE
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)

I also tried to create the acceleration structure without compaction, using the call to optixAccelBuild as shown in OptiX_Apps/Application.cpp (NVIDIA/OptiX_Apps on GitHub). That call passes a nullptr as the emittedProperties argument. When I tried the same, I got an error that optixAccelBuild wants to set emitted properties, which it can’t due to the nullptr.

So, I was wondering if a compacted instance acceleration structure is valid at all.

The call to optixTrace looks something like this, but I haven’t changed anything in the OptiX shaders:

optixTrace(optixLaunchParams.traversable, cu_camera_position, cu_ray_dir, t_min, t_max, 0.0f /* rayTime */, OptixVisibilityMask(255), OPTIX_RAY_FLAG_DISABLE_ANYHIT /*OPTIX_RAY_FLAG_NONE*/, static_cast<unsigned int>(optix::RayType::Radiance) /*SBT offset*/, static_cast<unsigned int>(optix::RayType::Count) /*SBT stride*/, static_cast<unsigned int>(optix::RayType::Radiance) /*missSBTIndex*/, u0, u1);

Do you have an idea or a hint, why the instance acceleration structure might be invalid?

Thank you very much!