I used the raycasting example as a base for building a larger set of classes to trace a custom set of rays using OptiX SDK 7.6 on an Ubuntu 22.04 base development image with the latest CUDA 12.0 install and driver 525.105.17 on either an RTX 3050 or GTX 1050. They both work, the former faster of course.
I set it up to allow adding multiple meshes, each mesh being triangular. When I add one OptixBuildInput
with a single triangle mesh and run optixBuildAccel
with a pointer to just that single element, it traces everything very nicely.
However if I give it more than one, it still traces the whole geometry (with some strange artifacts), but it spews errors like these:
( 0, 144, 0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
( 1, 144, 0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
( 2, 144, 0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
( 3, 144, 0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
( 0, 145, 0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
( 0, 144, 0) transform list of size 0:
( 1, 144, 0) transform list of size 0:
( 2, 144, 0) transform list of size 0:
( 3, 144, 0) transform list of size 0:
( 0, 145, 0) transform list of size 0:
This of course varies from run to run because of asynchronicity, but this is a common error output.
While the traced output with one mesh perfectly matches my CPU-only tracer backend, the output with two meshes shows hits which are mostly correct, but sometimes appear displaced from the same thing I get from my CPU-only tracing backend. The tuples being output are the problematic launch indexes. I looked up the error in the source, which seems to be the only place this output is ever mentioned at all. It’s obviously related to the SBT. But both meshes use the same SBT. Shouldn’t this not be a problem?
The SBT records all exist and are empty. They’re properly aligned and also have the required header. I don’t reference the SBT records at all from the hit raygen, hit, and miss programs. I realize that there is some implicit data that becomes accessible when a closest hit is encountered, but I only read and write the globally accessible Params
structure.
I tried creating another SBT setup just to have it there so the errors can’t complain “index 1” isn’t there. This caused memory access error on the GPU and ended up blocking my unit tests from running at all on the GPU.
There’s no mention of these messages anywhere on the internet that I can find. I’ve tried everything I can think of at this point. I’m sure it’s something I’m missing about the SBT, but I don’t know what it could be. The programming guide seems to instruct me to do what I’m doing already for multiple OptixBuildInput
. Is there any general insight anyone has that might help understand this complaint about the SBT?
Here’s the acceleration structure build. Note that the OptixBuildInput
list is stored internally in a map, but at build time I put them into a vector
. Whether they’re referenced to the original or not doesn’t seem to matter. They both produce the same result.
void OptixTracer::buildAccelStructure()
{
// Stack all the build inputs into an array
_buildInputArray.clear();
for (auto [name, input] : _optixInputs)
_buildInputArray.push_back(input);
// Do full update because geometry was added
bool _fullUpdate = true;
// Set the options
_accelBuildOptions = {};
#ifdef OPTIX_ALWAYS_BUILD_FULL_GAS
_accelBuildOptions.buildFlags = OPTIX_BUILD_FLAG_NONE;
#else
_fullUpdate = geometryWasUpdated();
_accelBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE;
#endif
_accelBuildOptions.operation = _fullUpdate ? OPTIX_BUILD_OPERATION_BUILD : OPTIX_BUILD_OPERATION_UPDATE;
// Calculate the GAS buffer sizes
if (_fullUpdate || !_gasBuffersAllocated)
{
OPTIX_CHECK(
optixAccelComputeMemoryUsage(
_devContext,
&_accelBuildOptions,
_buildInputArray.data(),
_buildInputArray.size(),
&_gasBufferSizes
)
);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void**>(&_devGasTempBuffer),
_gasBufferSizes.tempSizeInBytes
)
);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void**>(&_devGasOutputBuffer),
_gasBufferSizes.outputSizeInBytes
)
);
// Indicate that we already built the accleeration structure and allocate
// the space needed to easily update
_gasBuffersAllocated = true;
}
OPTIX_CHECK(
optixAccelBuild(
_devContext,
_cuStream, // This is the CUDA stream
&_accelBuildOptions,
_buildInputArray.data(),
_buildInputArray.size(),
_devGasTempBuffer,
_gasBufferSizes.tempSizeInBytes,
_devGasOutputBuffer,
_gasBufferSizes.outputSizeInBytes,
&_gasHandle,
nullptr,
0
)
);
}
This is the source that builds the OptixBuildInput
.
int OptixTracer::addGeometry(const std::string& _meshName, enum RTCGeometryType _geometryType, int _numVertices, int _numElements)
{
// Just make buffers and build inputs
_optixInputs[_meshName] = {};
_optixInputs[_meshName].type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
_optixInputs[_meshName].triangleArray.flags = _buildInputFlags;
_optixInputs[_meshName].triangleArray.numSbtRecords = 1;
_optixInputs[_meshName].triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
_optixInputs[_meshName].triangleArray.vertexStrideInBytes = sizeof(float3);
_optixInputs[_meshName].triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
_optixInputs[_meshName].triangleArray.indexStrideInBytes = sizeof(uint3);
_optixInputs[_meshName].triangleArray.numVertices = static_cast<uint32_t>(_numVertices);
_optixInputs[_meshName].triangleArray.numIndexTriplets = static_cast<uint32_t>(_numElements);
// Allocate space
const size_t vertexSize = sizeof(float3);
const size_t elementSize = sizeof(int3);
// Allocate space in RAM calling in-place constructor
_vertices.emplace(
std::piecewise_construct,
std::forward_as_tuple(_meshName),
std::forward_as_tuple(_numVertices)
);
_elements.emplace(
std::piecewise_construct,
std::forward_as_tuple(_meshName),
std::forward_as_tuple(_numElements)
);
// Insert key into device vertices and elements for storing GPU space
_devVertices.emplace(
std::piecewise_construct,
std::forward_as_tuple(_meshName),
std::forward_as_tuple(0)
);
_devElements.emplace(
std::piecewise_construct,
std::forward_as_tuple(_meshName),
std::forward_as_tuple(0)
);
// Allocate the actual space on the GPU for vertices and elements
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void **>(&_devVertices[_meshName]),
static_cast<size_t>(_numVertices * vertexSize)
)
);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void **>(&_devElements[_meshName]),
static_cast<size_t>(_numElements * elementSize)
)
);
// Set allocate device data to location on optix build input
_optixInputs[_meshName].triangleArray.vertexBuffers = &_devVertices[_meshName];
_optixInputs[_meshName].triangleArray.indexBuffer = _devElements[_meshName];
CUDA_SYNC_CHECK();
// Count the geometries to know how much space is needed
setGeometryCount(getGeometryCount() + 1);
// Add or remove geometry event
_geometryWasUpdated.store(true);
return 0;
}
For completeness I’ll include the SBT setup method.
void OptixTracer::setupSbtRecords()
{
// Raygen SBT record
const size_t raygenRecordSize = sizeof(RayGenSbtRecord);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void **>(&_devRaygenSbtRecord),
raygenRecordSize
)
);
OPTIX_CHECK(
optixSbtRecordPackHeader(
_raygenProgramGroup,
&_raygenSbtRecord
)
);
CUDA_CHECK(
cudaMemcpy(
reinterpret_cast<void *>(_devRaygenSbtRecord),
&_raygenSbtRecord,
raygenRecordSize,
cudaMemcpyHostToDevice
)
);
// Miss SBT record
const size_t missRecordSize = sizeof(MissSbtRecord);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void **>(&_devMissSbtRecord),
missRecordSize
)
);
OPTIX_CHECK(
optixSbtRecordPackHeader(
_missProgramGroup,
&_missSbtRecord
)
);
CUDA_CHECK(
cudaMemcpy(
reinterpret_cast<void *>(_devMissSbtRecord),
&_missSbtRecord,
missRecordSize,
cudaMemcpyHostToDevice
)
);
// Hit group SBT record
const size_t hitGroupRecordSize = sizeof(HitGroupSbtRecord);
CUDA_CHECK(
cudaMalloc(
reinterpret_cast<void **>(&_devHitgroupSbtRecord),
hitGroupRecordSize
)
);
OPTIX_CHECK(
optixSbtRecordPackHeader(
_hitgroupProgramGroup,
&_hitgroupSbtRecord
)
);
CUDA_CHECK(
cudaMemcpy(
reinterpret_cast<void *>(_devHitgroupSbtRecord),
&_hitgroupSbtRecord,
hitGroupRecordSize,
cudaMemcpyHostToDevice
)
);
// Fill out SBT structure
_shaderBindingTable.raygenRecord = _devRaygenSbtRecord;
_shaderBindingTable.missRecordBase = _devMissSbtRecord;
_shaderBindingTable.missRecordStrideInBytes = sizeof(MissSbtRecord);
_shaderBindingTable.missRecordCount = 1;
_shaderBindingTable.hitgroupRecordBase = _devHitgroupSbtRecord;
_shaderBindingTable.hitgroupRecordStrideInBytes = sizeof(HitGroupSbtRecord);
_shaderBindingTable.hitgroupRecordCount = 1;
}
Just note that addGeometry
adds the geometry and allocates the space in RAM and GPU memory, but does not put actual data points in the structure. Before the above acceleration build function is called another function places all the necessary points in the structure so the geometry is there for the build.
This all works very well with only one OptixBuildInput
, i.e. just one of the _optixInputs[meshString]
given as input.
Any insight would be greatly appreciated. I’m at a dead end.