Tracing Works As Expected With `optixBuildAccel` With Only One OptixBuildInput But Fails With Two

I used the raycasting example as a base for building a larger set of classes to trace a custom set of rays using OptiX SDK 7.6 on an Ubuntu 22.04 base development image with the latest CUDA 12.0 install and driver 525.105.17 on either an RTX 3050 or GTX 1050. They both work, the former faster of course.

I set it up to allow adding multiple meshes, each mesh being triangular. When I add one OptixBuildInput with a single triangle mesh and run optixBuildAccel with a pointer to just that single element, it traces everything very nicely.

However if I give it more than one, it still traces the whole geometry (with some strange artifacts), but it spews errors like these:

(   0, 144,   0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
(   1, 144,   0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
(   2, 144,   0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
(   3, 144,   0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
(   0, 145,   0) error: invalid hit sbt of 1 at primitive with gas sbt index 1
(   0, 144,   0) transform list of size 0:
(   1, 144,   0) transform list of size 0:
(   2, 144,   0) transform list of size 0:
(   3, 144,   0) transform list of size 0:
(   0, 145,   0) transform list of size 0:

This of course varies from run to run because of asynchronicity, but this is a common error output.

While the traced output with one mesh perfectly matches my CPU-only tracer backend, the output with two meshes shows hits which are mostly correct, but sometimes appear displaced from the same thing I get from my CPU-only tracing backend. The tuples being output are the problematic launch indexes. I looked up the error in the source, which seems to be the only place this output is ever mentioned at all. It’s obviously related to the SBT. But both meshes use the same SBT. Shouldn’t this not be a problem?

The SBT records all exist and are empty. They’re properly aligned and also have the required header. I don’t reference the SBT records at all from the hit raygen, hit, and miss programs. I realize that there is some implicit data that becomes accessible when a closest hit is encountered, but I only read and write the globally accessible Params structure.

I tried creating another SBT setup just to have it there so the errors can’t complain “index 1” isn’t there. This caused memory access error on the GPU and ended up blocking my unit tests from running at all on the GPU.

There’s no mention of these messages anywhere on the internet that I can find. I’ve tried everything I can think of at this point. I’m sure it’s something I’m missing about the SBT, but I don’t know what it could be. The programming guide seems to instruct me to do what I’m doing already for multiple OptixBuildInput. Is there any general insight anyone has that might help understand this complaint about the SBT?

Here’s the acceleration structure build. Note that the OptixBuildInput list is stored internally in a map, but at build time I put them into a vector. Whether they’re referenced to the original or not doesn’t seem to matter. They both produce the same result.

void OptixTracer::buildAccelStructure()
{
    // Stack all the build inputs into an array
    _buildInputArray.clear();
    for (auto [name, input] : _optixInputs)
        _buildInputArray.push_back(input);

    // Do full update because geometry was added
    bool _fullUpdate = true;

    // Set the options
    _accelBuildOptions = {};
#ifdef OPTIX_ALWAYS_BUILD_FULL_GAS
    _accelBuildOptions.buildFlags = OPTIX_BUILD_FLAG_NONE;
#else
    _fullUpdate = geometryWasUpdated();
    _accelBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE;
#endif
    _accelBuildOptions.operation = _fullUpdate ? OPTIX_BUILD_OPERATION_BUILD : OPTIX_BUILD_OPERATION_UPDATE;

    // Calculate the GAS buffer sizes
    if (_fullUpdate || !_gasBuffersAllocated)
    {
        OPTIX_CHECK(
            optixAccelComputeMemoryUsage(
                _devContext,
                &_accelBuildOptions,
                _buildInputArray.data(),
                _buildInputArray.size(),
                &_gasBufferSizes
            )
        );

        CUDA_CHECK(
            cudaMalloc(
                reinterpret_cast<void**>(&_devGasTempBuffer),
                _gasBufferSizes.tempSizeInBytes
            )
        );

        CUDA_CHECK(
            cudaMalloc(
                reinterpret_cast<void**>(&_devGasOutputBuffer),
                _gasBufferSizes.outputSizeInBytes
            )
        );

        // Indicate that we already built the accleeration structure and allocate
        // the space needed to easily update
        _gasBuffersAllocated = true;
    }

    OPTIX_CHECK(
        optixAccelBuild(
            _devContext,
            _cuStream, // This is the CUDA stream
            &_accelBuildOptions,
            _buildInputArray.data(),
            _buildInputArray.size(),
            _devGasTempBuffer,
            _gasBufferSizes.tempSizeInBytes,
            _devGasOutputBuffer,
            _gasBufferSizes.outputSizeInBytes,
            &_gasHandle,
            nullptr,
            0
        )
    );
}

This is the source that builds the OptixBuildInput.

int OptixTracer::addGeometry(const std::string& _meshName, enum RTCGeometryType _geometryType, int _numVertices, int _numElements)
{
    // Just make buffers and build inputs
    _optixInputs[_meshName] = {};
    _optixInputs[_meshName].type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
    _optixInputs[_meshName].triangleArray.flags = _buildInputFlags;
    _optixInputs[_meshName].triangleArray.numSbtRecords = 1;
    _optixInputs[_meshName].triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
    _optixInputs[_meshName].triangleArray.vertexStrideInBytes = sizeof(float3);
    _optixInputs[_meshName].triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
    _optixInputs[_meshName].triangleArray.indexStrideInBytes = sizeof(uint3);
    _optixInputs[_meshName].triangleArray.numVertices = static_cast<uint32_t>(_numVertices);
    _optixInputs[_meshName].triangleArray.numIndexTriplets = static_cast<uint32_t>(_numElements);

    // Allocate space
    const size_t vertexSize = sizeof(float3);
    const size_t elementSize = sizeof(int3);

    // Allocate space in RAM calling in-place constructor
    _vertices.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(_meshName),
        std::forward_as_tuple(_numVertices)
    );
    _elements.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(_meshName),
        std::forward_as_tuple(_numElements)
    );
    
    // Insert key into device vertices and elements for storing GPU space
    _devVertices.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(_meshName),
        std::forward_as_tuple(0)
    );
    _devElements.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(_meshName),
        std::forward_as_tuple(0)
    );
    
    // Allocate the actual space on the GPU for vertices and elements
    CUDA_CHECK(
        cudaMalloc(
            reinterpret_cast<void **>(&_devVertices[_meshName]),
            static_cast<size_t>(_numVertices * vertexSize)
        )
    );
    CUDA_CHECK(
        cudaMalloc(
            reinterpret_cast<void **>(&_devElements[_meshName]),
            static_cast<size_t>(_numElements * elementSize)
        )
    );

    // Set allocate device data to location on optix build input
    _optixInputs[_meshName].triangleArray.vertexBuffers = &_devVertices[_meshName];
    _optixInputs[_meshName].triangleArray.indexBuffer = _devElements[_meshName];

    CUDA_SYNC_CHECK();

    // Count the geometries to know how much space is needed
    setGeometryCount(getGeometryCount() + 1);

    // Add or remove geometry event
    _geometryWasUpdated.store(true);

    return 0;
}

For completeness I’ll include the SBT setup method.

void OptixTracer::setupSbtRecords()
{
    // Raygen SBT record
    const size_t raygenRecordSize = sizeof(RayGenSbtRecord);
    CUDA_CHECK(
        cudaMalloc(
            reinterpret_cast<void **>(&_devRaygenSbtRecord),
            raygenRecordSize
        )
    );
    OPTIX_CHECK(
        optixSbtRecordPackHeader(
            _raygenProgramGroup,
            &_raygenSbtRecord
        )
    );
    CUDA_CHECK(
        cudaMemcpy(
            reinterpret_cast<void *>(_devRaygenSbtRecord),
            &_raygenSbtRecord,
            raygenRecordSize,
            cudaMemcpyHostToDevice
        )
    );

    // Miss SBT record
    const size_t missRecordSize = sizeof(MissSbtRecord);
    CUDA_CHECK(
        cudaMalloc(
            reinterpret_cast<void **>(&_devMissSbtRecord),
            missRecordSize
        )
    );
    OPTIX_CHECK(
        optixSbtRecordPackHeader(
            _missProgramGroup,
            &_missSbtRecord
        )
    );
    CUDA_CHECK(
        cudaMemcpy(
            reinterpret_cast<void *>(_devMissSbtRecord),
            &_missSbtRecord,
            missRecordSize,
            cudaMemcpyHostToDevice
        )
    );

    // Hit group SBT record
    const size_t hitGroupRecordSize = sizeof(HitGroupSbtRecord);
    CUDA_CHECK(
        cudaMalloc(
            reinterpret_cast<void **>(&_devHitgroupSbtRecord),
            hitGroupRecordSize
        )
    );
    OPTIX_CHECK(
        optixSbtRecordPackHeader(
            _hitgroupProgramGroup,
            &_hitgroupSbtRecord
        )
    );
    CUDA_CHECK(
        cudaMemcpy(
            reinterpret_cast<void *>(_devHitgroupSbtRecord),
            &_hitgroupSbtRecord,
            hitGroupRecordSize,
            cudaMemcpyHostToDevice
        )
    );

    // Fill out SBT structure
    _shaderBindingTable.raygenRecord = _devRaygenSbtRecord;
    _shaderBindingTable.missRecordBase = _devMissSbtRecord;
    _shaderBindingTable.missRecordStrideInBytes = sizeof(MissSbtRecord);
    _shaderBindingTable.missRecordCount = 1;
    _shaderBindingTable.hitgroupRecordBase = _devHitgroupSbtRecord;
    _shaderBindingTable.hitgroupRecordStrideInBytes = sizeof(HitGroupSbtRecord);
    _shaderBindingTable.hitgroupRecordCount = 1;
}

Just note that addGeometry adds the geometry and allocates the space in RAM and GPU memory, but does not put actual data points in the structure. Before the above acceleration build function is called another function places all the necessary points in the structure so the geometry is there for the build.

This all works very well with only one OptixBuildInput, i.e. just one of the _optixInputs[meshString] given as input.

Any insight would be greatly appreciated. I’m at a dead end.

I see the discussion in the programming guide for 7.6 that shows how to find the SBT record for the given build input. But it doesn’t seem to indicate what the records correspond to and how much to allocate. I tried to allocate as many SBT records as there are build inputs but this just gave me a strange access violation at runtime that cited a cudaFree call.

I tried to allocate as many SBT records as there are build inputs but this just gave me some access violation at runtime.

That’s the correct thing to do and should have worked if all allocations are properly sized, aligned and initialized.

The OptiX Programming Guide says in chapter 5.1 Primitive Build inputs that “each build input maps to one or more consecutive records in the shader binding table (SBT)”, so in your case each build input maps to its own SBT hit record, not the same.

According to the error message invalid hit sbt of 1 at primitive with gas sbt index 1 the renderer tries to access more than one SBT hit record for that GAS.

You’d need to allocate _buildInputArray.size() many SBT hit records and initialize each with the same hit group record from optixSbtRecordPackHeader if the sub-meshes should not have different materials.

The only example code inside the OptiX SDK 7.7.0 I found which does this is in utils/Scene.cpp.
See line 909: std::vector<OptixBuildInput> buildInputs( num_subMeshes );
and line 940: triangle_input.triangleArray.numSbtRecords = 1;
The SBT is built at the very end of that source and builds hit records per instance per sub-mesh times ray type count (hitgroup_records.size()).

If that is not working, a minimal complete reproducer would be required to investigate this.
Please provide the usual system configuration information along with it.
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

1 Like

Thanks for the reply. I’m guessing I didn’t set up the OptixBuildInput SBT parameters exactly as they should have been. I’ll have a look at Scene.cpp section you mentioned. I was thrilled to get the whole thing running using a single mesh and was sure I just had a minor setup problem when the second mesh broke the process.

I’ll try allocating the two SBT records again and make sure I’m pointing the two meshes at them.

Again, much appreciated.

I’m just stopping by to give an update on the situation. I realized before when I tried to fix the SBT records setup that I had allocated the space but didn’t copy the same packed header data into the allocated space for both OptixBuildInputs. In short, it works very nicely now. It also doesn’t produce any of the smearing effects that I was seeing before making these fixes. So there must have been some memory getting corrupted and displaced in my viewer.

I’m finally starting to get a better feel for how the SBT records work. Thanks again.

Nice!

Yes, the SBT is a very flexible thing but in the end it boils down to the sbt-index formula in chapter 7.3. of the programming guide.
That gives you all input values with which you can control the final SBT index used for an intersection.
When using instances over GAS you have additional values to control that. Some care needs to be taken of the instance SBT offsets when using more than one SBT record per GAS which affects the SBT offset of later instances.

In my OptiX 7 examples I use different SBT layouts. All GAS use only one SBT entry and the scene hierarchy is always a single level instance (IAS-> GAS), so the additional fields for the SBT index calculation are available.
The earlier examples have an SBT hit record per instance and store geometry attribute data pointers inside the SBT hit record data, the later ones use one SBT hit record per material shader and access the geometry attribute pointers via the user-defined instance ID.
The rtigo10 example shows the smallest possible SBT among these because that doesn’t even need anyhit programs due to a faster shadow ray implementation possible when not supporting cutout opacity, which then doesn’t need SBT hit records for the shadow ray type at all, only a miss program.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.