Stack size calculation for ProgramGroups (RG, EX, MS, HG) with optixUtilAccumulateStackSizes

Hi, I am new to OptiX and am currently using OptiX 9 for ray tracing development, starting with a simple case. Below are the code lines that calculate the stack sizes for the ProgramGroups.

    // Stack size calculation
    OptixStackSizes stack_sizes = {};
    //for (auto& prog_group : _programGroups) {
    //    OPTIX_CHECK(optixUtilAccumulateStackSizes(
    //        prog_group, &stack_sizes, _pipeline));
    //}
    OPTIX_CHECK(optixUtilAccumulateStackSizes(
        pg_rg_firstIllumination,
        &stack_sizes,
        _pipeline
    ));
    OPTIX_CHECK(optixUtilAccumulateStackSizes(
        pg_exception,
        &stack_sizes,
        _pipeline
    ));
    OPTIX_CHECK(optixUtilAccumulateStackSizes(
        pg_miss_shadow,
        &stack_sizes,
        _pipeline
    ));
    OPTIX_CHECK(optixUtilAccumulateStackSizes(
        pg_hitgroup_default,
        &stack_sizes,
        _pipeline
    ));

When I debug, the stack size for RG has a value and the rest are zeros.

When I call optixUtilComputeStackSizes with those stack_sizes values, I get the result below.

[image: debugger view of the computed stack sizes]

Is there anything wrong, given that I also have EX, MS, and HG programs but only RG is counted? If so, please advise how to correct it.

Could this be the reason my code does not reach the miss, anyhit, or closesthit programs when optixTrace is called from raygen, even though (I think) the conditions are met? That is the main issue I am facing now.

Thank you. I appreciate your kind response and advice.

Hi @zawzawoo.ko, welcome!

It’s normal to have zeroes in your stack sizes sometimes. As long as you’ve accounted for all the program groups in your pipeline, what you’re doing so far looks correct to me.
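To see why zero entries need not be a problem, here is a minimal host-only sketch of how the accumulated values feed into the continuation stack size. It paraphrases the upper-bound formula as I recall it from the SDK helper header optix_stack_size.h, so treat the formula as an assumption and check it against your SDK version. `StackSizesSketch` and `continuationStackSizeSketch` are hypothetical names that only mirror `OptixStackSizes` and `optixUtilComputeStackSizes`, and the numbers used with them are made up.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mirror of OptixStackSizes (the real struct is declared in the OptiX headers).
struct StackSizesSketch {
    unsigned int cssRG = 0;  // continuation stack size: raygen
    unsigned int cssMS = 0;  // continuation stack size: miss
    unsigned int cssCH = 0;  // continuation stack size: closest hit
    unsigned int cssAH = 0;  // continuation stack size: any hit
    unsigned int cssIS = 0;  // continuation stack size: intersection
    unsigned int cssCC = 0;  // continuation stack size: continuation callables
    unsigned int dssDC = 0;  // direct stack size: direct callables
};

// Upper bound on the continuation stack (assumed to match the SDK helper).
unsigned int continuationStackSizeSketch(const StackSizesSketch& s,
                                         unsigned int maxTraceDepth,
                                         unsigned int maxCCDepth)
{
    // Stack used by the call tree of continuation callables.
    const unsigned int cssCCTree = maxCCDepth * s.cssCC;
    // Stack used by a CH or MS program, including its callable tree.
    const unsigned int cssCHOrMSPlusCCTree = std::max(s.cssCH, s.cssMS) + cssCCTree;

    return s.cssRG + cssCCTree
         + (std::max(maxTraceDepth, 1u) - 1) * cssCHOrMSPlusCCTree
         + std::min(maxTraceDepth, 1u) * std::max(cssCHOrMSPlusCCTree, s.cssIS + s.cssAH);
}
```

With only `cssRG` non-zero and a trace depth of 1, the bound collapses to `cssRG`, which is consistent with a pipeline whose MS and HG programs report zero.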

If you have programs that are not getting called, that sounds potentially like an issue with the Shader Binding Table. If you have multiple entries in your SBT, it might be worth simplifying down to 1 entry for debugging/testing and make sure that works before adding multiple ray types and multiple materials, etc.

–
David.

Hi David,

Thank you for your kind reply. Let me check the SBT records and update my issue later.

Hi David,

I simplified the SBT to one entry for RG, MS, HG (anyhit & closesthit). The ray checking is called from raygen as below.

    setPayloadPtr(&ray_prd, p0, p1);

    optixTrace(params.traversable,
        triCtr, dir,
        tmin, tmax, 0.f, 255 /*ray.visibilityMask*/, flags,
        params.sbtOffset, params.sbtStride, params.shadowMissIndex,
        p0, p1);

where,

    const unsigned int flags = OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT;
    params.sbtOffset = 0;
    params.sbtStride = 1;
    params.shadowMissIndex = 0;

It seems that optixTrace does not reach either MS or HG: I put a printf in the miss, anyhit, and closesthit programs, and nothing is printed to the screen.

Then, when I apply different angles of the incident ray, I get the following error.

Could you please advise what possible mistake I made?

Thanks again.

Hi @zawzawoo.ko, I can’t tell without seeing the whole program, but that error seems to indicate you might be juggling multiple pipelines, and the one you’re using doesn’t have the module you intended/compiled? If you created a new separate pipeline, for example, in order to reduce down to 1 SBT entry, and kept the rest of your code around that is trying to use multiple SBT entries, I can see how that could easily lead to a crossed wire somewhere.

I know it can be a little confusing at first. Maybe a nice way to learn this initially is to start with one of the working SDK samples and delete or add a material, or add a new ray type and get it working? The optixSphere sample is nice and simple - only 1 MS and CH program. You could compare the module/SBT/pipeline setup in optixSphere to your project.

The optixWhitted and optixPathTracer samples are also good ones to play with, break, and reassemble if you want to either reduce them to 1 material or practice adding new materials or new objects to the scene.

–
David.

Hi David, thanks for your detailed explanation. Actually, I created just one pipeline. Maybe I messed up somewhere in the SBT records for the MS, AH, and CH programs that are compiled into the pipeline, as you mentioned in your earlier reply.

Currently, my code is messy. I will reorganize it and also learn from the working SDK samples, as you suggested.

Will update here. Thanks again.

Hi David,

Here is an update on the issue where the optixTrace call didn’t reach the respective programs.

Something was not right in the following struct, which I used to declare the CUdeviceptr, call cudaMalloc and cudaMemcpy, and get the raw pointer when creating the SBT records for the programs.

    // Helper for managing a CUDA device buffer
    struct CudaBuffer
    {
        // The raw CUDA device pointer, initialized to 0.
        CUdeviceptr d_ptr = 0;
        // The size of the allocated buffer in bytes.
        size_t size_in_bytes = 0;

        // Default constructor
        CudaBuffer() = default;

        // Destructor - this is the core of RAII.
        ~CudaBuffer()
        {
            free();
        }

        // Allocates a buffer of a given size in bytes.
        void alloc(size_t size)
        {
            if (d_ptr != 0) {
                free();
            }
            this->size_in_bytes = size;
            CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_ptr), size_in_bytes));
        }

        // Frees the allocated buffer.
        void free()
        {
            if (d_ptr != 0) {
                CUDA_CHECK(cudaFree(reinterpret_cast<void*>(d_ptr)));
                d_ptr = 0;
                size_in_bytes = 0;
            }
        }

        // A templated helper to copy data from the host (CPU) to this device (GPU) buffer.
        template<typename T>
        void copyToDevice(const T* host_source, size_t count)
        {
            CUDA_CHECK(cudaMemcpy(
                reinterpret_cast<void*>(d_ptr),
                host_source,
                count * sizeof(T),
                cudaMemcpyHostToDevice
            ));
        }

        // Returns the raw CUdeviceptr.
        CUdeviceptr get_raw_pointer() const
        {
            return d_ptr;
        }

        // --- OPTIONAL BUT USEFUL HELPER ---
        // A templated getter to return a typed pointer.
        // Useful for passing the buffer to your own CUDA kernels.
        template<typename T>
        T* get_typed_pointer() const
        {
            return reinterpret_cast<T*>(d_ptr);
        }
    };

For example,

    CudaBuffer d_missRecord_shadow;
    d_missRecord_shadow.alloc(sizeof(MissSbtRecord));
    MissSbtRecord ms_sbt_shadow = {};
    OPTIX_CHECK(optixSbtRecordPackHeader(pg_miss_shadow, &ms_sbt_shadow));
    d_missRecord_shadow.copyToDevice(&ms_sbt_shadow, 1);

    _sbt.missRecordBase = d_missRecord_shadow.get_raw_pointer(); // CUdeviceptr
    _sbt.missRecordStrideInBytes = sizeof(MissSbtRecord);
    _sbt.missRecordCount = 1;

Then I changed it to the simpler approach used in the SDK samples.

    CUdeviceptr miss_record;
    size_t      miss_record_size = sizeof(MissSbtRecord);
    CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&miss_record), miss_record_size));
    MissSbtRecord ms_sbt;
    OPTIX_CHECK(optixSbtRecordPackHeader(pg_miss_shadow, &ms_sbt));
    CUDA_CHECK(cudaMemcpy(
        reinterpret_cast<void*>(miss_record),
        &ms_sbt,
        miss_record_size,
        cudaMemcpyHostToDevice
    ));

    _sbt.missRecordBase = miss_record; // CUdeviceptr
    _sbt.missRecordStrideInBytes = sizeof(MissSbtRecord);
    _sbt.missRecordCount = 1;

Now it works and runs as intended, and I can check which program is called from optixTrace.
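The thread does not say exactly what was wrong with CudaBuffer, so the following is only one plausible explanation, not a confirmed diagnosis. If a CudaBuffer is a local variable in the function that builds the SBT, its destructor frees the device memory when that function returns, while the SBT still holds the raw address. The implicitly generated copy constructor could also lead to the same allocation being freed twice. The host-only sketch below illustrates the lifetime problem and one way around it. `fakeDeviceAlloc` and `fakeDeviceFree` are hypothetical stand-ins for cudaMalloc and cudaFree so that it runs without a GPU, and every type name in it is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <utility>

// Hypothetical host-side stand-ins for cudaMalloc/cudaFree, with a live-allocation counter.
static int g_liveAllocations = 0;

std::uintptr_t fakeDeviceAlloc(std::size_t bytes) {
    ++g_liveAllocations;
    return reinterpret_cast<std::uintptr_t>(std::malloc(bytes));
}

void fakeDeviceFree(std::uintptr_t p) {
    --g_liveAllocations;
    std::free(reinterpret_cast<void*>(p));
}

// Move-only RAII owner: copying is deleted so two objects can never free the same pointer.
class OwnedBuffer {
public:
    OwnedBuffer() = default;
    explicit OwnedBuffer(std::size_t bytes) : ptr_(fakeDeviceAlloc(bytes)) {}
    ~OwnedBuffer() { reset(); }
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;
    OwnedBuffer(OwnedBuffer&& o) noexcept : ptr_(std::exchange(o.ptr_, 0)) {}
    OwnedBuffer& operator=(OwnedBuffer&& o) noexcept {
        if (this != &o) { reset(); ptr_ = std::exchange(o.ptr_, 0); }
        return *this;
    }
    std::uintptr_t get() const { return ptr_; }
private:
    void reset() { if (ptr_ != 0) { fakeDeviceFree(ptr_); ptr_ = 0; } }
    std::uintptr_t ptr_ = 0;
};

// Stand-in for the one SBT field that matters here.
struct SbtSketch { std::uintptr_t missRecordBase = 0; };

// Pattern A: the owner is a local, so the memory is freed on return
// and the address stored in the SBT dangles.
SbtSketch buildSbtWithLocalBuffer() {
    SbtSketch sbt;
    OwnedBuffer missRecord(64);
    sbt.missRecordBase = missRecord.get();
    return sbt;
}

// Pattern B: the owner travels with the SBT, so the memory outlives the launch setup.
struct SbtWithStorage {
    SbtSketch sbt;
    OwnedBuffer missRecord;
};

SbtWithStorage buildSbtWithOwnedBuffer() {
    SbtWithStorage out;
    out.missRecord = OwnedBuffer(64);
    out.sbt.missRecordBase = out.missRecord.get();
    return out;
}
```

Under this assumption, the version with plain cudaMalloc works simply because nothing ever frees the record, so the address stays valid for optixLaunch. Keeping the owning object alive as a class member for as long as the SBT is in use would achieve the same without leaking.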

I have a question: is it possible to create different sets of SBT records for different RG, MS, and HG programs, and use one set per optixLaunch? Will optixTrace then call the programs assigned to that particular set of SBT records?

Thanks again.

Great to hear you got it working, thanks for the update!

I’m not sure I understand the question, but I think the answer is that the SBT setup is up to you. If you have different sets of shader programs you want to use, but only one set at a time, then there are multiple ways you might organize your SBT. Of course one way is to create multiple separate SBTs, if you only use one set during a single optixLaunch. Another way is to put each set into a single SBT at a constant offset, and then use the sbtOffset and missSbtIndex parameters to your trace call to point at a different group. You could also interleave your different sets in the SBT. For example, the OptiX Programming Guide and SDK samples use a ‘ray type’ for different kinds of rays. The most common ray type distinction in our samples is for “radiance rays” vs “shadow rays”, where radiance rays carry color and shadow rays are only for testing visibility. You could choose to have different SBT sets organized as new ray types, and this might be convenient and make following our examples easier. For example if you also have radiance rays and shadow rays, as well as 3 different sets of SBT programs, you could pretend there are 6 ray types (set 1 radiance, set 1 shadow, set 2 radiance, …) and then pick the ray type based on which set you’re using.

Essentially the SBT is just an array and there’s a simple formula for calculating the array index for any given hit. This means you have a lot of control and flexibility in how you organize, use, and update your SBT. In case you haven’t seen it, there’s a blog post detailing a couple of different ways to organize your SBT, including how to store and index your per-instance data separately from your SBT: Efficient Ray Tracing with NVIDIA OptiX Shader Binding Table Optimization | NVIDIA Technical Blog
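As a rough illustration of that formula and of the "6 ray types" idea, here is a small host-only sketch. The hit-record index follows the formula given in the OptiX Programming Guide (instance offset, plus GAS build-input index times the trace stride, plus the trace offset). The set-major layout, the enum, and the function names are assumptions made for this example, not anything from the SDK.

```cpp
#include <cassert>

// Hit-group record index, following the OptiX Programming Guide formula:
// sbt-index = sbt-instance-offset
//           + (sbt-GAS-index * sbt-stride-from-trace-call)
//           + sbt-offset-from-trace-call
constexpr unsigned int hitRecordIndex(unsigned int instanceSbtOffset,
                                      unsigned int gasSbtIndex,
                                      unsigned int traceSbtStride,
                                      unsigned int traceSbtOffset)
{
    return instanceSbtOffset + gasSbtIndex * traceSbtStride + traceSbtOffset;
}

// Hypothetical interleaved layout: each (program set, ray kind) pair is one "ray type".
enum RayKind : unsigned int { RADIANCE = 0, SHADOW = 1, RAY_KIND_COUNT = 2 };
constexpr unsigned int NUM_PROGRAM_SETS = 3;
constexpr unsigned int NUM_RAY_TYPES = NUM_PROGRAM_SETS * RAY_KIND_COUNT;  // 6 pretend ray types

// Ray type for a given set and kind. In this layout the same value would be passed
// as both the SBT offset and the miss SBT index of the trace call.
constexpr unsigned int rayTypeFor(unsigned int programSet, RayKind kind)
{
    return programSet * RAY_KIND_COUNT + kind;
}
```

In this layout a trace call for set 1 shadow rays would pass `rayTypeFor(1, SHADOW)` as the SBT offset and miss index and `NUM_RAY_TYPES` as the SBT stride, and the hit-group array would hold `NUM_RAY_TYPES` records per build input. The single-entry setup earlier in the thread (offset 0, stride 1) is the degenerate case in which every hit resolves to record 0.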

–
David.

Thanks a lot, David. Your explanation addresses my question exactly. I tried two separate SBT sets for the radiance and shadow program groups and used only one set per optixLaunch. optixTrace then calls the respective programs correctly. It works.

Previously, I set up a single SBT and used the sbtOffset and missSbtIndex parameters in the optixTrace call, and I also read the technical blog post you mentioned. I still got the issue mentioned earlier (programs not being called from optixTrace). I thought I had not fully understood how sbtOffset and missSbtIndex link to the right programs. It turned out that the actual problem was the CudaBuffer struct I used.

After I switched to the simple steps for creating the SBT records, based on the sample, it works.

Thanks again.
