Beginner with simple geometry program problems

Hi, I am a student trying to get started with OptiX 8.0.0 for a project I am working on. To begin with, I am trying to write my own program that produces a 2D plane. So far I have mostly been following instructions on the internet and reusing example code, which probably makes my program hard to follow. Anyway, I thought I’d post my problem(s) and see if anyone could help me out.

So far I have put together the code below, and it builds fine. However, I get the following errors when I try to run it. Thank you for any suggestions or tips on how to go forward. :)

[ACCEL] Invalid value (0) for “buildInputs[0].triangleArray.flags”
[ERROR] outputBuffer is 0

[COMPILER]
[PIPELINE] params variable “params” not found in any module. It might have been optimized away.
[COMPILER] Info: Pipeline statistics
module(s) : 3
entry function(s) : 3
trace call(s) : 0
continuation callable call(s) : 0
direct callable call(s) : 0
basic block(s) in entry functions : 3
instruction(s) in entry functions : 48
non-entry function(s) : 0
basic block(s) in non-entry functions: 0
instruction(s) in non-entry functions: 0
debug information : no

[DISKCACHE] Closed database: “C:\Users\PC\AppData\Local\NVIDIA\OptixCache\optix7cache.db”
[DISKCACHE] Cache data size: “33.2 MiB”

C:\Users\PC\Desktop\new_example_4\build\bin\Debug\new_example_4.exe (process 22060) exited with code 0.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.

src code:

#include <optix.h>
#include <cuda_runtime.h>
#include <optix_stubs.h>
#include <optix_function_table_definition.h>
#include <iostream>
#include <../inc/exception.h>
#include <optix_types.h>
#include <sutil/sutil.h>

#include <fstream>
#include <sstream>
#include <string>

// Initialize OptiX and create context
void initOptix(CUcontext& cuContext, CUstream& stream) {
// Initialize CUDA
CUdevice cuDevice;
cuInit(0);
cuDeviceGet(&cuDevice, 0); // Assumes device 0 is suitable
cuCtxCreate(&cuContext, 0, cuDevice);

// Create a CUDA stream
cuStreamCreate(&stream, CU_STREAM_DEFAULT);

// Initialize OptiX
optixInit();

}

// Vertex structure for a simple 2D plane
struct Vertex {
float x, y, z; // Position
};

// Define vertices of a square plane centered at the origin
Vertex vertices[] = {
{-0.5f, -0.5f, 0.0f}, // Bottom left
{0.5f, -0.5f, 0.0f}, // Bottom right
{0.5f, 0.5f, 0.0f}, // Top right
{-0.5f, 0.5f, 0.0f} // Top left
};

// Define indices for two triangles that make up the square
unsigned int indices[] = {
0, 1, 2, // First triangle
2, 3, 0 // Second triangle
};

// Load PTX files path
const char* ptxPathRayGen = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/rayGen.ptx";
const char* ptxPathhit = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/hit.ptx";
const char* ptxPathMiss = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/miss.ptx";

OptixTraversableHandle createAccelerationStructure(OptixDeviceContext context, CUdeviceptr& d_vertices, CUdeviceptr& d_indices, CUdeviceptr& d_tempBufferGas, CUdeviceptr& d_gasOutputBuffer) {
// Assume 'vertices' and 'indices' are already defined as shown above

size_t vertices_size = sizeof(vertices);
size_t indices_size = sizeof(indices);

// Allocate device memory for vertices and indices
cuMemAlloc(&d_vertices, vertices_size);
cuMemAlloc(&d_indices, indices_size);

// Copy vertices and indices to device memory
cuMemcpyHtoD(d_vertices, vertices, vertices_size);
cuMemcpyHtoD(d_indices, indices, indices_size);

// Define build input
OptixBuildInput buildInput = {};
buildInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;

// Setup vertex buffer
OptixBuildInputTriangleArray triangleArray = {};
triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
triangleArray.vertexStrideInBytes = sizeof(Vertex);
triangleArray.numVertices = 4;
triangleArray.vertexBuffers = &d_vertices;

// Setup index buffer
triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
triangleArray.indexStrideInBytes = sizeof(unsigned int) * 3;
triangleArray.numIndexTriplets = 2;
triangleArray.indexBuffer = d_indices;

// Set the number of SBT records

triangleArray.numSbtRecords = 1; //<---- MAY BE WRONG!

buildInput.triangleArray = triangleArray;

// Specify build options
OptixAccelBuildOptions accelOptions = {};
accelOptions.buildFlags = OPTIX_BUILD_FLAG_NONE;
accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

// Output buffers for acceleration structure
OptixAccelBufferSizes gasBufferSizes;
optixAccelComputeMemoryUsage(context, &accelOptions, &buildInput, 1, &gasBufferSizes);

// Allocate memory for acceleration structure
cuMemAlloc(&d_tempBufferGas, gasBufferSizes.tempSizeInBytes);
cuMemAlloc(&d_gasOutputBuffer, gasBufferSizes.outputSizeInBytes);

// Build acceleration structure
OptixTraversableHandle gasHandle = 0;
optixAccelBuild(context, 0, &accelOptions, &buildInput, 1,
    d_tempBufferGas, gasBufferSizes.tempSizeInBytes,
    d_gasOutputBuffer, gasBufferSizes.outputSizeInBytes,
    &gasHandle, nullptr, 0);

// Return the handle to the created acceleration structure
return gasHandle;

// Do not forget to free the allocated buffers after they are no longer needed

}

// Function to load the contents of a PTX file into a string
std::string loadPtx(const std::string& filepath) {
std::ifstream file(filepath.c_str());
if (!file) {
std::cerr << "Failed to open PTX file: " << filepath << std::endl;
return "";
}

std::stringstream buffer;
buffer << file.rdbuf();
return buffer.str();

}

int main() {
// Step 1: Setup OptiX context (for OptiX 7+, prepare to use CUDA directly)
CUcontext cuContext;
CUstream stream;
OptixDeviceContext optixContext = nullptr;

// Initialize OptiX and CUDA context
initOptix(cuContext, stream);

CUdeviceptr d_vertices = 0;
CUdeviceptr d_indices = 0;
CUdeviceptr d_tempBufferGas = 0;
CUdeviceptr d_gasOutputBuffer = 0;

// Create an OptiX device context using the CUDA context
OPTIX_CHECK(optixDeviceContextCreate(cuContext, nullptr, &optixContext));
OPTIX_CHECK(optixDeviceContextSetLogCallback(optixContext, [](unsigned int level, const char* tag, const char* message, void* cbdata) {
    std::cerr << "[" << tag << "] " << message << "\n";
    }, nullptr, 4)); // LogLevel=4 for verbosity

// Step 2: Define geometry and create acceleration structures
OptixTraversableHandle gasHandle = createAccelerationStructure(optixContext, d_vertices, d_indices, d_tempBufferGas, d_gasOutputBuffer);


// Module and Program Group Compile Options
OptixModuleCompileOptions moduleCompileOptions = {};
moduleCompileOptions.maxRegisterCount = OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT;
moduleCompileOptions.optLevel = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
moduleCompileOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_NONE;

OptixPipelineCompileOptions pipelineCompileOptions = {};
pipelineCompileOptions.usesMotionBlur = false;
pipelineCompileOptions.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
pipelineCompileOptions.numPayloadValues = 2; // Adjust based on your needs
pipelineCompileOptions.numAttributeValues = 2; // Adjust based on your needs
pipelineCompileOptions.exceptionFlags = OPTIX_EXCEPTION_FLAG_NONE; // or other flags as necessary
pipelineCompileOptions.pipelineLaunchParamsVariableName = "params";

// Step 3: Create shaders (ray generation, hit, miss)
char log[2048]; // For logging
size_t logSize = sizeof(log);



// Load the PTX source code
std::string rayGenPTX = loadPtx(ptxPathRayGen);
OptixModule rayGenModule;
OPTIX_CHECK(optixModuleCreate(
    optixContext,
    &moduleCompileOptions,
    &pipelineCompileOptions,
    rayGenPTX.c_str(), rayGenPTX.size(),
    log, &logSize,
    &rayGenModule
));


std::string hitPTX = loadPtx(ptxPathhit);
OptixModule hitModule;
OPTIX_CHECK(optixModuleCreate(
    optixContext,
    &moduleCompileOptions,
    &pipelineCompileOptions,
    hitPTX.c_str(), hitPTX.size(),
    log, &logSize,
    &hitModule
));

std::string missPTX = loadPtx(ptxPathMiss);
OptixModule missModule;
OPTIX_CHECK(optixModuleCreate(
    optixContext,
    &moduleCompileOptions,
    &pipelineCompileOptions,
    missPTX.c_str(), missPTX.size(),
    log, &logSize,
    &missModule
));


// Define program group options - typically, this is left as default for basic usage
OptixProgramGroupOptions programGroupOptions = {};


// Create program group for ray generation shader
OptixProgramGroupDesc raygenPGDesc = {};
raygenPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_RAYGEN;
raygenPGDesc.raygen.module = rayGenModule; // Ensure rayGenModule is correctly initialized
raygenPGDesc.raygen.entryFunctionName = "__raygen__rg";

OptixProgramGroup raygenProgramGroup;
OPTIX_CHECK(optixProgramGroupCreate(
    optixContext,
    &raygenPGDesc,
    1, // Number of program group descriptions
    &programGroupOptions, // Corrected to use program group options
    log, &logSize, // Log buffer and its size
    &raygenProgramGroup // The created program group
));

// Create program group for closest hit shader
OptixProgramGroupDesc hitPGDesc = {};
hitPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
hitPGDesc.hitgroup.moduleCH = hitModule; // Use your hit module here
hitPGDesc.hitgroup.entryFunctionNameCH = "__closesthit__ch"; // Entry point for your hit shader

OptixProgramGroup hitProgramGroup;
OPTIX_CHECK(optixProgramGroupCreate(
    optixContext,
    &hitPGDesc, // Use the hit program group descriptor
    1, // One program group description
    &programGroupOptions, // Assuming programGroupOptions is already defined
    log, &logSize, // Log buffer and size
    &hitProgramGroup // The created program group
));

// Create program group for miss shader
OptixProgramGroupDesc missPGDesc = {};
missPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_MISS;
missPGDesc.miss.module = missModule; // Ensure missModule is correctly initialized
missPGDesc.miss.entryFunctionName = "__miss__ms";

OptixProgramGroup missProgramGroup;
OPTIX_CHECK(optixProgramGroupCreate(
    optixContext,
    &missPGDesc,
    1, // Number of program group descriptions
    &programGroupOptions, // Program group options
    log, &logSize, // Log buffer and its size
    &missProgramGroup // The created program group
));



// Step 4: Setup ray tracing pipeline
OptixPipeline pipeline;

OptixPipelineLinkOptions pipelineLinkOptions = {};
pipelineLinkOptions.maxTraceDepth = 2;
OptixProgramGroup programGroups[] = { raygenProgramGroup, hitProgramGroup, missProgramGroup };

OPTIX_CHECK(optixPipelineCreate(
    optixContext,
    &pipelineCompileOptions,
    &pipelineLinkOptions,
    programGroups,
    sizeof(programGroups) / sizeof(programGroups[0]), // Number of program groups
    log, &logSize,
    &pipeline
));

// Set stack sizes
uint32_t directCallableStackSizeFromTraversal = 64; // Minimal if not using direct callables
uint32_t directCallableStackSizeFromState = 64; // Minimal if not using direct callables
uint32_t continuationStackSize = 1024; // Estimate based on the complexity of your shaders

// The maximum depth of the traversable graph for ray tracing
uint32_t maxTraversableGraphDepth = 2; // Assuming a simple scene

OPTIX_CHECK(optixPipelineSetStackSize(
    pipeline,
    directCallableStackSizeFromTraversal,
    directCallableStackSizeFromState,
    continuationStackSize,
    maxTraversableGraphDepth // Maximum depth of traversal
));


// Step 5: Render the scene

struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) RayGenSbtRecord {
    __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
    // Add ray generation data here
    //OptixProgramGroup raygenProgram;
};

struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) MissSbtRecord {
    __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
    // Add miss data here
    //OptixProgramGroup missProgram;
};

struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) HitGroupSbtRecord {
    __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
    // Add hit group data here
    //OptixProgramGroup closestHitProgram;
};

RayGenSbtRecord rgSbt;
MissSbtRecord msSbt;
HitGroupSbtRecord hgSbt;

optixSbtRecordPackHeader(raygenProgramGroup, &rgSbt);
optixSbtRecordPackHeader(missProgramGroup, &msSbt);
optixSbtRecordPackHeader(hitProgramGroup, &hgSbt);

// Allocate and copy SBT records to device memory
CUdeviceptr d_raygenRecords, d_missRecords, d_hitgroupRecords;
size_t sbtSize = sizeof(RayGenSbtRecord); // Same for Miss and HitGroup if they don't have additional data
cudaMalloc(reinterpret_cast<void**>(&d_raygenRecords), sbtSize);
cudaMalloc(reinterpret_cast<void**>(&d_missRecords), sbtSize);
cudaMalloc(reinterpret_cast<void**>(&d_hitgroupRecords), sbtSize);

cudaMemcpy(reinterpret_cast<void*>(d_raygenRecords), &rgSbt, sbtSize, cudaMemcpyHostToDevice);
cudaMemcpy(reinterpret_cast<void*>(d_missRecords), &msSbt, sbtSize, cudaMemcpyHostToDevice);
cudaMemcpy(reinterpret_cast<void*>(d_hitgroupRecords), &hgSbt, sbtSize, cudaMemcpyHostToDevice);

OptixShaderBindingTable sbt = {};
sbt.raygenRecord = d_raygenRecords;
sbt.missRecordBase = d_missRecords;
sbt.missRecordStrideInBytes = sizeof(MissSbtRecord);
sbt.missRecordCount = 1;
sbt.hitgroupRecordBase = d_hitgroupRecords;
sbt.hitgroupRecordStrideInBytes = sizeof(HitGroupSbtRecord);
sbt.hitgroupRecordCount = 1;

struct LaunchParams {
    int width;
    int height;
    CUdeviceptr outputBuffer;
    // Add other parameters as needed
};

LaunchParams params = { 800, 600 }; // Example dimensions
CUdeviceptr d_params;
cudaMalloc(reinterpret_cast<void**>(&d_params), sizeof(LaunchParams));
cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(LaunchParams), cudaMemcpyHostToDevice);

CUdeviceptr d_outputBuffer;
cudaMalloc(reinterpret_cast<void**>(&d_outputBuffer), params.width* params.height * sizeof(float4)); // Assuming float4 per pixel
params.outputBuffer = d_outputBuffer;
// Update params on the device
cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(LaunchParams), cudaMemcpyHostToDevice);

optixLaunch(
    pipeline,
    stream,
    d_params,
    sizeof(LaunchParams),
    &sbt,
    params.width,  // launch width
    params.height, // launch height
    1              // launch depth
);
cudaStreamSynchronize(stream); // Wait for completion


float4* h_outputBuffer = new float4[params.width * params.height];
cudaMemcpy(h_outputBuffer, reinterpret_cast<void*>(d_outputBuffer), params.width* params.height * sizeof(float4), cudaMemcpyDeviceToHost);

// Now, `h_outputBuffer` contains your rendered image. You can save it to a file or display it.

// Step 6: Cleanup and free resources
cudaFree(reinterpret_cast<void*>(d_raygenRecords));
cudaFree(reinterpret_cast<void*>(d_missRecords));
cudaFree(reinterpret_cast<void*>(d_hitgroupRecords));
cudaFree(reinterpret_cast<void*>(d_outputBuffer));
cudaFree(reinterpret_cast<void*>(d_params));

OPTIX_CHECK(optixProgramGroupDestroy(raygenProgramGroup));
OPTIX_CHECK(optixProgramGroupDestroy(hitProgramGroup));
OPTIX_CHECK(optixProgramGroupDestroy(missProgramGroup));
OPTIX_CHECK(optixModuleDestroy(rayGenModule));
OPTIX_CHECK(optixModuleDestroy(hitModule));
OPTIX_CHECK(optixModuleDestroy(missModule));


cuMemFree(d_vertices);
cuMemFree(d_indices);
cuMemFree(d_tempBufferGas);
cuMemFree(d_gasOutputBuffer);

OPTIX_CHECK(optixDeviceContextDestroy(optixContext));
cuStreamDestroy(stream);
cuCtxDestroy(cuContext);

return 0;

}
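The listing stops once the image has been copied back into h_outputBuffer. To actually look at it, one option (my suggestion, not something from the code in this thread) is to convert the float4 pixels to 8-bit RGBA and write a PNG with stbi_write_png from stb_image_write.h, which the later version of the program already includes. The sketch below uses a hypothetical Float4 stand-in for CUDA's float4 so it compiles as plain C++; it clamps each channel to [0, 1] and applies no gamma correction.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's float4 (same memory layout: four floats).
struct Float4 { float x, y, z, w; };

// Map one channel from [0, 1] to [0, 255]; out-of-range values (and NaN) are clamped.
inline std::uint8_t toByte(float c) {
    const float clamped = std::max(0.0f, std::min(c, 1.0f));
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

// Convert a width x height float4 image to tightly packed RGBA8, keeping the row order.
std::vector<std::uint8_t> toRGBA8(const Float4* pixels, int width, int height) {
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<std::uint8_t> rgba(count * 4);
    for (std::size_t i = 0; i < count; ++i) {
        rgba[4 * i + 0] = toByte(pixels[i].x);
        rgba[4 * i + 1] = toByte(pixels[i].y);
        rgba[4 * i + 2] = toByte(pixels[i].z);
        rgba[4 * i + 3] = toByte(pixels[i].w);
    }
    return rgba;
}

// Possible use after the cudaMemcpy into h_outputBuffer (float4 there instead of Float4):
//   std::vector<std::uint8_t> rgba = toRGBA8(reinterpret_cast<const Float4*>(h_outputBuffer),
//                                            params.width, params.height);
//   stbi_write_png("output.png", params.width, params.height, 4, rgba.data(), params.width * 4);
```

stb_image_write treats the first row in memory as the top of the image, so buffer row 0 should correspond to the top of the camera view.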

[ACCEL] Invalid value (0) for “buildInputs[0].triangleArray.flags”

That’s because you have not set that flags pointer inside your code.
https://raytracing-docs.nvidia.com/optix8/api/struct_optix_build_input_triangle_array.html#a62da71e3d7992526ed1935001e78fb49
It must be a pointer to an array of OptixGeometryFlags, one per SBT record.
Something like this: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L1376

[PIPELINE] params variable “params” not found in any module. It might have been optimized away.

You told OptiX that your launch parameter block in constant memory is named params, using this field inside the OptixPipelineCompileOptions:
pipelineCompileOptions.pipelineLaunchParamsVariableName = "params";

Then you need to have a declaration for that params variable with your launch parameter structure.
Something like this, just with your structure and variable names:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/raygeneration.cu#L40
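Besides the name, the host and the device must agree on the exact layout of that structure. In this thread, the host-side LaunchParams in the first listing only has width, height and outputBuffer, while the rayGen.cu posted later also declares camera and handle, so whichever version is current, both sides need the identical definition. One way to guarantee that is a single header included by both the .cu file and the host .cpp. The sketch below is my assumption of what such a header could contain; Float3, DevicePtr and TraversableHandle are plain C++ stand-ins so it compiles without the CUDA/OptiX headers (in real code use float3, CUdeviceptr and OptixTraversableHandle), and the camera values in makeLaunchParams are only illustrative.

```cpp
// Stand-ins so this compiles without CUDA/OptiX headers.
struct Float3 { float x, y, z; };              // stand-in for CUDA's float3 (12 bytes)
using DevicePtr = unsigned long long;          // CUdeviceptr is 64-bit on 64-bit platforms
using TraversableHandle = unsigned long long;  // OptixTraversableHandle is an unsigned long long

// Hypothetical shared header content, included by host and device code alike.
struct Camera {
    Float3 eye;
    Float3 lookAt;
    Float3 up;
    float fov;          // vertical field of view in degrees
    float aspectRatio;  // width / height
};

struct LaunchParams {
    Camera camera;
    int width;
    int height;
    DevicePtr outputBuffer;   // device address of a width * height float4 buffer
    TraversableHandle handle; // top-level traversable handle passed to optixTrace
};

// With this member order the struct is 72 bytes (4 bytes of padding before outputBuffer).
// If the struct ever has to be defined twice, put this check next to both definitions.
static_assert(sizeof(LaunchParams) == 72, "update host and device LaunchParams together");

// Host side: fill the struct, upload it, and pass the device copy plus sizeof(LaunchParams) to optixLaunch.
inline LaunchParams makeLaunchParams(int width, int height, DevicePtr output, TraversableHandle handle) {
    LaunchParams p{};
    p.camera.eye = {0.0f, 0.0f, 2.0f};   // illustrative: looking at the unit plane from +z
    p.camera.lookAt = {0.0f, 0.0f, 0.0f};
    p.camera.up = {0.0f, 1.0f, 0.0f};
    p.camera.fov = 45.0f;
    p.camera.aspectRatio = static_cast<float>(width) / static_cast<float>(height);
    p.width = width;
    p.height = height;
    p.outputBuffer = output;
    p.handle = handle;
    return p;
}
```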

Please do not hard-code the stack sizes. Do the proper calculation to use the minimal necessary amount.
The OptiX SDK contains helper functions for that inside the optix_stack_size.h header.
I rolled my own before that header existed:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L932
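For reference, here is my reading of the calculation that optixUtilComputeStackSizes() in optix_stack_size.h performs once the per-program values have been accumulated. It is rewritten in plain C++ with a stand-in StackSizes struct (mirroring the fields of OptixStackSizes) only so the arithmetic can be checked in isolation; please compare it with the header in your SDK and call the real helpers in the application. If only the raygen program calls optixTrace, as in the code posted here, maxTraceDepth can be 1, and without callables maxCCDepth and maxDCDepth are 0.

```cpp
#include <algorithm>

// Stand-in mirroring the fields of OptixStackSizes.
struct StackSizes {
    unsigned int cssRG = 0; // continuation stack: ray generation
    unsigned int cssMS = 0; // miss
    unsigned int cssCH = 0; // closest hit
    unsigned int cssAH = 0; // any hit
    unsigned int cssIS = 0; // intersection
    unsigned int cssCC = 0; // continuation callables
    unsigned int dssDC = 0; // direct stack: direct callables
};

struct PipelineStackSizes {
    unsigned int directCallableStackSizeFromTraversal;
    unsigned int directCallableStackSizeFromState;
    unsigned int continuationStackSize;
};

// Sketch of the computation done by optixUtilComputeStackSizes() (verify against your SDK).
PipelineStackSizes computeStackSizes(const StackSizes& s, unsigned int maxTraceDepth,
                                     unsigned int maxCCDepth, unsigned int maxDCDepth) {
    PipelineStackSizes r{};
    r.directCallableStackSizeFromTraversal = maxDCDepth * s.dssDC;
    r.directCallableStackSizeFromState = maxDCDepth * s.dssDC;

    // Upper bound for a call tree of continuation callables.
    const unsigned int cssCCTree = maxCCDepth * s.cssCC;
    // Upper bound for a closest-hit or miss program including its callables.
    const unsigned int cssCHOrMSPlusCCTree = std::max(s.cssCH, s.cssMS) + cssCCTree;

    r.continuationStackSize = s.cssRG + cssCCTree
        + (std::max(maxTraceDepth, 1u) - 1) * cssCHOrMSPlusCCTree
        + std::min(maxTraceDepth, 1u) * std::max(cssCHOrMSPlusCCTree, s.cssIS + s.cssAH);
    return r;
}
```

In the application the inputs would come from the real helpers, roughly: start with OptixStackSizes stackSizes = {}, call optixUtilAccumulateStackSizes(group, &stackSizes, pipeline) for each program group, then optixUtilComputeStackSizes(&stackSizes, maxTraceDepth, 0, 0, &fromTraversal, &fromState, &continuation), and pass the results to optixPipelineSetStackSize together with a maxTraversableGraphDepth of 2 for an IAS->GAS graph (1 for a single GAS).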

You set pipelineCompileOptions.traversableGraphFlags to OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING, which is good because that is the fastest render graph layout on RTX boards, but you’re only calling optixAccelBuild once and only build a geometry AS. There also needs to be an optixAccelBuild over OptixInstances for the top-level instance AS, and the traversable handle of that should be your argument to optixTrace calls.
There was a recent discussion about that here: https://forums.developer.nvidia.com/t/question-about-instance-acceleartion-struction/283898

If you look into the examples inside that github repository, there are some simpler cases where some runtime generated geometries are built (plane, box, sphere, torus). I just added another example to that today.
https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410/4


Hi droettger, thank you for the help, much appreciated!

I believe that I have now managed to figure out the problems I had with the flags pointer, pipeline params and stack sizes. However, I don’t really understand what I should do given the information you gave me regarding pipelineCompileOptions.traversableGraphFlags.

Furthermore, to visualize my “plane”, I am trying to implement a pinhole camera. Perhaps this part would be easier with an implementation from an existing project, or with a different approach? What I have done so far is update my rayGen program (below). The project builds without any problems, but when I run it, I get an exception on this line:
OPTIX_CHECK(optixPipelineCreate(
    optixContext,
    &pipelineCompileOptions,
    &pipelineLinkOptions,
    programGroups,
    sizeof(programGroups) / sizeof(programGroups[0]), // Number of program groups
    log, &logSize,
    &pipeline
));

With description:

Name: call
Value: 0x00007ff7b11d51a0 "optixPipelineCreate( optixContext, &pipelineCompileOptions, &pipelineLinkOptions, programGroups, sizeof(programGroups) / sizeof(programGroups[0]), log, &logSize, &pipeline )"
Type: const char *

rayGen.cu:
#include <optix.h>
#include <cuda_runtime.h>
#include "../sutil/vec_math.h" // Assuming this provides vector math operations

struct Camera {
    float3 eye;
    float3 lookAt;
    float3 up;
    float fov;
    float aspectRatio;
};

struct LaunchParams {
    Camera camera;
    int width;
    int height;
    CUdeviceptr outputBuffer;
    OptixTraversableHandle handle; // Acceleration structure handle
};

extern "C" {
    __constant__ LaunchParams params;
}

// Function to compute ray direction
static __device__ float3 getRayDirection(const float2& pixelCoord, const float2& imageSize) {
    float2 ndc = make_float2(pixelCoord.x / imageSize.x * 2.0f - 1.0f, (pixelCoord.y / imageSize.y * 2.0f - 1.0f) * -1.0f);
    float3 forward = normalize(params.camera.lookAt - params.camera.eye);
    float3 right = normalize(cross(params.camera.up, forward));
    float3 up = cross(forward, right);
    float tanFovY = tanf(params.camera.fov * 0.5f * M_PIf / 180.0f);
    float aspectRatio = params.camera.aspectRatio;
    float3 rayDir = normalize(forward + right * ndc.x * aspectRatio * tanFovY + up * ndc.y * tanFovY);
    return rayDir;
}

extern "C" __global__ void __raygen__rg() {
    const uint3 idx = optixGetLaunchIndex();
    const float2 imageSize = make_float2(params.width, params.height);
    const float2 pixelCoord = make_float2(idx.x, idx.y);

    float3 rayOrigin = params.camera.eye;
    float3 rayDir = getRayDirection(pixelCoord, imageSize);

    float3 resultColor = make_float3(0.0f); // Initialize result color

    unsigned int p0, p1;
    p0 = p1 = 0; // Initialize payload values

    // Pack the address of resultColor into the payload
optixTrace(
    params.handle, // The traversable handle for the scene
    rayOrigin,     // Ray origin
    rayDir,        // Ray direction
    0.0f,          // tmin, start of ray segment
    1e20f,         // tmax, end of ray segment
    0.0f,          // rayTime, for motion blur
    OptixVisibilityMask(1), // visibilityMask
    OPTIX_RAY_FLAG_NONE,    // rayFlags
    0,             // SBT offset
    0,             // SBT stride
    0,             // missSBTIndex
    p0, p1         // Payload variables passed by reference
);

    // Write the result to the output buffer
    const unsigned int imageIndex = idx.y * params.width + idx.x;
    float4* outputBuffer = reinterpret_cast<float4*>(params.outputBuffer);
    outputBuffer[imageIndex] = make_float4(resultColor, 1.0f);

}
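In the rayGen.cu posted here, p0 and p1 are set to zero and never read after optixTrace, so resultColor keeps its initial black value no matter what the closest-hit or miss programs do; the comment mentions packing the address of resultColor into the payload, but the code does not do that yet. A common pattern is to split a 64-bit pointer across two 32-bit payload values and reassemble it in the hit or miss program. The helpers below are written as plain C++ so they can be tested on the host; in the .cu files they would be marked __forceinline__ __device__, and the names packPointer and unpackPointer are hypothetical.

```cpp
#include <cstdint>

// Split a pointer into two 32-bit halves suitable for two optixTrace payload values.
inline void packPointer(void* ptr, unsigned int& i0, unsigned int& i1) {
    const std::uint64_t uptr = reinterpret_cast<std::uint64_t>(ptr);
    i0 = static_cast<unsigned int>(uptr >> 32);         // high 32 bits
    i1 = static_cast<unsigned int>(uptr & 0xffffffffu); // low 32 bits
}

// Reassemble the pointer, e.g. from optixGetPayload_0() and optixGetPayload_1()
// inside a closest-hit or miss program.
inline void* unpackPointer(unsigned int i0, unsigned int i1) {
    const std::uint64_t uptr = (static_cast<std::uint64_t>(i0) << 32) | i1;
    return reinterpret_cast<void*>(uptr);
}
```

In the raygen program this would mean calling packPointer(&resultColor, p0, p1) before optixTrace, and in __closesthit__ch or __miss__ms casting unpackPointer(optixGetPayload_0(), optixGetPayload_1()) to float3* and writing the color through it. With only three floats, writing the color directly into three payload values (numPayloadValues = 3) would also work.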

If you have error reports from OptiX like that invalid value, please set a log callback function and enable validation mode like this:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L296
That will normally result in better explanations what value is incorrect.
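A minimal sketch of a log callback with the OptixLogCallback parameter list (unsigned int level, const char* tag, const char* message, void* cbdata) that also collects the messages through the cbdata pointer. The function itself is plain C++; the commented lines show how I would expect it to be registered through OptixDeviceContextOptions (logCallbackFunction, logCallbackData, logCallbackLevel, validationMode) when the context is created. Using a std::ostringstream as the cbdata target is my assumption, not something the API requires. Validation mode adds overhead, so it is meant for debugging.

```cpp
#include <iostream>
#include <sstream>

// Same parameter list as OptixLogCallback. cbdata is whatever pointer was registered
// as logCallbackData; here it is assumed to point to a std::ostringstream.
void logCallback(unsigned int level, const char* tag, const char* message, void* cbdata) {
    std::ostringstream line;
    line << "[" << level << "][" << tag << "]: " << message << "\n";
    if (auto* sink = static_cast<std::ostringstream*>(cbdata)) {
        *sink << line.str(); // keep a copy for later inspection
    }
    std::cerr << line.str();
}

// Registering it when creating the context (sketch):
//   std::ostringstream logStream;
//   OptixDeviceContextOptions options = {};
//   options.logCallbackFunction = &logCallback;
//   options.logCallbackData     = &logStream;
//   options.logCallbackLevel    = 4; // 4 = all messages, including status output
//   options.validationMode      = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL; // debugging only
//   OPTIX_CHECK(optixDeviceContextCreate(cuContext, &options, &optixContext));
```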

I’m always using __forceinline__ __device__ for all my OptiX device code functions which are not program entry points or callable programs. That way the compiler will do the right thing. __inline__ is not enough.
In your code the getRayDirection() should do that.
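Here is a sketch of how getRayDirection() could look with that change. So that it can be compiled and tested as ordinary C++, it uses a hypothetical Vec3 type and a hypothetical HOST_DEVICE_INLINE macro that expands to __forceinline__ __device__ only when nvcc compiles the file; in the real .cu file you would use float3 and the vec_math.h helpers directly. Two details differ from the posted code, and both are my suggestions: the pixel coordinate is offset by 0.5 to hit the pixel center, and the right vector is cross(forward, up). With up = (0, 1, 0) and a camera looking down -z, the posted cross(up, forward) yields (-1, 0, 0), which mirrors the image horizontally.

```cpp
#include <math.h>

#if defined(__CUDACC__)
#define HOST_DEVICE_INLINE __forceinline__ __device__
#else
#define HOST_DEVICE_INLINE inline
#endif

struct Vec3 { float x, y, z; }; // hypothetical stand-in for float3

HOST_DEVICE_INLINE Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
HOST_DEVICE_INLINE Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
HOST_DEVICE_INLINE Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
HOST_DEVICE_INLINE Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
HOST_DEVICE_INLINE Vec3 normalize(Vec3 v) {
    const float invLen = 1.0f / sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    return v * invLen;
}

// Pinhole camera: direction through the center of pixel (px, py) of a width x height image.
// fovYDegrees is the vertical field of view; py = 0 is the top row of the image.
HOST_DEVICE_INLINE Vec3 getRayDirection(unsigned int px, unsigned int py,
                                        unsigned int width, unsigned int height,
                                        Vec3 eye, Vec3 lookAt, Vec3 up, float fovYDegrees) {
    const float ndcX = (static_cast<float>(px) + 0.5f) / width * 2.0f - 1.0f;
    const float ndcY = 1.0f - (static_cast<float>(py) + 0.5f) / height * 2.0f;
    const Vec3 forward = normalize(lookAt - eye);
    const Vec3 right = normalize(cross(forward, up)); // +x for up = +y, forward = -z
    const Vec3 trueUp = cross(right, forward);         // re-orthogonalized up vector
    const float tanHalfFovY = tanf(fovYDegrees * 0.5f * 3.14159265f / 180.0f);
    const float aspect = static_cast<float>(width) / static_cast<float>(height);
    return normalize(forward + right * (ndcX * aspect * tanHalfFovY) + trueUp * (ndcY * tanHalfFovY));
}
```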

However, I don’t really understand what I should do given the information you gave me regarding pipelineCompileOptions.traversableGraphFlags.

OptiX supports more render graph hierarchies than DXR and Vulkan RT.
It allows tracing rays against

  • a single geometry acceleration structure (GAS),
    as used in many OptiX SDK examples (none of my examples does that);
  • a two-level AS structure with one instance AS (IAS) over many GAS,
    which is the fastest option on RTX boards and the only structure DXR and Vulkan RT support;
  • a multi-level structure with more than one IAS above the bottom-level GAS.
    This is useful if you need to instance whole sub-models, but it is limited in depth.

For each of these three cases, OptiX provides a matching traversableGraphFlags value.
In the above order these are:
OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS,
OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING,
OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_ANY.

For performance, it’s recommended to use the IAS->GAS render graph structure (traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING), even if you only have a single GAS.
Simply put an IAS with a single OptixInstance with an identity transform above that and the GAS traversable handle as child, then use the IAS traversable handle as argument inside the optixTrace calls.

With the identity transform, object space equals world space, so that simple graph does not need any object-to-world transforms inside the device code. It is still the fastest render graph on RTX boards, because BVH traversal through it is fully hardware accelerated, and so are ray-triangle intersections when using built-in triangle primitives.
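OptixInstance stores its object-to-world transform in its float transform[12] member as a row-major 3x4 matrix, so the identity transform is {1,0,0,0, 0,1,0,0, 0,0,1,0}. The sketch below uses a hypothetical InstanceSketch struct with a subset of the OptixInstance fields so it compiles without OptiX, plus a helper that applies such a matrix to a point to show that the identity leaves object-space positions unchanged. In the real code you would fill an OptixInstance the same way with traversableHandle set to the GAS handle, upload it, and build the IAS from an OptixBuildInput of type OPTIX_BUILD_INPUT_TYPE_INSTANCES (instanceArray.instances, instanceArray.numInstances).

```cpp
#include <cstring>

// Hypothetical stand-in with a subset of OptixInstance's fields.
struct InstanceSketch {
    float transform[12];                  // row-major 3x4 object-to-world matrix
    unsigned int instanceId;
    unsigned int sbtOffset;
    unsigned int visibilityMask;
    unsigned int flags;
    unsigned long long traversableHandle; // child GAS handle (an OptixTraversableHandle)
};

// One instance with an identity transform above a single GAS.
InstanceSketch makeIdentityInstance(unsigned long long gasHandle) {
    static const float identity[12] = {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f
    };
    InstanceSketch inst{};
    std::memcpy(inst.transform, identity, sizeof(identity));
    inst.instanceId = 0;
    inst.sbtOffset = 0;        // first hit group record in the SBT
    inst.visibilityMask = 255; // must share a bit with the ray mask (OptixVisibilityMask(1) in the posted raygen)
    inst.flags = 0;            // OPTIX_INSTANCE_FLAG_NONE
    inst.traversableHandle = gasHandle;
    return inst;
}

// Apply a row-major 3x4 transform to the point (p[0], p[1], p[2], 1).
void transformPoint(const float m[12], const float p[3], float out[3]) {
    for (int row = 0; row < 3; ++row) {
        out[row] = m[row * 4 + 0] * p[0] + m[row * 4 + 1] * p[1] + m[row * 4 + 2] * p[2] + m[row * 4 + 3];
    }
}
```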

When adding more GAS in the future, you have the proper render graph layout to add more OptixInstances to that top-level IAS easily.
There is example code showing that inside the intro examples here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1596
Later examples use a very simple host-side scene graph with arbitrary depth which is traversed to flatten it to an IAS->GAS render graph.
For the fastest example inside that repository, please look at rtigo12.

All my examples implement a pinhole camera (“lens shaders” are implemented as direct callable programs)
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/shaders/raygeneration.cu#L336
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/shaders/lens_shader.cu#L40
and a runtime generated plane geometry with selectable tessellation:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/src/Plane.cpp#L37
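To illustrate what "selectable tessellation" means for the plane in this thread, here is a hypothetical generator (not the code from Plane.cpp) that builds a size x size square in the z = 0 plane, centered at the origin, from tessU x tessV quads with two counter-clockwise triangles each. With tessU = tessV = 1 and size 1.0 it produces the same square as the hard-coded arrays in the posted program (four vertices, two triangles, same winding), only with a different vertex order. The result could be uploaded exactly like createAccelerationStructure() does, with vertexStrideInBytes = sizeof(Vertex) and indexStrideInBytes = 3 * sizeof(unsigned int).

```cpp
#include <vector>

struct Vertex { float x, y, z; }; // same layout as in the posted program

// Hypothetical generator; tessU and tessV must be at least 1.
void generatePlane(unsigned int tessU, unsigned int tessV, float size,
                   std::vector<Vertex>& vertices, std::vector<unsigned int>& indices) {
    vertices.clear();
    indices.clear();
    const unsigned int columns = tessU + 1; // vertices per row
    for (unsigned int j = 0; j <= tessV; ++j) {
        for (unsigned int i = 0; i <= tessU; ++i) {
            const float u = static_cast<float>(i) / tessU; // 0..1 across the plane
            const float v = static_cast<float>(j) / tessV;
            vertices.push_back({(u - 0.5f) * size, (v - 0.5f) * size, 0.0f});
        }
    }
    for (unsigned int j = 0; j < tessV; ++j) {
        for (unsigned int i = 0; i < tessU; ++i) {
            const unsigned int i0 = j * columns + i; // bottom left
            const unsigned int i1 = i0 + 1;          // bottom right
            const unsigned int i2 = i0 + columns;    // top left
            const unsigned int i3 = i2 + 1;          // top right
            indices.insert(indices.end(), {i0, i1, i3, i3, i2, i0});
        }
    }
}
```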

Inside the intro examples, these are built directly and assigned to an OptixInstance:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1600
Inside the later examples, these are put into the host-side scene graph and automatically instanced when the same geometry was built before:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/src/Application.cpp#L1842

(If you use code blocks for all posted code (the </> icon in the toolbar, preformatted text, ctrl+e) that will preserve the formatting.)


Thank you for the clarification! So basically the flag doesn’t matter in this case as the scene is so small, but I would have to look into which one is most effective if I choose to expand it?

I’ve now managed to produce my first image of the scene (woohoo!). However, I think I’ve messed up the implementation of the rayGen program, and I cannot figure out where. The output image looks like the following, but I expected to see the 2D plane that I define at the beginning of the main application. Any ideas on what I should start looking into?

Output image:

Main app:
(I’ve used the preformatted text now)

#include <optix.h>
#include <cuda_runtime.h>
#include <optix_stubs.h>
#include <optix_function_table_definition.h>
#include <iostream>
#include <../inc/exception.h>
#include <optix_types.h>
#include <sutil/sutil.h>

#include <optix_stack_size.h>

#include <fstream>
#include <sstream>
#include <string>

#include <GLFW/glfw3.h>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "../inc/stb_image_write.h"


// Initialize OptiX and create context
void initOptix(CUcontext& cuContext, CUstream& stream) {
    // Initialize CUDA
    CUdevice cuDevice;
    cuInit(0);
    cuDeviceGet(&cuDevice, 0); // Assumes device 0 is suitable
    cuCtxCreate(&cuContext, 0, cuDevice);

    // Create a CUDA stream
    cuStreamCreate(&stream, CU_STREAM_DEFAULT);

    // Initialize OptiX
    optixInit();
}

// Camera struct
struct Camera {
    float3 eye;         // Camera position
    float3 lookAt;      // Look-at point
    float3 up;          // Up vector
    float fov;          // Field of view in degrees
    float aspectRatio;  // Aspect ratio of the image
};


// Vertex structure for a simple 2D plane
struct Vertex {
    float x, y, z; // Position
};

 //Define vertices of a square plane centered at the origin
Vertex vertices[] = {
    {-0.5f, -0.5f, 0.0f}, // Bottom left
    {0.5f, -0.5f, 0.0f},  // Bottom right
    {0.5f, 0.5f, 0.0f},   // Top right
    {-0.5f, 0.5f, 0.0f}   // Top left
};


// Define indices for two triangles that make up the square
unsigned int indices[] = {
    0, 1, 2, // First triangle
    2, 3, 0  // Second triangle
};



// Load PTX files path
const char* ptxPathRayGen = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/rayGen_v3.ptx";
const char* ptxPathhit = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/hit_v3.ptx";
const char* ptxPathMiss = "C:/Users/PC/Desktop/new_example_4/cuda/cuda_output/miss_v3.ptx";




OptixTraversableHandle createAccelerationStructure(OptixDeviceContext context, CUdeviceptr& d_vertices, CUdeviceptr& d_indices, CUdeviceptr& d_tempBufferGas, CUdeviceptr& d_gasOutputBuffer) {
    // Assume 'vertices' and 'indices' are already defined as shown above

    size_t vertices_size = sizeof(vertices);
    size_t indices_size = sizeof(indices);

    // Allocate device memory for vertices and indices
    cuMemAlloc(&d_vertices, vertices_size);
    cuMemAlloc(&d_indices, indices_size);

    // Copy vertices and indices to device memory
    cuMemcpyHtoD(d_vertices, vertices, vertices_size);
    cuMemcpyHtoD(d_indices, indices, indices_size);

    // Define build input
    OptixBuildInput buildInput = {};
    buildInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;

    // Setup vertex buffer
    OptixBuildInputTriangleArray triangleArray = {};
    triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
    triangleArray.vertexStrideInBytes = sizeof(Vertex);
    triangleArray.numVertices = 4;
    triangleArray.vertexBuffers = &d_vertices;

    // Setup index buffer
    triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
    triangleArray.indexStrideInBytes = sizeof(unsigned int) * 3;
    triangleArray.numIndexTriplets = 2;
    triangleArray.indexBuffer = d_indices;

    // Set the flags for each SBT record
    unsigned int inputFlags[1] = { OPTIX_GEOMETRY_FLAG_NONE };
    triangleArray.flags = inputFlags; // Correctly assign flags to the triangleArray
    triangleArray.numSbtRecords = 1; // Correctly assign the number of SBT records

    // Now, correctly link the fully configured triangleArray to the buildInput
    buildInput.triangleArray = triangleArray;

    // Specify build options
    OptixAccelBuildOptions accelOptions = {};
    accelOptions.buildFlags = OPTIX_BUILD_FLAG_NONE;
    accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

    // Output buffers for acceleration structure
    OptixAccelBufferSizes gasBufferSizes;
    optixAccelComputeMemoryUsage(context, &accelOptions, &buildInput, 1, &gasBufferSizes);

    // Allocate memory for acceleration structure
    cuMemAlloc(&d_tempBufferGas, gasBufferSizes.tempSizeInBytes);
    cuMemAlloc(&d_gasOutputBuffer, gasBufferSizes.outputSizeInBytes);

    // Build acceleration structure
    OptixTraversableHandle gasHandle = 0;
    optixAccelBuild(context, 0, &accelOptions, &buildInput, 1,
        d_tempBufferGas, gasBufferSizes.tempSizeInBytes,
        d_gasOutputBuffer, gasBufferSizes.outputSizeInBytes,
        &gasHandle, nullptr, 0);

    // Return the handle to the created acceleration structure
    return gasHandle;

    // Do not forget to free the allocated buffers after they are no longer needed
}


// Function to load the contents of a PTX file into a string
std::string loadPtx(const std::string& filepath) {
    std::ifstream file(filepath.c_str());
    if (!file) {
        std::cerr << "Failed to open PTX file: " << filepath << std::endl;
        return "";
    }

    std::stringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}


int main() {
    // Step 1: Setup OptiX context (for OptiX 7+, prepare to use CUDA directly)
    CUcontext cuContext;
    CUstream stream;
    OptixDeviceContext optixContext = nullptr;
    
    // Initialize OptiX and CUDA context
    initOptix(cuContext, stream);

    CUdeviceptr d_vertices = 0;
    CUdeviceptr d_indices = 0;
    CUdeviceptr d_tempBufferGas = 0;
    CUdeviceptr d_gasOutputBuffer = 0;

    // Create an OptiX device context using the CUDA context
    OPTIX_CHECK(optixDeviceContextCreate(cuContext, nullptr, &optixContext));
    OPTIX_CHECK(optixDeviceContextSetLogCallback(optixContext, [](unsigned int level, const char* tag, const char* message, void* cbdata) {
        std::cerr << "[" << tag << "] " << message << "\n";
        }, nullptr, 4)); // LogLevel=4 for verbosity

    // Step 2: Define geometry and create acceleration structures
    OptixTraversableHandle gasHandle = createAccelerationStructure(optixContext, d_vertices, d_indices, d_tempBufferGas, d_gasOutputBuffer);


    // Module and Program Group Compile Options
    OptixModuleCompileOptions moduleCompileOptions = {};
    moduleCompileOptions.maxRegisterCount = OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT;
    moduleCompileOptions.optLevel = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
    moduleCompileOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_NONE;

    OptixPipelineCompileOptions pipelineCompileOptions = {};
    pipelineCompileOptions.usesMotionBlur = false;
    pipelineCompileOptions.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
    pipelineCompileOptions.numPayloadValues = 2; // Adjust based on your needs
    pipelineCompileOptions.numAttributeValues = 2; // Adjust based on your needs
    pipelineCompileOptions.exceptionFlags = OPTIX_EXCEPTION_FLAG_NONE; // or other flags as necessary
    pipelineCompileOptions.pipelineLaunchParamsVariableName = "params";

    // Step 3: Create shaders (ray generation, hit, miss)
    char log[2048]; // For logging
    size_t logSize = sizeof(log) - 1;
    log[0] = '\0'; // Initialize log with empty string


    // Load the PTX source code
    std::string rayGenPTX = loadPtx(ptxPathRayGen);
    OptixModule rayGenModule;
    OPTIX_CHECK(optixModuleCreate(
        optixContext,
        &moduleCompileOptions,
        &pipelineCompileOptions,
        rayGenPTX.c_str(), rayGenPTX.size(),
        log, &logSize,
        &rayGenModule
    ));

    std::string hitPTX = loadPtx(ptxPathhit);
    OptixModule hitModule;
    OPTIX_CHECK(optixModuleCreate(
        optixContext,
        &moduleCompileOptions,
        &pipelineCompileOptions,
        hitPTX.c_str(), hitPTX.size(),
        log, &logSize,
        &hitModule
    ));


    std::string missPTX = loadPtx(ptxPathMiss);
    OptixModule missModule;
    OPTIX_CHECK(optixModuleCreate(
        optixContext,
        &moduleCompileOptions,
        &pipelineCompileOptions,
        missPTX.c_str(), missPTX.size(),
        log, &logSize,
        &missModule
    ));


    // Define program group options - typically, this is left as default for basic usage
    OptixProgramGroupOptions programGroupOptions = {};


    // Create program group for ray generation shader
    OptixProgramGroupDesc raygenPGDesc = {};
    raygenPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_RAYGEN;
    raygenPGDesc.raygen.module = rayGenModule; // Ensure rayGenModule is correctly initialized
    raygenPGDesc.raygen.entryFunctionName = "__raygen__rg";

    OptixProgramGroup raygenProgramGroup;
    OPTIX_CHECK(optixProgramGroupCreate(
        optixContext,
        &raygenPGDesc,
        1, // Number of program group descriptions
        &programGroupOptions, // Corrected to use program group options
        log, &logSize, // Log buffer and its size
        &raygenProgramGroup // The created program group
    ));

    // Create program group for closest hit shader
    OptixProgramGroupDesc hitPGDesc = {};
    hitPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
    hitPGDesc.hitgroup.moduleCH = hitModule; // Use your hit module here
    hitPGDesc.hitgroup.entryFunctionNameCH = "__closesthit__ch"; // Entry point for your hit shader

    OptixProgramGroup hitProgramGroup;
    OPTIX_CHECK(optixProgramGroupCreate(
        optixContext,
        &hitPGDesc, // Use the hit program group descriptor
        1, // One program group description
        &programGroupOptions, // Assuming programGroupOptions is already defined
        log, &logSize, // Log buffer and size
        &hitProgramGroup // The created program group
    ));

    // Create program group for miss shader
    OptixProgramGroupDesc missPGDesc = {};
    missPGDesc.kind = OPTIX_PROGRAM_GROUP_KIND_MISS;
    missPGDesc.miss.module = missModule; // Ensure missModule is correctly initialized
    missPGDesc.miss.entryFunctionName = "__miss__ms";

    OptixProgramGroup missProgramGroup;
    OPTIX_CHECK(optixProgramGroupCreate(
        optixContext,
        &missPGDesc,
        1, // Number of program group descriptions
        &programGroupOptions, // Program group options
        log, &logSize, // Log buffer and its size
        &missProgramGroup // The created program group
    ));



    // Step 4: Set up the ray tracing pipeline
    OptixPipeline pipeline;

    OptixPipelineLinkOptions pipelineLinkOptions = {};
    pipelineLinkOptions.maxTraceDepth = 2;
    OptixProgramGroup programGroups[] = { raygenProgramGroup, hitProgramGroup, missProgramGroup };

    OPTIX_CHECK(optixPipelineCreate(
        optixContext,
        &pipelineCompileOptions,
        &pipelineLinkOptions,
        programGroups,
        sizeof(programGroups) / sizeof(programGroups[0]), // Number of program groups
        log, &logSize,
        &pipeline
    ));



    //OptixResult optixResult = optixPipelineCreate(
    //    optixContext,
    //    &pipelineCompileOptions,
    //    &pipelineLinkOptions,
    //    programGroups,
    //    sizeof(programGroups) / sizeof(programGroups[0]), // Number of program groups
    //    log, &logSize,
    //    &pipeline
    //);

    //if (optixResult != OPTIX_SUCCESS) {
    //    // If the pipeline creation failed, log the error and exit
    //    std::cerr << "Optix call failed: " << log << std::endl;
    //    return -1; // or handle the error as appropriate
    //}


    //// Set stack sizes
    //uint32_t directCallableStackSizeFromTraversal = 64; // Minimal if not using direct callables
    //uint32_t directCallableStackSizeFromState = 64; // Minimal if not using direct callables
    //uint32_t continuationStackSize = 1024; // Estimate based on the complexity of your shaders

    //// The maximum depth of the traversable graph for ray tracing
    //uint32_t maxTraversableGraphDepth = 2; // Assuming a simple scene

    //OPTIX_CHECK(optixPipelineSetStackSize(
    //    pipeline,
    //    directCallableStackSizeFromTraversal,
    //    directCallableStackSizeFromState,
    //    continuationStackSize,
    //    maxTraversableGraphDepth // Maximum depth of traversal
    //));



    // STACK SIZES
    OptixStackSizes ssp = {}; // Whole pipeline.

    for (auto pg : programGroups)
    {
        OptixStackSizes ss;

#if (OPTIX_VERSION >= 70700)
        OPTIX_CHECK(optixProgramGroupGetStackSize(pg, &ss, pipeline));
#else
        OPTIX_CHECK(optixProgramGroupGetStackSize(pg, &ss));
#endif

        ssp.cssRG = std::max(ssp.cssRG, ss.cssRG);
        ssp.cssMS = std::max(ssp.cssMS, ss.cssMS);
        ssp.cssCH = std::max(ssp.cssCH, ss.cssCH);
        ssp.cssAH = std::max(ssp.cssAH, ss.cssAH);
        ssp.cssIS = std::max(ssp.cssIS, ss.cssIS);
        ssp.cssCC = std::max(ssp.cssCC, ss.cssCC);
        ssp.dssDC = std::max(ssp.dssDC, ss.dssDC);
    }

    // Temporaries
    unsigned int cssCCTree = ssp.cssCC; // Should be 0. No continuation callables in this pipeline. // maxCCDepth == 0
    unsigned int cssCHOrMSPlusCCTree = std::max(ssp.cssCH, ssp.cssMS) + cssCCTree;

    const unsigned int maxDCDepth = 2; // The __direct_callable__light_mesh calls other direct callables from MDL expressions.

    // Arguments

    unsigned int directCallableStackSizeFromTraversal = ssp.dssDC * maxDCDepth; // FromTraversal: DC is invoked from IS or AH.      // Possible stack size optimizations.
    unsigned int directCallableStackSizeFromState = ssp.dssDC * maxDCDepth; // FromState:     DC is invoked from RG, MS, or CH. // Possible stack size optimizations.
    unsigned int continuationStackSize = ssp.cssRG + cssCCTree + cssCHOrMSPlusCCTree * (std::max(1u, pipelineLinkOptions.maxTraceDepth) - 1u) +
            std::min(1u, pipelineLinkOptions.maxTraceDepth) * std::max(cssCHOrMSPlusCCTree, ssp.cssAH + ssp.cssIS);
    unsigned int maxTraversableGraphDepth = 2;

    OPTIX_CHECK(optixPipelineSetStackSize(pipeline, directCallableStackSizeFromTraversal, directCallableStackSizeFromState, continuationStackSize, maxTraversableGraphDepth));


    // Step 5: Render the scene

    struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) RayGenSbtRecord {
        __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
        // Add ray generation data here
        //OptixProgramGroup raygenProgram;
    };

    struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) MissSbtRecord {
        __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
        // Add miss data here
        //OptixProgramGroup missProgram;
    };

    struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) HitGroupSbtRecord {
        __align__(OPTIX_SBT_RECORD_HEADER_SIZE) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
        // Add hit group data here
        //OptixProgramGroup closestHitProgram;
    };

    RayGenSbtRecord rgSbt;
    MissSbtRecord msSbt;
    HitGroupSbtRecord hgSbt;

    optixSbtRecordPackHeader(raygenProgramGroup, &rgSbt);
    optixSbtRecordPackHeader(missProgramGroup, &msSbt);
    optixSbtRecordPackHeader(hitProgramGroup, &hgSbt);

    // Allocate and copy SBT records to device memory
    CUdeviceptr d_raygenRecords, d_missRecords, d_hitgroupRecords;
    size_t sbtSize = sizeof(RayGenSbtRecord); // Same for Miss and HitGroup if they don't have additional data
    cudaMalloc(reinterpret_cast<void**>(&d_raygenRecords), sbtSize);
    cudaMalloc(reinterpret_cast<void**>(&d_missRecords), sbtSize);
    cudaMalloc(reinterpret_cast<void**>(&d_hitgroupRecords), sbtSize);

    cudaMemcpy(reinterpret_cast<void*>(d_raygenRecords), &rgSbt, sbtSize, cudaMemcpyHostToDevice);
    cudaMemcpy(reinterpret_cast<void*>(d_missRecords), &msSbt, sbtSize, cudaMemcpyHostToDevice);
    cudaMemcpy(reinterpret_cast<void*>(d_hitgroupRecords), &hgSbt, sbtSize, cudaMemcpyHostToDevice);

    OptixShaderBindingTable sbt = {};
    sbt.raygenRecord = d_raygenRecords;
    sbt.missRecordBase = d_missRecords;
    sbt.missRecordStrideInBytes = sizeof(MissSbtRecord);
    sbt.missRecordCount = 1;
    sbt.hitgroupRecordBase = d_hitgroupRecords;
    sbt.hitgroupRecordStrideInBytes = sizeof(HitGroupSbtRecord);
    sbt.hitgroupRecordCount = 1;

    struct LaunchParams {
        int width;
        int height;
        CUdeviceptr outputBuffer;
        Camera camera;
        // Add other parameters as needed
    };

    LaunchParams params = { 800, 600 }; // Example dimensions

    //Camera params
    params.camera.eye = make_float3(1.0f, 0.0f, 0.5f);
    params.camera.lookAt = make_float3(0.0f, 0.0f, 0.0f);
    params.camera.up = make_float3(0.0f, 1.0f, 0.0f);
    params.camera.fov = 45.0f;
    params.camera.aspectRatio = static_cast<float>(params.width) / static_cast<float>(params.height);

    CUdeviceptr d_params;
    cudaMalloc(reinterpret_cast<void**>(&d_params), sizeof(LaunchParams));
    cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(LaunchParams), cudaMemcpyHostToDevice);

    CUdeviceptr d_outputBuffer;
    cudaMalloc(reinterpret_cast<void**>(&d_outputBuffer), params.width* params.height * sizeof(float4)); // Assuming float4 per pixel
    params.outputBuffer = d_outputBuffer;
    // Update params on the device
    cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(LaunchParams), cudaMemcpyHostToDevice);

    OPTIX_CHECK(optixLaunch(
        pipeline,
        stream,
        d_params,
        sizeof(LaunchParams),
        &sbt,
        params.width,  // launch width
        params.height, // launch height
        1              // launch depth
    ));
    cudaStreamSynchronize(stream); // Wait for completion

    float4* h_outputBuffer = new float4[params.width * params.height];
    cudaMemcpy(h_outputBuffer, reinterpret_cast<void*>(d_outputBuffer), params.width* params.height * sizeof(float4), cudaMemcpyDeviceToHost);

    // Now h_outputBuffer contains your rendered image. You can save it to a file or display it.

    // This saves a PNG file.
    // Convert color values from [0.0, 1.0] to [0, 255]
    for (int i = 0; i < params.width * params.height; ++i) {
        h_outputBuffer[i].x = h_outputBuffer[i].x * 255.99f;
        h_outputBuffer[i].y = h_outputBuffer[i].y * 255.99f;
        h_outputBuffer[i].z = h_outputBuffer[i].z * 255.99f;
        h_outputBuffer[i].w = 255.99f; // Full opacity
    }

    // Save the image to a file
    const char* outputFile = "output.png";
    stbi_write_png(outputFile, params.width, params.height, 4, h_outputBuffer, params.width * 4);

    std::cout << "Saved rendered image to " << outputFile << std::endl;
    delete[] h_outputBuffer; // Release the host copy of the image


    // Step 6: Cleanup and free resources
    cudaFree(reinterpret_cast<void*>(d_raygenRecords));
    cudaFree(reinterpret_cast<void*>(d_missRecords));
    cudaFree(reinterpret_cast<void*>(d_hitgroupRecords));
    cudaFree(reinterpret_cast<void*>(d_outputBuffer));
    cudaFree(reinterpret_cast<void*>(d_params));


    OPTIX_CHECK(optixPipelineDestroy(pipeline)); // The pipeline was never destroyed before
    OPTIX_CHECK(optixProgramGroupDestroy(raygenProgramGroup));
    OPTIX_CHECK(optixProgramGroupDestroy(hitProgramGroup));
    OPTIX_CHECK(optixProgramGroupDestroy(missProgramGroup));
    OPTIX_CHECK(optixModuleDestroy(rayGenModule));
    OPTIX_CHECK(optixModuleDestroy(hitModule));
    OPTIX_CHECK(optixModuleDestroy(missModule));


    cuMemFree(d_vertices);
    cuMemFree(d_indices);
    cuMemFree(d_tempBufferGas);
    cuMemFree(d_gasOutputBuffer);

    OPTIX_CHECK(optixDeviceContextDestroy(optixContext));
    cuStreamDestroy(stream);
    cuCtxDestroy(cuContext);

    return 0;
}


rayGen program:

#include <optix.h>
#include "../sutil/vec_math.h"
#include <cuda_runtime.h>

// Camera struct
struct Camera {
    float3 eye;         // Camera position
    float3 lookAt;      // Look-at point
    float3 up;          // Up vector
    float fov;          // Field of view in degrees
    float aspectRatio;  // Aspect ratio of the image
};

struct LaunchParams {
    int width;
    int height;
    CUdeviceptr outputBuffer; // RGBA format
    // Add Camera struct definition here as well, or include a header file containing it
    Camera camera;
};

extern "C" {
    __constant__ LaunchParams params;
}

extern "C" __global__ void __raygen__rg() {
    const uint3 idx = optixGetLaunchIndex();
    const float u = (static_cast<float>(idx.x) / static_cast<float>(params.width) - 0.5f) * 2.0f;
    const float v = (static_cast<float>(idx.y) / static_cast<float>(params.height) - 0.5f) * 2.0f;

    float3 direction = normalize(params.camera.lookAt - params.camera.eye);
    float3 right = normalize(cross(direction, params.camera.up));
    float3 up = cross(right, direction);

    float fovScale = tanf(params.camera.fov * 0.5f * M_PIf / 180.0f);
    float aspectRatio = params.camera.aspectRatio;

    float3 rayDirection = normalize(direction + fovScale * (u * aspectRatio * right + v * up));

    // Here you would continue with setting up and tracing rays using rayDirection
    // For now, we'll just output a color to indicate it's working:
    float4* outputBuffer = reinterpret_cast<float4*>(params.outputBuffer);
    outputBuffer[idx.y * params.width + idx.x] = make_float4(rayDirection.x, rayDirection.y, rayDirection.z, 1.0f);
}

miss program:

#include <optix.h>

struct Camera {
    float3 eye;         // Camera position
    float3 lookAt;      // Look-at point
    float3 up;          // Up vector
    float fov;          // Field of view in degrees
    float aspectRatio;  // Aspect ratio of the image
};

struct LaunchParams {
    int width;
    int height;
    CUdeviceptr outputBuffer; // Add additional parameters as needed.
    Camera camera;
};

extern "C" {
__constant__ LaunchParams params; // This assumes you have a similar struct in your host code.
}

__constant__ float3 bgColor = {0.1f, 0.1f, 0.1f}; // Background color

extern "C" __global__ void __miss__ms() {
    float3* prd = reinterpret_cast<float3*>(optixGetPayload_0());
    *prd = bgColor;
}

hit program:

#include <optix.h>

struct Camera {
    float3 eye;         // Camera position
    float3 lookAt;      // Look-at point
    float3 up;          // Up vector
    float fov;          // Field of view in degrees
    float aspectRatio;  // Aspect ratio of the image
};

struct LaunchParams {
    int width;
    int height;
    CUdeviceptr outputBuffer; // Additional parameters can be added as needed.
    Camera camera;
};

extern "C" {
__constant__ LaunchParams params; // Even if not used directly, it needs to be declared to fix the error.
}

extern "C" __global__ void __closesthit__ch() {
    float3* prd = reinterpret_cast<float3*>(optixGetPayload_0());
    *prd = make_float3(1.0f, 0.0f, 0.0f); // Example: Set hit color to red.
}

So basically the flag doesn’t matter in this case as the scene is so small, but I would have to look into which one is most effective if I choose to expand it?

It does matter a lot! You must set it to match the render graph you’re using!
You’re only calling optixAccelBuild once for a GAS, so you should set it to OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS as explained.
Change it when you’re using different render graph layouts in the future.

Also make sure you do not blindly copy code from other examples:
const unsigned int maxDCDepth = 2; // The __direct_callable__light_mesh calls other direct callables from MDL expressions.
I doubt that you’re using my light sampling mechanisms from the MDL_renderer in your code.
maxDCDepth should be zero when you are not using direct callable programs at all.
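To see why maxDCDepth matters, here is a small host-only sketch of the stack size arithmetic used in the question's code (it appears to follow the same formula as the SDK helper optixUtilComputeStackSizes() in optix_stack_size.h). StackSizes is a hypothetical stand-in for OptixStackSizes with the same field names, so the sketch compiles without OptiX; the byte values used to try it out are made up.

```cpp
#include <algorithm>

// Hypothetical stand-in for OptixStackSizes (same field names), so this compiles without OptiX.
struct StackSizes {
    unsigned int cssRG = 0, cssMS = 0, cssCH = 0, cssAH = 0, cssIS = 0, cssCC = 0, dssDC = 0;
};

struct PipelineStackSizes {
    unsigned int dcFromTraversal; // directCallableStackSizeFromTraversal
    unsigned int dcFromState;     // directCallableStackSizeFromState
    unsigned int continuation;    // continuationStackSize
};

// Same arithmetic as the question's code, with the call depths as explicit parameters.
PipelineStackSizes computeStackSizes(const StackSizes& s, unsigned int maxTraceDepth,
                                     unsigned int maxCCDepth, unsigned int maxDCDepth) {
    const unsigned int cssCCTree = maxCCDepth * s.cssCC;
    const unsigned int cssCHOrMSPlusCCTree = std::max(s.cssCH, s.cssMS) + cssCCTree;

    PipelineStackSizes out;
    // The direct callable stack grows linearly with maxDCDepth: 0 when no DCs are used.
    out.dcFromTraversal = maxDCDepth * s.dssDC;
    out.dcFromState     = maxDCDepth * s.dssDC;
    out.continuation = s.cssRG + cssCCTree
                     + (std::max(maxTraceDepth, 1u) - 1u) * cssCHOrMSPlusCCTree
                     + std::min(maxTraceDepth, 1u) * std::max(cssCHOrMSPlusCCTree, s.cssIS + s.cssAH);
    return out;
}
```

With only raygen, closest-hit and miss programs, passing maxDCDepth = 0 yields zero for both direct callable stack sizes, while the continuation stack size is unaffected.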

Mind that the logSize arguments of the OptiX API functions are input-output values. If you use them, you must reset logSize to the size of the log buffer before every call. That’s why I don’t use them at all (I just pass nullptr) and instead set the log callback in the OptixDeviceContextOptions.
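The pitfall can be reproduced in plain C++. fakeOptixCall() below is a hypothetical stand-in, not an OptiX function: it treats *logSize as the buffer capacity on input and, as an assumption made for illustration, replaces it with the number of bytes it wrote on output. Reusing the same variable without resetting it therefore shrinks the usable buffer for the next call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an OptiX call with a log buffer (NOT an OptiX API).
// Input: *logSize is the capacity of log. Output: *logSize is the number of bytes written.
void fakeOptixCall(const char* message, char* log, std::size_t* logSize) {
    const std::size_t needed = std::strlen(message) + 1; // bytes including '\0'
    const std::size_t written = needed < *logSize ? needed : *logSize;
    if (written > 0) {
        std::memcpy(log, message, written - 1);
        log[written - 1] = '\0';                          // always null-terminate
    }
    *logSize = written;                                   // the in/out value changes here
}

// Correct usage pattern: reset the size to the full buffer capacity before EVERY call.
std::string callWithReset(const char* message, char* log, std::size_t capacity) {
    std::size_t logSize = capacity;
    fakeOptixCall(message, log, &logSize);
    return std::string(log);
}
```

In the question's code this corresponds to writing logSize = sizeof(log); before each optixModuleCreate(), optixProgramGroupCreate() and optixPipelineCreate() call.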

This can’t work:
float3* prd = reinterpret_cast<float3*>(optixGetPayload_0());
You cannot encode a 64-bit pointer to your “per ray data” into a single 32-bit unsigned integer. That needs more than one payload register. Look for packPointer() and unpackPointer() functions inside the OptiX SDK.
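The idea behind packPointer() and unpackPointer() is to split the 64-bit address into a high and a low 32-bit half and carry each half in its own payload register. Below is a host-side C++ sketch of that idea (my own version, not the SDK code). In device code, the two halves would be passed to optixTrace() as two payload values and recombined in the closest-hit or miss program from optixGetPayload_0() and optixGetPayload_1(). PerRayData is a hypothetical example struct.

```cpp
#include <cstdint>

// Hypothetical per-ray data the caller wants the hit/miss programs to write into.
struct PerRayData {
    float r, g, b;
};

// Split a pointer into two 32-bit values (high word first), one per payload register.
inline void packPointer(void* ptr, std::uint32_t& i0, std::uint32_t& i1) {
    const std::uint64_t bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    i0 = static_cast<std::uint32_t>(bits >> 32);           // high 32 bits
    i1 = static_cast<std::uint32_t>(bits & 0xFFFFFFFFull); // low 32 bits
}

// Recombine the two 32-bit halves into the original pointer.
inline void* unpackPointer(std::uint32_t i0, std::uint32_t i1) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(i0) << 32) | i1;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}
```

This uses two payload registers, so numPayloadValues must be at least 2 (the question's pipeline already sets 2).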
Instead of using a pointer there, it would be faster to use three payload registers and write the RGB colors to them with

optixSetPayload_0(__float_as_uint(r));
optixSetPayload_1(__float_as_uint(g));
optixSetPayload_2(__float_as_uint(b));

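and then read them back inside the caller from the unsigned int payload variables that were passed to optixTrace() (named p0, p1 and p2 here for illustration; optixGetPayload_*() is only available inside the intersection, any-hit, closest-hit and miss programs):

float r = __uint_as_float(p0);
float g = __uint_as_float(p1);
float b = __uint_as_float(p2);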

That requires adjusting the number of OptiX payload registers in the pipeline compile options (numPayloadValues must be at least 3).
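__float_as_uint() and __uint_as_float() reinterpret the 32 bits of a float as an unsigned int and back without any numeric conversion, so the color survives the trip through the payload registers exactly. A host-side C++ sketch of the same round trip uses std::memcpy for the bit copy; floatAsUint(), uintAsFloat() and the three-element Payload array are hypothetical stand-ins for the CUDA intrinsics and the OptiX payload registers.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Host analogues of CUDA's __float_as_uint() / __uint_as_float(): copy the bits, no conversion.
inline std::uint32_t floatAsUint(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

inline float uintAsFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Stand-in for three payload registers (numPayloadValues = 3).
using Payload = std::array<std::uint32_t, 3>;

// What the closest-hit or miss program does with optixSetPayload_0/1/2.
inline Payload writeColor(float r, float g, float b) {
    return { floatAsUint(r), floatAsUint(g), floatAsUint(b) };
}

// What the caller does with the payload values after optixTrace() returns.
inline void readColor(const Payload& p, float& r, float& g, float& b) {
    r = uintAsFloat(p[0]);
    g = uintAsFloat(p[1]);
    b = uintAsFloat(p[2]);
}
```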

Finally, could it be that stbi_write_png() doesn’t handle float4 data but expects uchar4?
In that case you would need to change your conversion of the h_outputBuffer contents slightly, so that it writes into a separate host image with one unsigned byte per channel, and save that instead. Look at sutil.cpp line 626.
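stb_image_write's PNG writer takes 8 bits per channel, so the float4 pixels have to be quantized into a separate unsigned char buffer first. A minimal sketch, assuming the float4 layout of the question's output buffer; Float4 is a hypothetical stand-in for CUDA's float4 so the code compiles without CUDA headers. Each channel is clamped to [0, 1] before scaling, which matters here because the ray generation program writes ray directions whose components can be negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float4 (same x, y, z, w layout).
struct Float4 { float x, y, z, w; };

// Quantize one channel from [0, 1] to [0, 255]; values outside the range are clamped.
inline unsigned char toByte(float v) {
    const float c = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<unsigned char>(c * 255.99f);
}

// Convert width*height float4 pixels into tightly packed RGBA8 (4 bytes per pixel),
// the layout expected with comp = 4 and stride_in_bytes = width * 4.
std::vector<unsigned char> toRgba8(const Float4* src, int width, int height) {
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<unsigned char> dst(count * 4);
    for (std::size_t i = 0; i < count; ++i) {
        dst[4 * i + 0] = toByte(src[i].x);
        dst[4 * i + 1] = toByte(src[i].y);
        dst[4 * i + 2] = toByte(src[i].z);
        dst[4 * i + 3] = 255; // fully opaque, as in the original loop
    }
    return dst;
}
```

In the real program Float4 would simply be float4, and the result would be saved with stbi_write_png(outputFile, params.width, params.height, 4, rgba.data(), params.width * 4); instead of passing the float buffer.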

It is now working with your help! Thank you so much for your support and patience, even though I am new to this and ask sort of stupid questions…
