Two GAS with no IAS

I’m writing an OptiX program that doesn’t render anything; all it has to do is check whether rays hit. I built two objects into two separate GASes and, in the kernel, trace against one of the GAS handles as shown below.

// build GAS1
uint32_t triangleInputFlags[1] = { };
CUdeviceptr vertexBufferPointer;

if (n_triangle == 0) return;
OptixBuildInput triangleInput = {};
vertexBufferPointer = reinterpret_cast<CUdeviceptr>(vertexData1);
triangleInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
triangleInput.triangleArray.vertexBuffers = &vertexBufferPointer;
triangleInput.triangleArray.numVertices = static_cast<uint32_t>(n_triangleSize * 3);
triangleInput.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
triangleInput.triangleArray.vertexStrideInBytes = sizeof(float3);
triangleInput.triangleArray.flags = triangleInputFlags;
triangleInput.triangleArray.numSbtRecords = 1;

OptixAccelBuildOptions accelOptions = {};

accelOptions.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

OPTIX_CHECK(optixAccelComputeMemoryUsage(
	parameterContext,
	&accelOptions,
	&triangleInput,
	1,
	&m_gasBufferSizes
));
m_tempBuffer.allocIfRequired(m_gasBufferSizes.tempSizeInBytes, false, OptixStream);
m_outputBuffer.allocIfRequired(m_gasBufferSizes.outputSizeInBytes, false, OptixStream);
	
OPTIX_CHECK(optixAccelBuild(
	parameterContext,
	0,
	&accelOptions,
	&triangleInput,
	1,
	m_tempBuffer.get(),
	m_gasBufferSizes.tempSizeInBytes,
	m_outputBuffer.get(),
	m_gasBufferSizes.outputSizeInBytes,
	&gas_handle1,
	nullptr,
	0
));


// build GAS2
uint32_t triangleInputFlags2[1] = { };
CUdeviceptr vertexBufferPointer2;

if (n_triangle == 0) return;
OptixBuildInput triangleInput2 = {};
vertexBufferPointer2 = reinterpret_cast<CUdeviceptr>(vertexData2);
triangleInput2.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
triangleInput2.triangleArray.vertexBuffers = &vertexBufferPointer2;
triangleInput2.triangleArray.numVertices = static_cast<uint32_t>(n_triangleSize * 3);
triangleInput2.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
triangleInput2.triangleArray.vertexStrideInBytes = sizeof(float3);
triangleInput2.triangleArray.flags = triangleInputFlags2;
triangleInput2.triangleArray.numSbtRecords = 1;

OptixAccelBuildOptions accelOptions2 = {};

accelOptions2.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions2.operation = OPTIX_BUILD_OPERATION_BUILD;

OPTIX_CHECK(optixAccelComputeMemoryUsage(
	parameterContext,
	&accelOptions2,
	&triangleInput2,
	1,
	&m_gasBufferSizes2
));
m_tempBuffer2.allocIfRequired(m_gasBufferSizes2.tempSizeInBytes, false, OptixStream);
m_outputBuffer2.allocIfRequired(m_gasBufferSizes2.outputSizeInBytes, false, OptixStream);
	
OPTIX_CHECK(optixAccelBuild(
	parameterContext,
	0,
	&accelOptions2,
	&triangleInput2,
	1,
	m_tempBuffer2.get(),
	m_gasBufferSizes2.tempSizeInBytes,
	m_outputBuffer2.get(),
	m_gasBufferSizes2.outputSizeInBytes,
	&gas_handle2,
	nullptr,
	0
));


Then I set up the SBT, copy the GAS handles into the launch params, and so on.
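
For context, a simplified sketch of how the handle gets into the launch params in my code (the member name handle and the struct layout are simplified here):

struct RenderParams
{
	OptixTraversableHandle handle; // gas_handle1 or gas_handle2
	// ... other launch parameters
};

RenderParams params = {};
params.handle = gas_handle1; // or gas_handle2 for the second measurement

cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(RenderParams), cudaMemcpyHostToDevice);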

In the kernel:

optixTrace(
	gas_handle1,            // (or gas_handle2)
	ray_origin, ray_direction,
	tmin,
	tmax,
	0.0f,                   // rayTime
	OptixVisibilityMask(1),
	OPTIX_RAY_FLAG_NONE,
	0,                      // SBT offset
	1,                      // SBT stride
	0,                      // missSBTIndex
	u0);

I only trace and do nothing with the hit result (in CH, AH, MISS).
And I measured the time after the launch.

GAS1 consists of 10 million triangles, GAS2 consists of 1,000 triangles, and the launch width (number of rays) is 10 million. However, tracing with gas_handle2 took the same time as tracing with gas_handle1.

So I built only one GAS with 1,000 triangles and launched. In that case I got a much smaller time compared with the case above.

I think something is wrong, and tracing with gas_handle2 maybe also traverses GAS1.
Is it because I didn’t use an IAS? Of course, I synchronized after the launch.
I want to know the reason …

Thank you very much for reading this.

How exactly did you measure that?

Note that all OptiX API calls which take a CUDA stream argument, like optixAccelBuild and optixLaunch, are asynchronous!
That means these calls only launch a CUDA kernel and immediately return to the host caller.
https://raytracing-docs.nvidia.com/optix7/guide/index.html#implementation_principles#asynchronous-execution

If you want to measure the time the actual CUDA kernel took, you must synchronize the host with the CUDA kernel. That is done either by waiting for CUDA events pushed into the stream before, or more easily by adding a CUDA stream synchronization (cudaStreamSynchronize(cudaStream) or cuStreamSynchronize(cudaStream)) after the asynchronous call and only then measuring the time.

To prevent measuring kernels potentially still running from the optixAccelBuild when already calling optixLaunch, you should add synchronizations after both or around the optixLaunch.
That could explain why your timings changed with the size of the acceleration structure.

Something like this:

cudaStreamSynchronize(cudaStream);
double beginTime = getTime();
optixLaunch(..., cudaStream, ...);
cudaStreamSynchronize(cudaStream);
double endTime = getTime();
double timeKernel = endTime - beginTime;
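
If you prefer CUDA events instead of the host timer, a sketch of that variant could look like this (error checking omitted, cudaStream as above):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, cudaStream);  // record into the same stream as the launch
optixLaunch(..., cudaStream, ...);
cudaEventRecord(stop, cudaStream);
cudaEventSynchronize(stop);          // host waits until the stop event has been processed

float milliseconds = 0.0f;
cudaEventElapsedTime(&milliseconds, start, stop);

cudaEventDestroy(start);
cudaEventDestroy(stop);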

Thank you very much for your quick reply. I think my explanation was lacking. I measured the time like this:

cudaEvent_t start, stop;
float elapsedTime = 0.0f;

cudaStreamSynchronize(stream);
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
OPTIX_CHECK(optixLaunch(
	m_optixResource.m_pipeline,
	0,
	reinterpret_cast<CUdeviceptr>(d_params),
	sizeof(RenderParams),
	&m_sbt,
	m_triangleSize,
	1,
	1
));
cudaStreamSynchronize(stream);
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);

I used only one stream (the default stream 0), so I think I measured the time after all threads were done.
And I got about 30 ms with gas_handle1 (10 million triangles) and 30 ms with gas_handle2 (1,000 triangles).
What is wrong with my program?

And I wonder: could I use two GASes without using an IAS, like in my case?

Thank you very much for your help

What’s your system configuration?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

What are the CH, AH, MISS programs doing?
Maybe 333 MRays/second (10 million rays in 30 ms) is simply an expected result for your system configuration and workload.

Could it be that the time is simply limited by the number of rays you shoot?
What happens if you vary the launch dimension?

If that is not changing the timing, have you tried using a different CUDA stream than the default?
It might have different synchronization behavior.
(I’m never using the default 0 CUDA stream in my OptiX 7 applications.)

Though be careful with that; the CUDA documentation for cudaEventElapsedTime says:
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
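
Combining that suggestion with the caveat above, a sketch with an explicit stream and host-side timing could look like this (getTime() stands for any host timer, as in the earlier example):

cudaStream_t stream = nullptr;
cudaStreamCreate(&stream);      // explicit, non-default CUDA stream

cudaDeviceSynchronize();        // make sure all previously launched work (e.g. the accel builds) has finished
double beginTime = getTime();
optixLaunch(..., stream, ...);  // pass the explicit stream instead of 0
cudaStreamSynchronize(stream);  // wait for the launched kernel to finish
double endTime = getTime();
double timeKernel = endTime - beginTime;

cudaStreamDestroy(stream);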

Do you have OptiX validation mode enabled?
https://raytracing-docs.nvidia.com/optix7/guide/index.html#context#validation-mode
That will add synchronizations. Never benchmark with validation mode enabled!
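
For reference, validation mode is something you would have enabled explicitly in the device context options, roughly like this (sketch; cuCtx and context are placeholders for your own objects):

OptixDeviceContextOptions options = {};
options.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL; // adds synchronizations, only for debugging
optixDeviceContextCreate(cuCtx, &options, &context);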

When using a GAS traversable handle inside the optixTrace call, that GAS is your scene’s root node.
The BVH traversal wouldn’t know anything about the other GAS.
This requires the OptixTraversableGraphFlags value OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS in the pipeline compile options.
https://raytracing-docs.nvidia.com/optix7/api/group__optix__types.html#gabd8bb7368518a44361e045fe5ad1fd17
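
That flag is set roughly like this (sketch; the remaining fields depend on your pipeline setup):

OptixPipelineCompileOptions pipelineCompileOptions = {};
pipelineCompileOptions.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS; // a GAS handle is the root, no IAS
// ... numPayloadValues, pipelineLaunchParamsVariableName, etc. as in your existing pipeline setup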

