Two GAS with no IAS

I’m writing an OptiX program that doesn’t render anything; all it has to do is check whether rays hit. I built two objects into two separate GASes and, in the kernel, trace against one of the GAS handles as shown below.

// build GAS1
uint32_t triangleInputFlags[1] = { };
CUdeviceptr vertexBufferPointer;

if (n_triangle == 0) return;
OptixBuildInput triangleInput = {};
vertexBufferPointer = reinterpret_cast<CUdeviceptr>(vertexData1);
triangleInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
triangleInput.triangleArray.vertexBuffers = &vertexBufferPointer;
triangleInput.triangleArray.numVertices = static_cast<uint32_t>(n_triangleSize * 3);
triangleInput.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
triangleInput.triangleArray.vertexStrideInBytes = sizeof(float3);
triangleInput.triangleArray.flags = triangleInputFlags;
triangleInput.triangleArray.numSbtRecords = 1;

OptixAccelBuildOptions accelOptions = {};

accelOptions.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

OPTIX_CHECK(optixAccelComputeMemoryUsage(
	parameterContext,
	&accelOptions,
	&triangleInput,
	1,
	&m_gasBufferSizes
));
m_tempBuffer.allocIfRequired(m_gasBufferSizes.tempSizeInBytes, false, OptixStream);
m_outputBuffer.allocIfRequired(m_gasBufferSizes.outputSizeInBytes, false, OptixStream);
	
OPTIX_CHECK(optixAccelBuild(
	parameterContext,
	0,
	&accelOptions,
	&triangleInput,
	1,
	m_tempBuffer.get(),
	m_gasBufferSizes.tempSizeInBytes,
	m_outputBuffer.get(),
	m_gasBufferSizes.outputSizeInBytes,
	&gas_handle1,
	nullptr,
	0
));


// build GAS2
uint32_t triangleInputFlags2[1] = { };
CUdeviceptr vertexBufferPointer2;

if (n_triangle == 0) return;
OptixBuildInput triangleInput2 = {};
vertexBufferPointer2 = reinterpret_cast<CUdeviceptr>(vertexData2);
triangleInput2.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
triangleInput2.triangleArray.vertexBuffers = &vertexBufferPointer2;
triangleInput2.triangleArray.numVertices = static_cast<uint32_t>(n_triangleSize * 3);
triangleInput2.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
triangleInput2.triangleArray.vertexStrideInBytes = sizeof(float3);
triangleInput2.triangleArray.flags = triangleInputFlags2;
triangleInput2.triangleArray.numSbtRecords = 1;

OptixAccelBuildOptions accelOptions2 = {};

accelOptions2.buildFlags = OPTIX_BUILD_FLAG_NONE | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions2.operation = OPTIX_BUILD_OPERATION_BUILD;

OPTIX_CHECK(optixAccelComputeMemoryUsage(
	parameterContext,
	&accelOptions2,
	&triangleInput2,
	1,
	&m_gasBufferSizes2
));
m_tempBuffer2.allocIfRequired(m_gasBufferSizes2.tempSizeInBytes, false, OptixStream);
m_outputBuffer2.allocIfRequired(m_gasBufferSizes2.outputSizeInBytes, false, OptixStream);
	
OPTIX_CHECK(optixAccelBuild(
	parameterContext,
	0,
	&accelOptions2,
	&triangleInput2,
	1,
	m_tempBuffer2.get(),
	m_gasBufferSizes2.tempSizeInBytes,
	m_outputBuffer2.get(),
	m_gasBufferSizes2.outputSizeInBytes,
	&gas_handle2,
	nullptr,
	0
));


Then I set up the SBT, copy the GAS handles into the launch params, and so on.
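
For context, a simplified sketch of how the handle gets into the launch params in my code (the member name handle and the struct layout are simplified here):

struct RenderParams
{
	OptixTraversableHandle handle; // gas_handle1 or gas_handle2
	// ... other launch parameters
};

RenderParams params = {};
params.handle = gas_handle1; // or gas_handle2 for the second measurement

cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(RenderParams), cudaMemcpyHostToDevice);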

In the kernel:

optixTrace(
	gas_handle1,            // (or gas_handle2)
	ray_origin, ray_direction,
	tmin,
	tmax,
	0.0f,                   // rayTime
	OptixVisibilityMask(1),
	OPTIX_RAY_FLAG_NONE,
	0,                      // SBT offset
	1,                      // SBT stride
	0,                      // missSBTIndex
	u0);

I only trace and do nothing with the hit result (in CH, AH, MISS).
And I measured the time after the launch.

GAS1 consists of 10 million triangles, GAS2 consists of 1,000 triangles, and the launch width (number of rays) is 10 million. However, tracing with gas_handle2 took the same time as tracing with gas_handle1.

So I built only one GAS with 1,000 triangles and launched. In that case I got a much smaller time compared with the case above.

I think something is wrong, and tracing with gas_handle2 maybe also traverses GAS1.
Is it because I didn’t use an IAS? Of course, I synchronized after the launch.
I want to know the reason …

Thank you very much for reading this.

How exactly did you measure that?

Note that all OptiX API calls which take a CUDA stream argument, like optixAccelBuild and optixLaunch, are asynchronous!
That means these calls only launch a CUDA kernel and immediately return to the host caller.
https://raytracing-docs.nvidia.com/optix7/guide/index.html#implementation_principles#asynchronous-execution

If you want to measure the time the actual CUDA kernel took, you must synchronize the host with the CUDA kernel. That is done either by waiting for CUDA events pushed into the stream before, or more easily by adding a CUDA stream synchronization (cudaStreamSynchronize(cudaStream) or cuStreamSynchronize(cudaStream)) after the asynchronous call and only then measuring the time.

To prevent measuring kernels potentially still running from the optixAccelBuild when already calling optixLaunch, you should add synchronizations after both or around the optixLaunch.
That could explain why your timings changed with the size of the acceleration structure.

Something like this:

cudaStreamSynchronize(cudaStream);
double beginTime = getTime();
optixLaunch(..., cudaStream, ...);
cudaStreamSynchronize(cudaStream);
double endTime = getTime();
double timeKernel = endTime - beginTime;
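
If you prefer CUDA events instead of the host timer, a sketch of that variant could look like this (error checking omitted, cudaStream as above):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, cudaStream);  // record into the same stream as the launch
optixLaunch(..., cudaStream, ...);
cudaEventRecord(stop, cudaStream);
cudaEventSynchronize(stop);          // host waits until the stop event has been processed

float milliseconds = 0.0f;
cudaEventElapsedTime(&milliseconds, start, stop);

cudaEventDestroy(start);
cudaEventDestroy(stop);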

Thank you very much for your quick reply. I think my explanation was lacking. I measured the time like this:

cudaEvent_t start, stop;
float elapsedTime = 0.0f;

cudaStreamSynchronize(stream);
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
OPTIX_CHECK(optixLaunch(
	m_optixResource.m_pipeline,
	0,
	reinterpret_cast<CUdeviceptr>(d_params),
	sizeof(RenderParams),
	&m_sbt,
	m_triangleSize,
	1,
	1
));
cudaStreamSynchronize(stream);
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);

I used only one stream (the default stream 0), so I think I measured the time after all threads were done.
And I got about 30 ms with gas_handle1 (10 million triangles) and 30 ms with gas_handle2 (1,000 triangles).
What is wrong with my program?

And I wonder: could I use two GASes without using an IAS, like in my case?

Thank you very much for your help

What’s your system configuration?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

What are the CH, AH, MISS programs doing?
Maybe 333 MRays/second (10 million rays in 30 ms) is simply an expected result for your system configuration and workload.

Could it be that the time is simply limited by the number of rays you shoot?
What happens if you vary the launch dimension?

If that is not changing the timing, have you tried using a different CUDA stream than the default?
It might have different synchronization behavior.
(I’m never using the default 0 CUDA stream in my OptiX 7 applications.)

Though be careful with that; the CUDA documentation for cudaEventElapsedTime says:
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
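
Combining that suggestion with the caveat above, a sketch with an explicit stream and host-side timing could look like this (getTime() stands for any host timer, as in the earlier example):

cudaStream_t stream = nullptr;
cudaStreamCreate(&stream);      // explicit, non-default CUDA stream

cudaDeviceSynchronize();        // make sure all previously launched work (e.g. the accel builds) has finished
double beginTime = getTime();
optixLaunch(..., stream, ...);  // pass the explicit stream instead of 0
cudaStreamSynchronize(stream);  // wait for the launched kernel to finish
double endTime = getTime();
double timeKernel = endTime - beginTime;

cudaStreamDestroy(stream);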

Do you have OptiX validation mode enabled?
https://raytracing-docs.nvidia.com/optix7/guide/index.html#context#validation-mode
That will add synchronizations. Never benchmark with validation mode enabled!
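
For reference, validation mode is something you would have enabled explicitly in the device context options, roughly like this (sketch; cuCtx and context are placeholders for your own objects):

OptixDeviceContextOptions options = {};
options.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL; // adds synchronizations, only for debugging
optixDeviceContextCreate(cuCtx, &options, &context);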

When using a GAS traversable handle inside the optixTrace call, that GAS is your scene’s root node.
The BVH traversal wouldn’t know anything about the other GAS.
This requires the OptixTraversableGraphFlags value OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS in the pipeline compile options.
https://raytracing-docs.nvidia.com/optix7/api/group__optix__types.html#gabd8bb7368518a44361e045fe5ad1fd17
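
That flag is set roughly like this (sketch; the remaining fields depend on your pipeline setup):

OptixPipelineCompileOptions pipelineCompileOptions = {};
pipelineCompileOptions.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS; // a GAS handle is the root, no IAS
// ... numPayloadValues, pipelineLaunchParamsVariableName, etc. as in your existing pipeline setup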

