Taking full advantage of CUDA cores and RT cores

optixLaunch is called in host code to launch a 1D, 2D or 3D array of threads on the device and invokes a ray generation program for each thread. Are these threads executed on CUDA cores? If so, how many threads can be started in parallel to make full use of the CUDA cores?
When the ray generation program invokes optixTrace, other programs are invoked to execute the traversal. How many rays can be traversed in parallel to make full use of the CUDA cores (for custom primitives) and the RT cores?

optixLaunch is called in host code to launch a 1D, 2D or 3D array of threads on the device and invokes a ray generation program for each thread. Are these threads executed on CUDA cores?

Yes, OptiX is using CUDA.

https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#overview

If you read through the OptiX 7 Programming Guide and example source code, you’ll see that all resource management inside your host application is done with native CUDA host calls, either using the CUDA Runtime API or the CUDA Driver API.
OptiX uses a single-ray programming model and does all scheduling internally, so some native CUDA features, like shared memory access and warp synchronization instructions, are not allowed.
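
To make that concrete, here is a minimal host-side sketch in the style of the OptiX SDK samples. The Params struct, the launchFrame helper and the already-created pipeline, SBT and stream are assumptions for illustration; only optixLaunch and the CUDA Runtime calls are actual API:

```cpp
// Minimal host-side sketch (assumes the pipeline, SBT, stream and acceleration
// structure have been created beforehand, as in the OptiX SDK samples).
#include <optix.h>
#include <optix_stubs.h>
#include <cuda_runtime.h>

struct Params                        // hypothetical launch parameter block
{
    uchar4*                frame_buffer;
    unsigned int           width;
    unsigned int           height;
    OptixTraversableHandle handle;   // top-level acceleration structure
};

void launchFrame( OptixPipeline pipeline, const OptixShaderBindingTable& sbt,
                  CUstream stream, const Params& h_params )
{
    // Resource management is plain CUDA: allocate and upload the launch parameters.
    CUdeviceptr d_params = 0;
    cudaMalloc( reinterpret_cast<void**>( &d_params ), sizeof( Params ) );
    cudaMemcpyAsync( reinterpret_cast<void*>( d_params ), &h_params, sizeof( Params ),
                     cudaMemcpyHostToDevice, stream );

    // One thread (and one ray generation program invocation) per launch index.
    optixLaunch( pipeline, stream, d_params, sizeof( Params ), &sbt,
                 h_params.width, h_params.height, 1 /* depth */ );

    cudaStreamSynchronize( stream );
    cudaFree( reinterpret_cast<void*>( d_params ) );
}
```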

Then all device code you implement for the different program domains (raygen, exception, closest hit, any hit, intersection, miss, direct or continuation callables) is written in CUDA C++ device code.

All these device programs run on the Streaming Multiprocessors (SM) of the GPU.
On RTX GPUs there are also RT cores which handle traversal through the acceleration structures and the ray-triangle intersection calculation in hardware.
Then there are Tensor cores which are used for the OptiX Denoiser.
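
For illustration, here is a minimal sketch of what such device code can look like. The Params struct and the single-payload hit/miss convention are assumptions mirroring the SDK samples; only the optix* device functions and the __raygen__/__miss__ semantic prefixes come from OptiX itself:

```cpp
// Minimal device-side sketch of a ray generation and miss program (CUDA C++).
// The Params struct matches the hypothetical host-side example above and would
// normally live in a shared header.
#include <optix.h>

struct Params
{
    uchar4*                frame_buffer;
    unsigned int           width;
    unsigned int           height;
    OptixTraversableHandle handle;
};

extern "C" __constant__ Params params;   // filled from the optixLaunch pipelineParams

extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();       // this thread's launch index
    const uint3 dim = optixGetLaunchDimensions();

    // Hypothetical orthographic ray straight down the -z axis, one per pixel.
    const float3 origin    = make_float3( (float) idx.x / dim.x,
                                          (float) idx.y / dim.y, 1.0f );
    const float3 direction = make_float3( 0.0f, 0.0f, -1.0f );

    unsigned int p0 = 1u;   // payload register: assume hit, the miss program clears it
    optixTrace( params.handle, origin, direction,
                0.0f, 1e16f, 0.0f,                 // tmin, tmax, ray time
                OptixVisibilityMask( 255 ), OPTIX_RAY_FLAG_NONE,
                0, 1, 0,                           // SBT offset, SBT stride, miss index
                p0 );

    const unsigned char c = p0 ? 255u : 0u;
    params.frame_buffer[idx.y * params.width + idx.x] = make_uchar4( c, c, c, 255u );
}

extern "C" __global__ void __miss__ms()
{
    optixSetPayload_0( 0u );   // mark the ray as a miss
}
```

In a real renderer the closest-hit, any-hit and intersection programs would write more payload data; this sketch only distinguishes hit from miss.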

If so, how many threads can be started in parallel to make full use of the CUDA cores?

The OptiX launch dimension has a limit of 2^30.

https://raytracing-docs.nvidia.com/optix7/guide/index.html#limits#limits
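
Taking that limit as the product of the launch width, height and depth, a small host-side sanity check could look like this (a sketch; the helper name and example values are arbitrary):

```cpp
// Sketch: the product of the launch dimensions must not exceed 2^30
// (see the limits link above).
#include <cassert>

inline void checkLaunchSize( unsigned int width, unsigned int height, unsigned int depth )
{
    const unsigned long long total =
        (unsigned long long) width * height * depth;
    assert( total <= ( 1ull << 30 ) );   // 2^30 == 1,073,741,824 launch indices
}

// Example: a 32768 x 32768 x 1 launch is exactly 2^30 indices and still allowed.
// checkLaunchSize( 32768u, 32768u, 1u );
```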

When the ray generation program invokes optixTrace, other programs are invoked to execute the traversal.
How many rays can be traversed in parallel to make full use of the CUDA cores (for custom primitives) and the RT cores?

That is completely abstracted by OptiX. It will launch as many threads from the given launch dimension in parallel as there are resources available on the underlying GPU.

Note that how many rays are required to saturate a modern GPU architecture depends on the underlying GPU.
Meaning there is a minimum number of threads required to make best use of the GPU hardware, which is directly related to how many CUDA cores a specific GPU has and how many registers are used inside a device program. Then there are other factors like memory bandwidth, cache sizes, etc. It’s complicated.
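
As a very rough sketch, you can query the relevant hardware properties with the CUDA Runtime API to get a lower bound on the launch size needed to keep all SMs busy; register pressure, occupancy limits and memory behavior will shift the real number:

```cpp
// Rough sketch: lower bound on how many threads are needed to fill the GPU,
// ignoring register pressure, occupancy limits, memory bandwidth, etc.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop{};
    cudaGetDeviceProperties( &prop, 0 );

    // Maximum number of threads that can be resident on the whole GPU at once.
    const int residentThreads = prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;

    std::printf( "%s: %d SMs, up to %d resident threads\n",
                 prop.name, prop.multiProcessorCount, residentThreads );

    // A launch dimension well above residentThreads (e.g. several times larger)
    // gives the scheduler enough work to hide latency.
    return 0;
}
```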

You’ll find more information about this inside the CUDA Programming Guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

Please use the search field in the top right of this forum to find more detailed information if you still have questions after reading the OptiX Programming Guide.
For example this thread:
https://forums.developer.nvidia.com/t/a-few-questions-about-bvh-traversal-engine-and-triangle-intersection-engine/233412