I was wondering: if I launched 100 million rays, I would assume all 100 million rays wouldn’t be traversed in parallel. There has to be a limit which would fill up the SMs and RT Cores, such that the remaining rays are scheduled to run after earlier rays finish. Now how would I go about finding that number for my hardware setup?
Let’s differentiate the launch dimensions from the number of rays.
The optixLaunch dimension arguments width, height, and depth define how many threads are started.
In OptiX, the product of these launch dimensions is limited to 2^30, see the Limits chapter inside the OptiX Programming Guide: https://raytracing-docs.nvidia.com/optix8/guide/index.html#limits#limits
Depending on the ray tracing algorithm implemented inside your device programs, each launch index (thread) can call optixTrace multiple times, so the number of rays is usually not equal to the launch dimension. They are only equal if you have exactly one optixTrace call inside the ray generation program, without a loop, and no optixTrace calls anywhere else (e.g. no recursive rays inside closest-hit programs).
Now, how that number of threads is mapped onto the available hardware resources of the underlying GPU depends on the resources used inside your ray tracing kernels and on the scheduler implementation inside OptiX.
Since OptiX is implemented in CUDA, you might want to have a look into the CUDA Programming Guide to see how threads are grouped into warps, blocks, and grids.
The number of streaming multiprocessors (SMs) and individual cores available on your GPU can be found in the GPU specifications.
There are some Wikipedia pages which summarize that, e.g. here: https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture)
When analyzing your OptiX device code with Nsight Compute, you will be able to see the resource usage of your kernels, the number of blocks, and the warp occupancy of your own device functions. How the individual rays are scheduled onto the RT cores isn’t exposed, though.
If you’re profiling your application for performance, start with Nsight Systems to find bottlenecks due to synchronizations or memory transfers first.